I am doing similar experiments for in-browser user testing, where the user is basically Claude. Incredible results with this simple pipeline:
1. Claude tests a feature
2. Notes down all friction and pain points
3. Convert those as a prioritized todo list
4. Use Claude Code or similar to action the todo list
It has some friction in-between steps 3 and 4, but nothing that can't be solved without running `claude --chrome` via CLI instead of the Chrome extension.