I have been building an Electron-based desktop app that interacts with Anthropic’s AgentSDK and the local file system.
It’s 100% spec-driven, and Claude Code has written every line. I do large features instead of small ones (each spec lives in an issue and runs to around 300 lines of markdown).
I have had it generate Playwright tests from the start. That was going okay, but one change made it far better: I created a spec-driven pull request to use data-testid attributes for selectors.
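To illustrate why data-testid selectors tend to hold up better, here is a rough sketch rather than my actual code. The `TestIds` registry, the ID values, and the `testIdSelector` helper are all made up for the example; the idea is just that the UI and the tests share one list of stable IDs instead of depending on layout or CSS classes.

```typescript
// Hypothetical registry of test IDs, shared by UI components and tests.
// The names and values here are illustrative assumptions only.
const TestIds = {
  fileTree: "file-tree",
  saveButton: "save-button",
  chatInput: "chat-input",
} as const;

// Union of the allowed ID strings, so a typo fails at compile time.
type TestId = (typeof TestIds)[keyof typeof TestIds];

// Build a CSS attribute selector that matches data-testid="<id>".
function testIdSelector(id: TestId): string {
  return `[data-testid="${id}"]`;
}

// A selector tied to layout and styling breaks when the markup is restyled.
const brittleSelector = "div.sidebar > ul > li:nth-child(2) > button.btn-primary";

// A selector tied to a test ID only breaks if the ID itself is removed.
const stableSelector = testIdSelector(TestIds.saveButton);

console.log(brittleSelector);
console.log(stableSelector);
```

In a Playwright test, a selector string like this can be passed to `page.locator(...)`, or the raw ID can go to `page.getByTestId(...)`, which matches the `data-testid` attribute by default.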
Every new feature adds tests and verifies that it hasn’t broken existing features.
I don’t even bother with unit tests. It’s working amazingly well.