Besides model capabilities, one of the most important aspects of AI-assisted development right now is context management.
Cursor et al. try to automate this for the user, and it works up to a point. But beyond a certain level of complexity, the user needs to get actively involved in managing the context.
It also seems like the people who report very good results with agentic coding take a lot of care in managing their Cursor rules or claude.md files.
On that note, I wanted to shamelessly plug this lib I built recently for this very topic. It's been much easier to sell to our clients than evals, really, because it's closer to e2e tests: https://github.com/langwatch/scenario
Instead of collecting 100 examples, it's easier for people to start from just the anecdotal example where the problem happens and let AI expand on it, or to replicate a situation from prod and describe the success criteria in simple terms or code.
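To make the "criteria in simple terms or code" part concrete, here's a minimal sketch in plain Python. This is not the scenario library's actual API; the `check_criteria` helper and the keyword-based judge are hypothetical stand-ins for what would normally be an LLM judge, kept deliberately simple so the idea of "one anecdotal case turned into a pass/fail check" is clear:

```python
# Hypothetical sketch: turn one anecdotal production failure into a pass/fail check.
# A real setup would use an LLM judge; a keyword check stands in here to stay runnable.

def check_criteria(agent_reply: str, forbidden: list[str], required: list[str]) -> bool:
    """Return True when the reply meets simple, code-level criteria."""
    reply = agent_reply.lower()
    if any(word in reply for word in forbidden):
        return False
    return all(word in reply for word in required)

# The anecdotal case from prod: user asked for a vegetarian stir-fry,
# the agent suggested chicken.
bad_reply = "Try this chicken stir-fry with rice."
good_reply = "Try this tofu stir-fry with rice and vegetables."

assert not check_criteria(bad_reply, forbidden=["chicken", "beef"], required=["rice"])
assert check_criteria(good_reply, forbidden=["chicken", "beef"], required=["rice"])
```

From there, an AI can expand that single case into variations (different cuisines, phrasings, edge cases) while the same criteria function keeps scoring them, which is what makes this feel closer to an e2e test than a traditional eval dataset.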