Tests fail, there's a huge wall of output, and the agent spends most of its time figuring out what actually went wrong.
A lot of the time these aren't different failures; it's the same issue repeated across many tests.
So I hacked together a small CLI to group failures before the agent sees them.
Instead of getting the whole dump, the agent gets something like:
125 errors from a missing env var
3 failures from snapshot drift
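To make the grouping idea concrete, here's a minimal Python sketch. It is not the actual CLI code, just one plausible way to do it: strip out the parts of an error message that vary per test (numbers, addresses) and count what's left. The names `normalize` and `group_failures` and the sample messages are made up for illustration.

```python
import re
from collections import Counter

def normalize(message: str) -> str:
    """Replace details that vary between tests so the same root cause matches."""
    message = re.sub(r"0x[0-9a-fA-F]+", "<addr>", message)  # memory addresses
    message = re.sub(r"\d+", "<n>", message)                # line numbers, ids, counts
    return message.strip()

def group_failures(failures):
    """Take (test_name, error_message) pairs and return
    [(signature, count), ...] with the most common signature first."""
    counts = Counter(normalize(msg) for _, msg in failures)
    return counts.most_common()

# Hypothetical sample input, not real test output.
failures = [
    ("test_a", "KeyError: 'DATABASE_URL' at line 12"),
    ("test_b", "KeyError: 'DATABASE_URL' at line 48"),
    ("test_c", "snapshot mismatch in file 3"),
]

for signature, count in group_failures(failures):
    print(f"{count} x {signature}")
```

Normalizing by regex is crude and will over-merge or under-merge in some cases, which is part of why the rougher edges exist.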
Tried it on a backend with ~640 tests and it cut tokens and runtime by around 60%, but the bigger difference is that the agent does a lot less digging around.
It handles a lot of the common cases locally and only falls back when it can’t explain things cleanly.
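The "handle common cases locally, otherwise fall back" part could look roughly like the sketch below. Everything here is an assumption for illustration: the `RULES` patterns, the labels, and the `classify` and `summarize` names are hypothetical, and "falling back" is modeled as simply returning the unexplained messages untouched so something else can deal with them.

```python
import re
from collections import Counter

# Hypothetical local rules: (pattern, human-readable cause).
RULES = [
    (re.compile(r"KeyError: '[A-Z][A-Z0-9_]*'"), "missing env var"),
    (re.compile(r"snapshot", re.IGNORECASE), "snapshot drift"),
]

def classify(message):
    """Return a cause label if a local rule matches, else None."""
    for pattern, label in RULES:
        if pattern.search(message):
            return label
    return None

def summarize(messages):
    """Split messages into counted, explained causes and a raw
    list of messages no local rule could explain."""
    explained = Counter()
    unexplained = []
    for message in messages:
        label = classify(message)
        if label is None:
            unexplained.append(message)  # fallback: pass through as-is
        else:
            explained[label] += 1
    return dict(explained), unexplained
```

Keeping the unexplained messages verbatim means nothing is hidden from the agent when the rules don't apply.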
Still rough in places, but curious if others working with agents have run into this.