Fair question. I haven’t done a systematic benchmark yet, so I don’t have hard numbers to point to.
Honestly I’ve mostly been iterating from actual use. The main test has been whether it helps me keep the good parts of brainstorming with the agent, recover context across longer multi PR or multi session work, and reduce friction overall.
So right now the evidence is mostly qualitative and based on my own workflow, not a formal evaluation.