That is the whole problem imho. I've found that I can use LLMs to do programming only if I fully understand the problem and solution. Because if I don't, it will just pretend that I'm right and happily spend hours trying to implement a broken idea.
The problem is that it's very hard to know whether my understanding of something is sufficient to have Claude propose a solution, and for me to know whether it is going to work. If my understanding of the problem is incorrect or incomplete, the plan will look fine to me, but it will be wrong.
If I start working on something from a poor understanding, I will notice and improve my understanding. An LLM will just play along and try to do the impossible anyway.
Also, it overcooks everything: at least 50-60% of the code it generates is pointlessly verbose abstraction. Again: imho, ymmv, ianal, not financial advice ;)
that is another reason why i separate product/architecture design and implementation into two agents with isolated context in my workflow. i can always iterate with the product agent to refine my understanding and THEN ask the coding agent to implement it. by that time i already have the ability to make a proper judgement and evaluate the coding agent's output
We ran 30+ benchmark tasks across 17 repos (Go, Rust, Python, JS) and this pattern showed up across every model we tested. The agent knows what to do in the file it opened. It just doesn't know what else needs to open.
It's a different class of failure from the ones you describe — not bad reasoning, not wrong architecture, just incomplete blast radius. The agent was right, it just stopped too early.
We built sourcebook (sourcebook.run) to catch this — checks diffs against git co-change history and import graph structure to flag what's missing. No LLM, runs in under a second.
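For anyone curious what the co-change part looks like, here's a rough sketch. To be clear: this is illustrative only, not sourcebook's actual implementation, and the `min_count` threshold and file names are made up. The idea is just to count which files historically change together and flag ones missing from the current diff:

```python
from collections import Counter
from itertools import combinations

def cochange_counts(commits):
    """Count how often each pair of files changed in the same commit.
    `commits` is an iterable of sets of file paths."""
    pairs = Counter()
    for files in commits:
        for a, b in combinations(sorted(set(files)), 2):
            pairs[(a, b)] += 1
    return pairs

def flag_missing(diff_files, commits, min_count=2):
    """Flag files that historically co-change with the diff's files
    (at least min_count times) but are absent from the current diff."""
    pairs = cochange_counts(commits)
    diff = set(diff_files)
    flagged = set()
    for (a, b), n in pairs.items():
        if n < min_count:
            continue
        if a in diff and b not in diff:
            flagged.add(b)
        elif b in diff and a not in diff:
            flagged.add(a)
    return sorted(flagged)

# Hypothetical history: the handler and its test almost always ship together.
history = [
    {"api/handler.go", "api/handler_test.go"},
    {"api/handler.go", "api/handler_test.go", "docs/api.md"},
    {"api/handler.go", "api/handler_test.go"},
]
print(flag_missing({"api/handler.go"}, history))
# → ['api/handler_test.go']
```

In practice you'd mine the commit sets from `git log --name-only` and tune the threshold per repo, but the core signal really is this simple.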
>Don't post generated comments or AI-edited comments. HN is for conversation between humans.