7 points by 10keane 6 hours ago | 2 comments
  • boesboes 6 hours ago
    > because it LOOKS like good engineering

    That is the whole problem imho. I've found that I can use LLMs to do programming only if I fully understand the problem and solution. Because if I don't, it will just pretend that I'm right and happily spend hours trying to implement a broken idea.

    The problem is that it's very hard to know whether my understanding of something is sufficient to have Claude propose a solution, and for me to know whether it is going to work. If my understanding of the problem is incorrect or incomplete, the plan will look fine to me, but it will be wrong.

    If I start working on something from a poor understanding, I will notice and improve my understanding. An LLM will just deceive me and try to do the impossible anyway.

    Also, it overcooks everything; at least 50-60% of the code it generates is pointlessly verbose abstraction. again: imho, ymmv, ianal, not financial advice ;)

    • 10keane 6 hours ago
      exactly. vibe coding only works when you fully understand the problem and know precisely how to solve it. the ai just does the dirty implementation work for you.

      that is another reason why i separate product/architecture design and implementation into two agents with isolated contexts in my workflow. i can always iterate with the product agent to refine my understanding and THEN ask the coding agent to implement it. by that time i already have the ability to make a proper judgement and evaluate the coding agent's output
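A minimal sketch of the two-agent split described above, with each agent holding its own isolated message history so design iteration never leaks into the implementation context. The class and prompts are illustrative assumptions, not the commenter's actual setup; in a real workflow the replies would come from LLM API calls.

```python
# Hypothetical sketch: two agents with isolated contexts.
class Agent:
    def __init__(self, system_prompt):
        # Each agent keeps its own history; nothing is shared.
        self.history = [("system", system_prompt)]

    def ask(self, message, reply):
        # `reply` is passed in so the sketch stays self-contained;
        # a real version would call an LLM API here.
        self.history.append(("user", message))
        self.history.append(("assistant", reply))
        return reply

product = Agent("You design solutions; you never write code.")
coder = Agent("You implement a given design; you never redesign it.")

# Iterate with the product agent until the understanding is solid...
plan = product.ask("Design a rate limiter", "Use a token bucket: ...")
# ...then hand only the finished plan to the coding agent.
code = coder.ask(f"Implement this plan:\n{plan}", "def acquire(): ...")

# The product agent's context never sees implementation traffic.
assert not any("Implement" in msg for _, msg in product.history)
```

The point of the isolation is evaluability: by the time the coding agent runs, the plan has already been refined enough that its output can be judged against it.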

  • maroondlabs6 hours ago
    One failure mode worth adding to this list: the agent edits the obvious file and stops. You fix a function — the test doesn't follow. You update the source — the config doesn't change. The sibling stays stale.

    We ran 30+ benchmark tasks across 17 repos (Go, Rust, Python, JS) and this pattern showed up across every model we tested. The agent knows what to do in the file it opened. It just doesn't know what else needs to open.

    It's a different class of failure from the ones you describe — not bad reasoning, not wrong architecture, just incomplete blast radius. The agent was right, it just stopped too early.

    We built sourcebook (sourcebook.run) to catch this — checks diffs against git co-change history and import graph structure to flag what's missing. No LLM, runs in under a second.
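A rough sketch of the co-change idea (not sourcebook's actual code, and all names here are made up): mine historical commits for files that frequently change together, then warn when a diff touches one file of a frequent pair but not the other.

```python
from collections import Counter
from itertools import combinations

def cochange_pairs(commits):
    # `commits` is one set of file paths per historical commit,
    # in practice parsed from `git log --name-only`.
    pairs = Counter()
    for files in commits:
        for a, b in combinations(sorted(files), 2):
            pairs[(a, b)] += 1
    return pairs

def missing_siblings(changed, commits, threshold=3):
    # Flag files that usually move with a changed file but are
    # absent from the current diff.
    warn = set()
    for (a, b), n in cochange_pairs(commits).items():
        if n < threshold:
            continue
        if a in changed and b not in changed:
            warn.add(b)
        if b in changed and a not in changed:
            warn.add(a)
    return warn

history = [{"src/pricing.py", "tests/test_pricing.py"}] * 4 + [{"src/util.py"}]
print(missing_siblings({"src/pricing.py"}, history))
# → {'tests/test_pricing.py'}
```

No model involved: it is pure counting over `git log` output plus a threshold, which is consistent with the "no LLM, under a second" claim.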