1 point by abbyedd 6 hours ago | 1 comment
  • abbyedd 6 hours ago
    Frontier models like Opus 4.6 have practically solved the "lost in the middle" problem. You can throw a 2M-token monorepo into the context window, and the model won't hallucinate.

    But doing that creates a massive new problem: Agent Velocity.

    If your autonomous coding agent has to ingest 1M tokens of raw code for every step of a 40-step debugging loop, the latency (Time-to-First-Token) and costs skyrocket. Prompt caching helps, but not when the agent is constantly jumping between disparate parts of the codebase.

    Furthermore, human Principal Engineers don't solve bugs by perfectly memorizing 2,000 files line-by-line. They use Cognitive Hierarchy:

    Architecture: They hold a high-level system model in their head (~200 tokens).
    Interfaces: They look at the signatures of the modules they are using (~400 tokens).
    Implementation: They only look at the raw code of the exact 50 lines they are altering.

    We built [Entroly] to give this exact hierarchical structure to AI agents, bypassing the need for massive context dumps.
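    To make the tiering concrete, here's a minimal sketch of how a context assembler along those lines might work. `Module` and `build_context` are illustrative names I'm inventing for the example, not Entroly's actual API: every module contributes its one-line architecture summary, the non-focus modules contribute only their interface signatures, and raw source is included solely for the lines being edited.

```python
from dataclasses import dataclass

@dataclass
class Module:
    name: str              # module identifier
    summary: str           # one-line architectural description
    signatures: list[str]  # public interface signatures
    source: str            # full implementation text

def build_context(modules: list[Module], focus: str, lines: range) -> str:
    """Assemble a tiered prompt: architecture summaries for every module,
    interface signatures for non-focus modules, and raw source only for
    the focus module's targeted line range."""
    parts = ["# Architecture"]
    parts += [f"- {m.name}: {m.summary}" for m in modules]
    for m in modules:
        if m.name == focus:
            snippet = "\n".join(m.source.splitlines()[lines.start:lines.stop])
            parts.append(f"\n# Implementation ({m.name}, lines {lines.start}-{lines.stop})")
            parts.append(snippet)
        else:
            parts.append(f"\n# Interface ({m.name})")
            parts += m.signatures
    return "\n".join(parts)
```

    The point of the sketch: the prompt size scales with the number of modules times a few hundred tokens, plus one small code window, instead of with total repository size.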

    • franktankbank 6 hours ago
      Where's the demo? There's just some claims of reducing tokens, but no head-to-head with real prompts for producing useful large projects. I guess in theory it kind of makes sense for an idealized language. Seems like a lot of leaps otherwise.