But doing that creates a massive new problem: Agent Velocity.
If your autonomous coding agent has to ingest 1M tokens of raw code at every step of a 40-step debugging loop, time-to-first-token latency and cost skyrocket. Prompt caching helps, but not when the agent constantly jumps between disparate parts of the codebase.
Furthermore, human Principal Engineers don't solve bugs by memorizing 2,000 files line by line. They use a Cognitive Hierarchy:
Architecture: They hold a high-level system model in their head (~200 tokens).
Interfaces: They look at the signatures of the modules they are using (~400 tokens).
Implementation: They only read the raw code of the exact ~50 lines they are altering.

We built [Entroly] to give AI agents this same hierarchical structure, bypassing the need for massive context dumps.
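To make the idea concrete, here is a minimal sketch of tiered context assembly. This is an illustration, not Entroly's actual implementation: the `Module` type, `ARCHITECTURE` summary, and `build_context` helper are all hypothetical names chosen for this example. The point is that the prompt an agent receives is layered, with a fixed architecture summary, the interfaces of the relevant module, and only the implementation lines being edited.

```python
from dataclasses import dataclass

@dataclass
class Module:
    name: str
    signatures: list[str]  # tier 2: public interface, kept short
    source: list[str]      # tier 3: raw implementation lines

# Tier 1: a one-time, high-level system summary (~200 tokens in practice).
ARCHITECTURE = "gateway -> auth -> billing; billing owns all invoice state."

def build_context(modules: dict[str, Module], target: str, lines: range) -> str:
    """Assemble a hierarchical prompt instead of dumping the whole repo."""
    mod = modules[target]
    parts = [
        "# Architecture\n" + ARCHITECTURE,                    # tier 1: always included
        "# Interfaces\n" + "\n".join(mod.signatures),         # tier 2: signatures only
        "# Implementation (lines under edit)\n"               # tier 3: just the slice
        + "\n".join(mod.source[lines.start:lines.stop]),
    ]
    return "\n\n".join(parts)

# Usage: the agent asks for billing lines 1-2 only, not the full file.
billing = Module(
    name="billing",
    signatures=["def charge(user_id: str, cents: int) -> Invoice"],
    source=[
        "def charge(user_id, cents):",
        "    inv = Invoice(user_id)",
        "    inv.add(cents)",
        "    return inv",
    ],
)
ctx = build_context({"billing": billing}, "billing", range(1, 3))
```

The resulting `ctx` stays a few hundred tokens regardless of repository size, which is the property that keeps per-step latency flat across a long debugging loop.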