The core argument: agentic workflows made sense when max output was 4k tokens. Frontier models now support 64k-128k output tokens, but most tooling still optimizes for short responses. Because each short turn re-sends the accumulated context as input, the structural inefficiency compounds quadratically with project size.
The paper includes:

- Formal token-economics model with sensitivity analysis (13-17x cost advantage at 10k lines, 40-45x at 30k lines)
- Three field deployments with real cost figures ($0.58 for 14,431 lines)
- Honest failure cases, including an SSE parsing incident that required a full re-stream
- A reproducibility protocol for independent validation
- A capability-tier framework to keep claims model-version-agnostic
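The quadratic-compounding argument can be sketched with a toy cost model. This is not the paper's formal model, just an illustration under loudly labeled assumptions: the tokens-per-line figure, the 4k-token per-turn output cap, and the per-token prices are all made up for the sketch, and it ignores overheads (tool-call scaffolding, re-reading source files, retries) that a fuller model would include.

```python
# Toy sketch of the quadratic-context argument. All constants below are
# illustrative assumptions, not figures from the paper.

TOKENS_PER_LINE = 10   # assumed average tokens per line of code
CHUNK_OUT = 4_000      # assumed per-turn output cap (old agentic limit)
PRICE_IN = 3e-6        # assumed $ per input token
PRICE_OUT = 15e-6      # assumed $ per output token

def one_shot_cost(lines: int) -> float:
    """Single long-output generation: pay for the output tokens once."""
    return lines * TOKENS_PER_LINE * PRICE_OUT

def agentic_cost(lines: int) -> float:
    """Chunked agentic loop: each turn re-sends everything written so far
    as input context, so cumulative input tokens grow quadratically with
    the number of turns."""
    total_out = lines * TOKENS_PER_LINE
    turns = -(-total_out // CHUNK_OUT)  # ceiling division
    cost = 0.0
    written = 0
    for _ in range(turns):
        cost += written * PRICE_IN               # re-read context so far
        out = min(CHUNK_OUT, total_out - written)
        cost += out * PRICE_OUT                  # this turn's output
        written += out
    return cost

for n in (10_000, 30_000):
    ratio = agentic_cost(n) / one_shot_cost(n)
    print(f"{n} lines: agentic/one-shot cost ratio ≈ {ratio:.1f}x")
```

Even this stripped-down version shows the ratio widening with project size, since the re-sent context grows with every chunk; the paper's model presumably layers additional per-turn overheads on top, which is where the larger headline multipliers would come from.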
I'm a business analyst in South Africa. No lab affiliation, no team. Happy to answer questions about methodology or the streaming infrastructure.