More tokens, less cost: why optimizing for token count is wrong

1 point by nicola_alessi 5 hours ago | 3 comments

alexbuiko 5 hours ago
This is a brilliant breakdown of the 'Token Mix' paradox. It aligns perfectly with what we’ve been seeing while developing SDAG.
When you optimize for a structured context payload (like your dependency graph), you aren't just hitting the Anthropic pricing cache—you are literally reducing the routing entropy at the inference level. High-noise inputs force the model into 'exploratory' output paths, which isn't just expensive in dollars, but also in hardware stress.
We found that 'verbose orientation narration' (the thinking-out-loud part) correlates with higher entropy spikes in memory access. By tightening the input signal-to-noise ratio, you're essentially stabilizing the model's internal routing. Have you noticed any changes in latency variance (jitter) between the pre-indexed and ad-hoc runs? In our tests, lower entropy usually leads to much more predictable TTFT (Time To First Token).
- nicola_alessi 5 hours ago
  Interesting framing — hadn't thought about it from the inference routing angle but it maps well to what the data shows. On latency variance: yes, significantly. Cost standard deviation across runs dropped 6-24x depending on task type. The most extreme case was a refactoring task: baseline sigma $0.312 vs $0.013 with pre-indexed context. Duration variance also dropped in 6 out of 7 tasks. I didn't measure TTFT specifically but the overall duration went from 170s → 132s with much tighter clustering around the mean. The stabilization effect is probably the most underrated finding. Everyone focuses on the average cost reduction, but the predictability improvement matters more for production workloads — you can actually forecast spend instead of hoping the agent doesn't go on an exploration tangent. What's SDAG? Curious about your setup.
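The figures quoted in the reply above can be checked with a few lines of arithmetic. This is a minimal sketch using only the numbers stated in the comment (baseline sigma $0.312 vs $0.013, duration 170s vs 132s); the function names are hypothetical and not part of any tool mentioned in the thread.

```python
def reduction_factor(baseline: float, improved: float) -> float:
    """Return how many times smaller `improved` is than `baseline`."""
    return baseline / improved


def percent_drop(baseline: float, improved: float) -> float:
    """Return the relative decrease from `baseline` to `improved`, in percent."""
    return (baseline - improved) / baseline * 100


# Numbers taken from the comment: cost sigma for the refactoring task,
# and overall run duration in seconds.
sigma_factor = reduction_factor(0.312, 0.013)
duration_drop = percent_drop(170, 132)

print(f"cost sigma reduction: {sigma_factor:.0f}x")   # 24x
print(f"duration reduction: {duration_drop:.1f}%")    # 22.4%
```

The refactoring task therefore sits at the top of the quoted 6-24x range, while the mean duration improvement is a more modest 22% or so.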
gnabgib 5 hours ago
You're overdoing the self-promotion (this is the 7th time you've submitted vexp). Share something with us that you're curious about and didn't build.
> Please don't use HN primarily for promotion. It's ok to post your own stuff part of the time, but the primary use of the site should be for curiosity.
https://news.ycombinator.com/newsguidelines.html
- nicola_alessi 5 hours ago
  Fair point, appreciate the callout. I'll dial it back.
  - jacquesm an hour ago
    No, don't dial it back. Just stop. The only way this will end otherwise is either with an account ban, a domain ban or both.
verdverm 3 hours ago
tl;dr: AGENTS.md and the Anthropic post about putting MCPs behind search are both winning ideas right now