We went from simple chatbots to thinking models, which massively increased token utilization.
We then went from thinking models to tool calls and agents. Agents, and particularly long-horizon agents, burn truly insane numbers of tokens, blowing thinking models well out of the water.
People are pitching agentic swarms as the next step, but I don't think those make sense right now. In particular, they're just too expensive and not that useful.
Plus, the models just aren't good at it yet. It's like early agents when they first started making tool calls.
Agents are really quite bad at using subagents. They haven't internalized how to deploy them, and they don't utilize them in the ways that make sense (producing planning documents, creating verifiable artifacts, breaking down tasks to minimize risk, recognizing model limitations in instruction following, iterating on results, etc.).
Your last paragraph is also striking in that it exemplifies how far away from general intelligence they still are.
Most of everything tends to suck: most projects go nowhere, most companies fail, most scientific papers are garbage.
> how far away from general intelligence they still are
Economically, the real question is to what extent these systems can replace or augment human labour. And I think right now the extent is pretty shocking, even if it's not yet very well integrated.
Scientifically, the fact that they are bad at using subagents is sort of expected. How to use agents effectively is still an open question. A human from mid-2025 would be bad at it. Why should a model trained on data from 2025 be good at it?
If these things are to be generally intelligent, they need feedback and retraining, which presumably the labs will do once these sorts of questions start having good answers and we can create good benchmarks and measures for meta-orchestration.
Umm, what's your point? We aren't spending $1.4T on other shitty things that are tipped to fail.