I know it sounds stupid, but what if
True visionaries think outside the box, but most tech executives are forcing their employees into black boxes, out of fear of not doing exactly what their competitors are doing.
We have lemmings for leaders, and that means that—much like the LLMs that are being shoehorned into everything—there isn’t room for original thinking. Everyone’s strategy looks exactly the same.
If you're a CxO who's looking out for yourself, herd-like behavior is the safest option, due to the (near-universal) incentive structures.
I'd be curious to hear from people well versed in group psychology/dynamics and/or with a lot of leadership/people experience: what leads people to this type of thinking once they get in a group setting? It just... seems endemic at this point.
Obviously nobody here is going to know what I do or don't know, but I'm just increasingly curious what I am not understanding about this type of thing. It seems so obvious, yet that makes me suspect ever more that I'm oversimplifying it, or just totally ignorant about the problem in general.
Roll it all together and saying "just use it dammit" has some obvious advantages:
1. It's clear.
2. It's simple.
3. It eliminates all excuses employees might come up with for not using it.
The people at the top of these companies aren't stupid. They might have miscalculated how many tokens people can actually use, but that's very hard to calculate because usage is opaque and tools/processes change on a nearly weekly basis. They will eventually build out processes, tools, social conventions and performance metrics that take into account efficiency of token usage. But this is hard! Most managers aren't really assessed on the precise productivity of their teams, for instance, because productivity is often poorly defined.
Let's talk about my bonus. I will open the bidding at $1 per token.
"It is difficult to get a man to understand something, when his salary depends on his not understanding it." -Upton Sinclair
That VC funded gravy train is likely coming to an end. But fortunately there are also reasonably efficient models now so that the tokenmaxxers can still make the (much cheaper) tokens go brrrr.
Trying to operate as a rational, thinking person in a lot of environments right now feels impossible. Rational thought is being treated like AI skepticism.
When will Uber (or your favourite company) be 'done'? They've been writing software for 16 years.
They match drivers to passengers. More software isn't going to increase the chance that I seek them out instead of taking a bus or train.
Will their software be finished in 20 years? 80?
Airports: different regulations, different rules for pickup/dropoff. Also scammers who pretend to be in a car, walk around airport pick-up areas with their phones, and do a bait-and-switch (saw that at Istanbul SAW and at Dubai Al Maktoum).
I took a ride from SEATAC to my hotel in downtown Seattle and besides the ride itself, there were 5 other items on the bill, 4 of which are specific to the place I used Uber.
Then I had the return trip from my hotel to SEATAC; on this one I got EIGHT items on the bill, on top of the ride fare. Some specific to Seattle itself, some specific to the road that the Uber took (a tunnel fee, which is different based on the direction you take it in), etc.
So the real question is what is NOT different between two locations. Less than 15% of the bill.
I also took Uber in India, where you have to share a one-time password with the driver for example, which I've never seen in any other country.
In some other countries the Uber app exists but Uber drivers are actually taxis, so you're actually ordering a taxi via the app.
Essentially every single airport in the world is custom UI and custom walking path guides and instructions, and rules for where pickups/dropoffs/etc can occur can change multiple times in a day, much to everyone's enjoyment. They're almost all private property, and are so valuable that whatever they want is what they get.
And food. Most/~all? major brands get custom integrations.
Hundreds (iirc) of identity verification providers, most or all custom, and constantly weighed against cost and accuracy because it ain't cheap and it ain't good but it is far better than none (both legally and ethically).
No idea how many payment sources they accept, but it's definitely a lot more than anyone thinks.
And remember that this is all international. So scale is huge and law changes are constant and frequently conflicting. Darn near every useful feature is illegal somewhere, at some time, for both good and bad reasons.
---
This is not at all to say I think Uber is efficient, clearly it is not. Not by an enormous margin. But there is a legitimate need for truly absurd complexity, because the world is not consistent.
TL;DR: Managing a taxi service (that's what Uber is in my mind, not whatever "ride share" means) that spans cities and states, never mind countries, is extremely complicated. To their credit, Uber manages to make it look simple to the end user, prompting such comments as "meh it's just a few screens how hard could it be", which is a triumph of product engineering as far as I am concerned.
Related: this blog from Uber talks about the problem of serving market-specific configuration data at scale: https://www.uber.com/us/en/blog/how-we-unified-configuration...
Each country has its own laws around what Uber is and isn’t allowed to do. This needs to be formalized in code. For example, in some places you actually call a taxi through the Uber app, and the amount you pay is per mile, not a fixed fare decided ahead of time. To add to this complexity, some cities have their own laws. What happens if you take an Uber from town A to town B, where each one has different laws? A lawyer probably has an answer, but the app needs to adhere to it. On top of that, laws change all the time.
Optimization: well, you can always optimize something. Speed, costs, paths, etc. In a way this never ends.
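To make "formalized in code" a bit more concrete, here is a minimal sketch of a layered rule lookup: country defaults with city-level overrides. This is not how Uber does it, as far as I know; the rule fields, market names, and values are all invented for illustration. It deliberately refuses to guess when pickup and dropoff fall under different rules, since that is the question for the lawyer.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class MarketRules:
    # All fields are hypothetical examples of things a local law might dictate.
    upfront_fare_allowed: bool = True   # fixed fare quoted before the trip
    metered_per_mile: bool = False      # fare computed per mile, taxi-style
    rider_otp_required: bool = False    # rider shares a one-time password


# Country-level defaults (invented values).
COUNTRY_RULES = {
    "country_a": MarketRules(),
    "country_b": MarketRules(upfront_fare_allowed=False, metered_per_mile=True),
}

# City-level overrides layered on top of the country defaults (invented values).
CITY_OVERRIDES = {
    ("country_a", "town_x"): {"rider_otp_required": True},
}

Location = Tuple[str, Optional[str]]  # (country, city or None)


def rules_for(country: str, city: Optional[str] = None) -> MarketRules:
    """Return the country defaults with any city overrides applied."""
    base = COUNTRY_RULES[country]
    overrides = CITY_OVERRIDES.get((country, city), {})
    return replace(base, **overrides)


def rules_for_trip(pickup: Location, dropoff: Location) -> MarketRules:
    """Return the rules for a trip, or fail loudly if the two ends disagree."""
    pickup_rules = rules_for(*pickup)
    dropoff_rules = rules_for(*dropoff)
    if pickup_rules != dropoff_rules:
        # Which side wins is a legal decision, not an engineering one.
        raise ValueError("pickup and dropoff rules differ; needs a legal decision")
    return pickup_rules
```

Even this toy version needs a third layer the moment an airport or a tunnel has its own rules, which is roughly the point.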
I think the part we interact with as consumers is a tiny sliver of the complexity those services have to build and operate.
I think this is partly a problem with companies that have had heavy investment. Uber’s value isn’t based on what they are doing, it is based on the idea that they are going to render ideas like owning your own car or taking public transit obsolete (I mean that’s an exaggeration but less of one than it ought to be).
No company with good engineering leadership should act like this is remotely a good idea.
This always happens when the metric becomes the goal. Companies should nurture and foster an environment where AI is used in the most efficient way possible, first asking "do we really need an agent for this?" and if so, what kind of agent is needed, what model, reasoning level, etc.
They should also promote projects that aim at saving tokens, increasing cache hits, and codifying information in ways that use as little context as possible (knowledge graphs are pretty good for this!)
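The "do we really need an agent for this" question can be sketched as a small decision function. This is only an illustration under my own assumptions: the `Task` fields and the tier labels are hypothetical, not any vendor's real model names or settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    # Hypothetical attributes someone would judge before reaching for a model.
    deterministic: bool   # could a plain script or query do this?
    multi_step: bool      # does it need tool calls or iteration?
    hard_reasoning: bool  # does it need deep reasoning to get right?


def choose_tool(task: Task) -> str:
    """Pick the cheapest option that plausibly fits the task."""
    if task.deterministic:
        return "script, no model"
    if not task.multi_step:
        return "single call, small model"
    if task.hard_reasoning:
        return "agent, large model, high reasoning"
    return "agent, small model, low reasoning"
```

For example, `choose_tool(Task(deterministic=True, multi_step=True, hard_reasoning=False))` picks the script, because the first question is whether a model is needed at all.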
It's like trying to win a race by setting a gas station on fire.
> They should also promote projects that aim at saving tokens, increasing cache hits, and codifying information in ways that use as little context as possible (knowledge graphs are pretty good for this!)
My understanding is that most big "tokenmaxxing" companies do have teams who are working on this in the background.
I do not believe that engineers who are tokenmaxxing are truly productive, and I have not seen any evidence whatsoever that they are (if anything, the opposite).
I've personally found that with the right flow and codebase knowledge, that's achievable with sustainable levels of effort.
Still very valuable. They just need to have strategies that match what the tools are capable of - not strategies that involve "rub the magic lamp and increase profits 80%".
If the market is rewarding companies going after the "rub the lamp" strategy, they're going to say they're doing that to juice stock prices.
Maybe the market is finally realizing blindly spending billions on LLMs with almost no strategy is not a good strategy.
Who knows.
Wrote about this and the impact on jobs here: https://x.com/deepwhitman/status/2058324179506831372
I can see how Uber could burn unbelievable amounts of tokens if they start running internal features that run a bunch of prompts against every completed ride, or every customer profile, for example.
Or maybe this is about employee usage, but they introduced some stupid "you get evaluated on how many tokens you used" thing a couple of months ago when that was trendy and are just beginning to notice how much that cost?
The number of product teams who have shipped expensive-to-operate AI features is wayyyy up there, and for many of the scenarios I've seen, customers simply don't care or are unwilling to pay a significant premium for access to it.
At the same time I'm starting to see some direction from people in leadership that I should "use the right model for the job" and things along those lines, which is a very, very different line from what I was hearing 12 months ago.
My continued prediction is that we are going to see a tweak on the SaaS model where the sweet spot moves to metered usage pricing of really fine-grained API-based access for apps which traditionally have been operated solely via the UI. Long term the trend is going to be "we'll house the data, enrich it, maintain it, provide fine-grained API access over it tailored to model usage, and you bring the model" with some services opting to give you the model interaction layer/harness. IOW I don't think SaaS is dead. Far from it. However, I do think that a lot of people are going to be looking to interact with SaaS apps via their own models with APIs that support those use cases better than a lot of those APIs do today.
He's saying that like it's some grand epiphany and not the most self-evident, obvious thing I've heard this month. Some of the literal dumbest people on earth are in charge of these major companies.
Imagine if engineers were ranked based on their AWS spend. People allocate VMs and fill databases with terabytes of random bits, to get to the top of the AWS leaderboard. If you don't do this, you're ranked at the bottom, and good luck at the next review cycle. Who could have expected that this is not the road to success?
Anyone who can find the actually valuable portions of the space early has a potentially huge competitive advantage. Even if the result of the experiment is the negative that AI is actually mostly not that useful, that is still extremely useful information in a time of great uncertainty regarding outcomes.
The bottom line is that this approach may be expensive, but if you have the money to burn, it's far from the worst strategy if you are trying to position yourself correctly for the future.
OTOH maybe we’re in for a future of patenting prompts.
The incentive structure of this type of decision is 'absolutely under no circumstances existentially mess up'. Ostensibly with respect to the organisation, but in actual reality much more so with respect to the individual(s) involved in the decision.
If everyone else is doing something that kind of obviously makes no sense, and you decide to break from the crowd by instead doing what does make sense, then there's a pretty solid chance of gaining a temporary edge while reality resolves the truth. But those gains probably won't matter all that much for the organisation, or indeed your position within it. It's a solid chance of an unimportant gain.
However on the other hand, there's a tail risk that something very unexpected happens and the thing everyone's doing that makes no sense actually turns out to make sense - sometimes even for entirely unpredictable incidental reasons - and then, well, you're in trouble. Not necessarily 'you' the organisation.. they'll likely be able to catch up and it won't matter that much. But for 'you' personally, the decision maker, it's very much not good.
As a bonus, in the much more likely scenario that the thing that makes no sense turns out to indeed make no sense, you're in the same boat as everyone else, there's no relative loss, and most importantly you don't stick out as someone who did something as risky as to go against the prevailing, albeit pretty clearly nonsensical, sentiment.
So basically, game theory tells you pretty quickly to just go with the thing that makes no sense if you're optimising for some (weighted) cross of what's best for the organisation and yourself as the decision maker.
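The argument above can be put as a back-of-the-envelope expected value from the decision maker's personal point of view. This is a toy sketch under my own assumptions: conforming is scored as zero relative payoff either way (same boat as everyone else), and the probabilities and payoff sizes are made-up numbers, not measurements of anything.

```python
def expected_personal_payoff(
    deviate: bool,
    p_crowd_right: float,  # chance the "nonsensical" crowd bet pays off anyway
    small_gain: float,     # personal upside of deviating when the crowd is wrong
    big_loss: float,       # personal downside of deviating when the crowd is right
) -> float:
    """Expected relative payoff for the individual decision maker."""
    if not deviate:
        # Conforming: no relative gain or loss whichever way reality resolves.
        return 0.0
    return (1 - p_crowd_right) * small_gain - p_crowd_right * big_loss


def break_even_probability(small_gain: float, big_loss: float) -> float:
    """Deviating only beats conforming if p_crowd_right is below this value."""
    return small_gain / (small_gain + big_loss)


# Illustrative numbers only: a 10% tail risk, a gain of 1, a loss of 20.
deviate_value = expected_personal_payoff(True, 0.10, 1.0, 20.0)   # roughly -1.1
conform_value = expected_personal_payoff(False, 0.10, 1.0, 20.0)  # 0.0
threshold = break_even_probability(1.0, 20.0)                     # 1/21, under 5%
```

With those numbers, going against the crowd only makes personal sense if you are more than about 95% sure the crowd is wrong, even though deviating is the "correct" call nine times out of ten.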
We aren’t there yet; so far it is just a COO questioning the investment.
But it's not. Some FAANGs are doing amazing things with unlimited tokens. Other companies have no clue what to do with tokens, they've just told their engineers to max them.
It really depends on how you're using the tokens. If you're just using them for Codex and Claude Code - yeah, tokenmaxxing is incredibly dumb.
Giving someone unlimited access to a resource is not the same as directing or incentivizing them to use it for the sake of using it, which is what the parent comment criticized.
As for the other FAANGs, Meta and Google have (not good but still) frontier models of their own, so they are very different from a company paying API costs per token.
Unlimited tokens is different from “use AI a lot or we will fire you, and we are counting token consumption as usage”. Obviously the latter is stupid and yet it was done in many places.
Would love to know what things!
AI is an accelerator that engineers should know and have access to, but it's not something that should come with mandated usage and quotas. It's also absolutely dangerous for young engineers and the like: it fundamentally denies you the "learning" aspect. I'm now seeing in interviews young graduates being given AI tasks to complete and they come back with a correct solution and no concept of how it is working.
You learn and reinforce learning by DOING and reading in depth. High level summaries don't teach anything and are the kinds of things only VPs care about. So, unless the intention in the future is for everyone to be a VP using AI to do the work, we need some middle ground here and some real thought around implementation of these tools or there's going to be a generational canyon gap of knowledge between being able to "say" and being able to "do".
Classic Goodhart’s Law: when a measure becomes a target, it ceases to be a good measure.
I also want to call out the false productivity opportunities AI offers. There are whole teams building their own "gas town" and not shipping features.
Goodhart's law strikes again at someone with enough power to be both ignorant of it and make others suffer their ignorance. You cannot simply measure productivity by tokens spent just like you can't measure it by hours spent in a chair at a desk.
Which is why two identical jobs with the same real life output have drastically different productivity.
A nursing home in Luxembourg has 5 times the productivity of one in Romania despite the services being identical and tech-unrelated.
Of course, the latest DeepSeek models are not as good as Claude, but they're not super far off either.
Gitlab is going to take off? This is not investment advice.
Even acknowledging we don't know exactly what costs would look like in a world without VC money, wouldn't hosting models logically be cheaper to do at scale in a data center?
When I compared to the cost of running DeepSeek locally, I meant that we can treat that cost as a price ceiling, not the floor.
No, I think local stuff using also-useful-for-other-things hardware will vastly undercut cloud hosting when the free money pipeline shuts down, and will stay that way for roughly forever. That doesn't mean cloud stuff isn't useful, clearly it is, but adding another company in the middle is rarely the solution for reducing costs.
Agents are expensive in large part because tool calls require round trips. That's because these APIs are stateless and not streaming, so you have to resend the whole context each time. This means you have roughly (number of tool calls) x 1/2 x (context size) cached input tokens over any given session. Most API providers overcharge you by a huge amount for cached tokens, an exception being DeepSeek. Paying OpenAI $0.05 for 100k cached GPT-5.5 tokens during a possibly 2-second round-trip agent tool call is like paying around $100/hr for what is likely to be ~10 to 20 GB of VRAM residence (holding the KV cache).
Or it got offloaded to NVMe and you are paying $0.05 for that much PCIe bandwidth.
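The arithmetic in that estimate is easy to check. A small sketch, using only the figures quoted above ($0.05 per 100k cached tokens, a ~2 second round trip); the session size in the last example (50 tool calls, 200k final context) is a number I made up for illustration.

```python
def cached_input_tokens(tool_calls: int, final_context_tokens: int) -> float:
    """Rough total of cached input tokens re-sent over one session.

    Assumes the context grows about linearly, so the average re-sent
    context is about half the final size.
    """
    return tool_calls * final_context_tokens / 2


def hourly_rate_usd(cost_per_call_usd: float, round_trip_seconds: float) -> float:
    """What the per-call cache charge works out to per hour of cache residence."""
    return cost_per_call_usd / round_trip_seconds * 3600


def session_cache_cost_usd(tool_calls: int, final_context_tokens: int,
                           usd_per_100k_cached: float) -> float:
    """Cached-token charge for one session at a given price per 100k tokens."""
    tokens = cached_input_tokens(tool_calls, final_context_tokens)
    return tokens / 100_000 * usd_per_100k_cached


# Figures from the comment: $0.05 per call, ~2 s round trip.
print(hourly_rate_usd(0.05, 2.0))  # about 90 USD/hour, the "~$100/hr" ballpark

# Hypothetical session: 50 tool calls ending at a 200k-token context.
print(session_cache_cost_usd(50, 200_000, 0.05))  # about 2.5 USD
```

So the "$100/hr" figure is a round-up of roughly $90/hr, and the half-context rule means a long session pays for millions of cached tokens even though the final context is only a couple hundred thousand.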
It's especially a crazy assumption to make relative to the costs of employing a human. The costs of paying an entry level employee are unlikely to go down at all, and even if those costs do decline, there's a floor they can't drop below (minimum wage at the extreme end), whereas companies are free to optimize agentic costs as close to zero as possible.
So you are assuming that a cost which is extremely susceptible to optimization but which no one has yet seriously attempted to minimize will remain perpetually above a cost which is much less susceptible to optimization, is already subject to enormous efforts to minimize, and has a legally mandated floor. That seems like a bad bet.
I’ve spent $10-$20 a day using Claude to write code and closer to $5 a day now that I mostly use DeepSeek and GLM, using API pricing (no subscriptions) since I don’t use Claude Code.
This is a rounding error for a company. So I think there’s plenty of room to use AI extensively while being more cost-conscious.
I also don't think that blitz scaling will work like with Uber. The engineers are still there. We can work without the LLM tools.
The world will look drastically different 5 years from now, for better or worse, so save every penny (especially if you work in tech).
I'd imagine GPT-5.5 and Claude Opus 4.7 could run just fine on a 16x H200 node and serve at least 10 heavy users without the token output getting choppy.
The financials don’t make sense now. Based on the expenditure the finances won’t ever make sense.
The former is the issue, and how many companies have been operating. It's like a trucking company ranking driver effectiveness by fuel used instead of by cargo moved.
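The fuel-versus-cargo point is easy to show with a toy ranking. Everything below is invented data, and "shipped" is a hypothetical stand-in for whatever real output measure a team trusts (which is its own hard problem and can be gamed too).

```python
from typing import Dict, List

# Invented example data: tokens consumed vs. some measure of delivered work.
ENGINEERS: List[Dict] = [
    {"name": "a", "tokens": 9_000_000, "shipped": 2},
    {"name": "b", "tokens": 1_000_000, "shipped": 5},
    {"name": "c", "tokens": 4_000_000, "shipped": 4},
]


def rank_by_tokens(engineers: List[Dict]) -> List[str]:
    """The 'fuel used' leaderboard: most tokens burned comes first."""
    ordered = sorted(engineers, key=lambda e: e["tokens"], reverse=True)
    return [e["name"] for e in ordered]


def rank_by_output(engineers: List[Dict]) -> List[str]:
    """The 'cargo moved' leaderboard: most delivered work comes first."""
    ordered = sorted(engineers, key=lambda e: e["shipped"], reverse=True)
    return [e["name"] for e in ordered]


print(rank_by_tokens(ENGINEERS))  # ['a', 'c', 'b']
print(rank_by_output(ENGINEERS))  # ['b', 'c', 'a']
```

In this made-up case the two leaderboards are exact opposites: the top "fuel" user delivered the least.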
But on a more serious note, do we know how much Uber spent per technical employee/month? I assume it is far more than even any of those $200 "max ai" plans.
And the other question is how much the public would be willing to spend. In my estimation this is as "cheap" as it will ever get (mainstream at least).
Am in a random small company, colleague spent 100 EUR a day on Sonnet through AWS Bedrock (needed to use an EU region). Paying for tokens will get you in a deep hole financially compared to any of the subscriptions, unless it's like DeepSeek or one of the other models that are priced a bit better, though that's also a tradeoff in what they can/cannot do and also where the data goes. Ended up trying out the Mistral subscription for the US stuff btw, it was fine.
> Adoption climbed from 32 percent of engineers in February to 84 percent classified as agentic coding users by March. By spring, 95 percent of Uber engineers used artificial intelligence tools monthly, and roughly 70 percent of committed code originated from those tools. About 11 percent of live backend updates were written by agents with no human in the loop, according to Uber's own disclosures.
> The numbers behind the spend are what make the story instructive rather than anecdotal. Monthly cost per engineer ranged from $150 to $250 on average, with power users running between $500 and $2,000.
My guess is that the reason to rethink AI-spend was probably the exponential growth in cost over time, and tokenmaxxing payoff not being immediately obvious as mentioned in the article.
[1] https://www.forbes.com/sites/janakirammsv/2026/05/17/uber-bu...
Adds nothing insightful to these discussions.