He did clarify that it was with fast mode. Without fast mode it'd "only" be $300k in raw API cost, or ~60 $200 Codex subscriptions.
Eventually Codex's subscription subsidization will diminish to near-zero, like the rest of the providers.
It's extremely important that people understand how expensive these models currently are. Even $300k in raw API costs is alarming for the output.
He has agents write shitty code for features other agents think other people want, then has it reviewed by other agents in hopes of catching bugs that the first agent put there, then has some more agents try to find security bugs in the now double-agented code to make it triple-agented and at the end of the day, he spent a shitton of tokens, probably emitted enough carbon to heat our planet by another degree, and has a feature nobody really asked for that might or might not work.
He then has the sense of humor to call this grotesque process "incredibly lean".
What's the point in all of this? What problems is this solving? Who's benefiting?
The moral issues around consumption and climate impact are not his alone, and are not unique to his endeavor. Every company with an enterprise LLM agreement has a share, for instance.
“He has /people/ write shitty code for features other /people/ think other people want, then has it reviewed by other /people/ in hopes of catching bugs that the first /people/ put there, then has some more /people/ try to find security bugs in the now /double-peopled/ code to make it /triple-peopled/ and at the end of the day, he spent a shitton of /money, the people/ probably emitted enough carbon to heat our planet by another degree, and has a feature nobody really asked for that might or might not work.”
Honestly sounds like a normal tech company to me. Just with much dumber “people” who are getting exponentially smarter, will eventually never die, and will eventually never forget.
You have to skate to where the puck is going, not where it is.
> What's the point in all of this? What problems is this solving? Who's benefiting?
The economy doesn't work the way you think it does. It's not central planning. All the usages aren't detailed in a specification, submitted for approval to 100 agencies, and only then allowed.
It shows lack of intellectual curiosity to not engage deeply with obviously profound technology and what the implications are. I find this exercise helpful.
Peter is predicting how LLMs will be used in the future when the prices go down. And they will definitely go down. I think his predictions are correct and we will definitely have something similar to OpenClaw.
I didn't know that studying photocopiers is suddenly linked to "intellectual curiosity". Being a photocopier maintenance guy was always considered boring.
What you put on top of the machine was intellectually interesting.
I'm aware. That is in fact my central critique. The way it works is incredibly wasteful of our limited resources, as illustrated by this guy burning through fuel during a time of crisis for no perceptible gain.
Having said that, we ought to just downvote and move on.
Because it does not say “equivalent of”; it literally says he spent money that he did not spend.
What I mean by this:
1. Intern, analyst, junior, or offshore level coding is cheaper when done by the machine.
// Side note: There is good reason the industry invests in suboptimal output from this set which moves to the "cost" column when using an LLM, but nobody's accounting for that.
2. For the interns, analysts, juniors, or offshore teams to do the right thing costs a multiple of the coding effort: the PdM/PjM stuff of course, but also the Stakeholder, Product Owner, Architect, Principal Engineer, QA, and SRE stuff.
3. If you are not a principal- or staff-level engineer, you are likely unqualified to catch and fix the errors LLMs make across engineering, much less across the rest of the PDLC (product development lifecycle, which includes SDLC and SRE) loop.
4. For LLM output to be useful, your 'harness' has to incorporate all of that as well, which, because it's so much harder than transliterating spec-to-code, balloons tokens exponentially.
5. Today it is faster, more efficient, and costs less, to work with LLMs "XP" (eXtreme Programming) style, pairing with the LLM actively co-creating and co-reviewing, steering for more effective turns.
So, your options are:
- ship garbage while costing less than a median first world SWE
- pair with the LLM actively for the benefits of XP
- add enough harness and steering that the LLM costs more than SWEs, and still needs a human in the loop, “move fast and break things to find out what's broken” style
I would expect that within a couple of years, these other disciplines can be baked in enough that the machine costs less for everything but surprises.
They already are. I'm successfully using frameworks like bmad to deliver complex apps at that level. My job is to manage the SE, QA, UX, and SRE processes and catch errors.
I spend more time refining PRDs, epics, and stories than I do elbow-deep in code.
If I don't like the output of a story, I nuke it, change the story, and have the flanker try again. I'm using the open-source GLM, Kimi, and DeepSeek models. I expect the full pipeline to be good enough by the end of the year.
They literally are. (If by "all this" you mean the subscription future bait-and-switch plans.)
Let's say I was at the casino and was spending a lot on casino chips, but I also happened to work at the casino. I'm not really losing money whether I win or lose, since I'm using the house's money and there's little risk involved on every dice roll or press of the button. The risk is far higher if I don't have that level of access and continue to spend the same amount of money on lots of tokens (or casino chips, spins, or button presses).
The same is true here with these agents. Some companies will realize that they can no longer afford to spend millions a month on tokens; even startups spending $5k–$6k per person per month on tokens will feel it.
I can only see efficient local models making sense as a way to recover from this unnecessary spending, or at least to rein in the light gambling on tokens.
Doubtful lol, dude's killing the environment just for fun at this point.
He was, when it comes to marketing. This is what most people don't understand: Peter is a great marketing guy who got hired because of a hype vision, not because he is an outstanding engineer. Think of it like OpenAI hiring the MrBeast of the coding world.
Now let's wait until the moderators clean up the wrongthink. He also has censors on his side.
We really need better standards for disagreement.
Opencode has the same problems. They often do multiple releases of that app a day, yet within the span of a week or two I have had to update my config because some random change has altered the behaviour and my permissions broke. Or I've noticed the way the app renders is suddenly different.
Yet, my day to day usage has barely changed since the version I installed last year. It's like everything changes but nothing changes.
All projects can become fast if they drop guardrails.
That does not correlate with a productivity increase.
That doesn't sound very positive to me...
I just checked the code and feature outputs, and I could build all that in 15 days for 1.3M USD. Fuck, I would do it for 1M...
Scratch that: if it's 300K, then sure, I could do the same, if you paid me that for 30 days of work. Lmao, the quality and the feature volume are just not worth paying so much money for.
I am not saying this because I don't like LLMs or because I think AI coding can't work, but folks, whatever OpenClaw has built for that much money is not worth nearly that much...
The hard part is not building such toys; it's convincing people with money to buy said toy. This is where he earned his applause.
I won’t lie, if I had the access to this, I’d do the same exact thing.
He has a different opinion of what it means to be lean than almost everyone else. That's fine, he's allowed to, but it's something you have to understand to make sense of any of his comments on things. He has a radically different set of values to most people.
Do existing companies run entire end-to-end product integration tests on every single change they make to a repo to make sure something hasn't broken? No, they just architect things in a way such that a minor change to something can be tested in isolation. And that can be automated, deterministically and efficiently.
Where I work we can release changes to our production site in minutes almost completely autonomously with high confidence with absolutely zero AI agents in the loop. How did we do it? With lessons learned from the past 5 decades of professional software development experience.
Let's not forget what OpenClaw is at its core: a glorified cron scheduler. Why on earth does any of this effort need to exist? It's not that deep, it's not that complex; it's all AI for AI's sake.
I run it in a firewalled VM and am very conscious about any tokens I give it access to - so far for all I know this was unnecessary.
PS. for me the core feature of OpenClaw isn't the cron, though that is nice. It's the memory and instant extensibility. Like it takes 5-15 minutes to add an SSH tool where all agent requests go through a manual review, together with a good auto loaded description that just works in all future sessions.
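A minimal sketch of what such a manually reviewed SSH tool could look like. To be clear, the function name `run_ssh`, its signature, and the approval hook are all illustrative assumptions; OpenClaw's actual extension API may look quite different. The point is only the shape of the gate: every agent request is shown to a human before anything executes.

```python
# Hypothetical sketch of an SSH tool gated behind manual review.
# Everything here (names, signature, approval hook) is illustrative,
# not OpenClaw's real API.
import subprocess

def default_approve(host: str, command: str) -> bool:
    # Interactive review: show the agent's request, ask a human to confirm.
    print(f"[review] agent wants to run on {host}: {command}")
    return input("approve? [y/N] ").strip().lower() == "y"

def run_ssh(host: str, command: str, approve=default_approve) -> str:
    """Run `command` on `host` via ssh, but only after a reviewer approves."""
    if not approve(host, command):
        return "denied by reviewer"
    result = subprocess.run(
        ["ssh", host, command],
        capture_output=True, text=True, timeout=60,
    )
    return result.stdout + result.stderr
```

Making the approval step an injectable callback keeps the dangerous path (actually shelling out) unreachable without an explicit yes, which is the whole value of the pattern.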
Btw, same frustration for me setting up Signal, WhatsApp, or Slack...
We know it’s totally stupid, but unfortunately tokenmaxxing is real. I know our management line isn’t that dumb, but this is what you get when the business is selling it.
One person using 600B tokens in a month. The most I’ve hit is around 500M tokens and I thought that was a huge amount.
We’re going to have some major compute shortages for a while
I use more than 150B/month with just 15 codex accounts.
60 accounts is "just" $12,000/month. So Peter could "save" 100x by using monthly accounts.
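The arithmetic behind that "100x" is straightforward; note that it implies the fast-mode API bill quoted upthread was around $1.2M/month, which is an inference from the 100x claim rather than a stated figure:

```python
# Back-of-the-envelope check of the "100x" savings claim.
# 60 Codex accounts at $200/month come from this thread; the ~$1.2M/month
# fast-mode API bill is implied by the 100x figure, not stated directly.
accounts = 60
price_per_account = 200              # $/month
subscription_cost = accounts * price_per_account
print(subscription_cost)             # 12000 -> the "$12,000/month" above

api_cost_fast_mode = 1_200_000       # $/month, implied
print(api_cost_fast_mode / subscription_cost)  # 100.0
```

Even against the $300k no-fast-mode figure, the same accounts would still be a 25x saving.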
Of course, he doesn't have to, as he works at OpenAI now.
For me it's not even a "what the hell are you working on" so much as a complete inability to understand how you can keep so many different processes working on distinct tasks. It simply doesn't map onto how I use these tools.
I spend most of my day writing extremely detailed prompts and that's how I'm able to get the sort of excellent results that confound skeptics. But I have to be honest with you: I don't think I can write (or think) fast enough to do two of these at a time, much less 15.
I definitely could not review what they are generating with any degree of confidence.
I'm really hoping you can explain what the heck your usage pattern actually looks like, because reading this makes me feel like I'm missing something.
Building compilers has a _lot_ of parallel tasks agents can work on.
Wish me luck..
Privacy: Reuses existing provider sessions — OAuth, device flow, API keys, browser cookies, local files — so no passwords are stored.
macOS permissions: Full Disk Access for Safari cookies, Keychain access for cookie decryption and OAuth flows...
It's excellent this is disclosed as a reminder of how things work and the tradeoffs you're making to use it.
[0] https://github.com/steipete/CodexBar
However, I do not see a strong reason to believe that this is his actual, personal usage. It could be all openclaw usage or some subset of openai usage, given that he is inside them. I suspect it is far more likely to be fake data [1] that exercises the graph library in a visually satisfying way. Notice that it has no usage for a 'week' after April 15 (a Wednesday), but picks up a bunch later. As marketing copy it needn't have any basis in reality [2]. I should hope openai would put a procedure in front of their entrepreneur acquisition that prevents accidentally exposing trade secrets [3].
[1] https://github.com/faker-js/faker
[2] https://www.reddit.com/r/proceduralgeneration/comments/lf2n4...
[3] https://tvtropes.org/pmwiki/pmwiki.php/Main/PostingWhatYouSh...
I’d actually seen the original DB episode years before when it first aired, and it definitely had an effect on me through this form of manipulation - it altered my internal understanding of marketing/advertising, which was the actual underlying purpose of the episode.
It’s altered how I internally accept and process information from any 2nd or 3rd hand source. BTW, people aren’t necessarily always aware they’re doing it. We all suffer from our own internal biases and deceptions, and sometimes we spread them unknowingly!
i built my personal app mostly with ollama and it’s been smooth sailing so far. basically openclaw + hermes-style agents running on android phones, and the stuff it can do is kinda insane
Just last week I saw a dude boasting about how they used their $20/month ChatGPT subscription to earn $15 (or similar trivial amount) in a bug bounty by running the model the whole day. Sam Altman replied to that tweet but not entirely positively.
OpenAI has been removing limits on token usage to take on Anthropic, but I'm sure most of the users they are acquiring are these AI bros who are burning tokens for the sake of it. Massive price hikes are coming after the OpenAI and Anthropic IPOs, probably an order of magnitude larger than what happened to ride sharing.
Grifters gonna grift. What a state of affairs.
Hopefully eventually we will go back to evaluating the output. Not that I am very hopeful that we learn to do it in sensible way.
And of course I'm just yet another envious hater from "the orange website". Your conscience is clear, AI bros. /s
By which metrics?
> This isn’t clowning.
Why?
Because a solo dev has deployed to millions of people in less than eight months spending I believe zero dollars on marketing.
We should all be so lucky to clown at this scale.