In the past month, OpenAI has released the following for Codex users:
- subagent support
- a better multi-agent interface (the Codex app)
- 40% faster inference
No joke, with the first two my productivity is already up like 3x. I am so stoked to try this out.
Whereas Claude Max 5x is enough that I don’t really run out with my usage patterns.
From that they realized they could just run API calls more like staff ones: fast, not at capacity, and leave the other billion people's calls on the remaining capacity.
https://thezvi.substack.com/i/185423735/choose-your-fighter
> Ohqay: Do you get faster speeds on your work account?
> roon: yea it’s super fast bc im sure we’re not running internal deployment at full load
Starting ChatGPT Plus web users off with the Pro model and later swapping it for the Standard model would meet the claims of model behavior consistency, while still qualifying as shenanigans.
In this particular case, I'm happy to report that the speedup is in time per token, so it's not a gimmick of outputting fewer tokens at lower reasoning effort. Model weights and quality remain the same.
Happy to retract if you can state [0] is false.
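For anyone who wants to sanity-check the time-per-token claim from the outside, here's a rough sketch using the OpenAI Python client (the model name and prompt are placeholders, not anything from this thread): time a few identical requests and divide by the completion token count the API reports. If a "speedup" came from emitting fewer tokens, the token counts drop; if it's genuinely per-token, the counts stay roughly stable while the ms/token figure falls.

```python
# Rough sketch, not an official benchmark. Assumes OPENAI_API_KEY is set.
import time
from openai import OpenAI

client = OpenAI()

MODEL = "gpt-5"  # placeholder; substitute whichever model you're testing
PROMPT = "Explain the CAP theorem in three paragraphs."

def one_run() -> tuple[int, float]:
    """Return (completion_tokens, elapsed_seconds) for a single request."""
    start = time.perf_counter()
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": PROMPT}],
    )
    elapsed = time.perf_counter() - start
    return resp.usage.completion_tokens, elapsed

for tokens, elapsed in (one_run() for _ in range(5)):
    # Stable token counts + falling ms/token => a real per-token speedup,
    # not shorter outputs.
    print(f"{tokens} tokens in {elapsed:.1f}s -> {elapsed / tokens * 1000:.0f} ms/token")
```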
It's worth you guys doing, on your end, some analysis of why customers are getting worse results a week or two later, and putting out some guidelines about what context is poisonous and the like.
I.e.: "keep API model behavior constant" says nothing about the consumer ChatGPT web app, mobile apps, third-party integrations, etc.
Similarly, it might mean very specifically that a particular dated model snapshot remains constant, while the generic "-latest" (or whatever) model name auto-updates "for your convenience" to the new, faster behavior achieved through quantisation or reduced thinking time.
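For what that alias-vs-snapshot distinction looks like in practice, a minimal sketch with the OpenAI Python client (the specific model names are just examples of the two naming forms, not a recommendation of any version): pin the dated snapshot if you want behavior to stay fixed, and log the `model` field the response reports.

```python
from openai import OpenAI

client = OpenAI()

# Auto-updating alias: behavior can silently change when the alias is repointed.
alias_resp = client.chat.completions.create(
    model="chatgpt-4o-latest",
    messages=[{"role": "user", "content": "ping"}],
)

# Dated snapshot: pins one specific version of the model.
pinned_resp = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "ping"}],
)

# Each response records which underlying model actually served the request,
# which is worth logging if you care about silent swaps.
print(alias_resp.model, pinned_resp.model)
```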
You might be telling the full, unvarnished truth, but after many similar claims from OpenAI that turned out to be only technically true, I remain sceptical.