For comparison, here's GPT-5-Codex (not mini) https://static.simonwillison.net/static/2025/codex-hacking-d... and full GPT-5: https://static.simonwillison.net/static/2025/codex-hacking-g...
I had quite a fun time getting those pelicans though... since GPT-5 Codex Mini isn't officially available via API yet I instead had OpenAI's Codex CLI tool extend itself (in Rust) to add a "codex prompt ..." tool which uses their existing custom auth scheme and backend API, then used that to generate the pelicans. Full details here: https://simonwillison.net/2025/Nov/9/gpt-5-codex-mini/
I'm halfway through writing a TypeScript-to-native-code translator (via .NET) that compiles a large enough subset of our current code, with a lot of help from GPT-5 and Codex CLI. It has completely blown me away.
I'd like to give you a concrete example that stood out (from, by now, dozens). I wanted d.ts files for the .NET standard libraries. One immediately obvious problem: .NET allows classes/interfaces to be redeclared under the same name if the generic type arity differs. For example, there can be SomeClass<int> and SomeClass<int, int> which are completely separate types. TypeScript, of course, doesn't allow this - you could declare one with all type parameters present, but it'd obviously be a mess.
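To make the collision concrete, this is roughly what it looks like on the TypeScript side (a minimal sketch, with SomeClass standing in for any arity-overloaded .NET type):

// TypeScript merges declarations by name, so differing
// type-parameter lists are rejected outright:
interface SomeClass<T> { value: T; }
interface SomeClass<T1, T2> { first: T1; second: T2; }
// error TS2428: All declarations of 'SomeClass' must have identical type parameters.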
I was stuck with the (quite ugly) const users = new List_1<User>(...); instead of const users = new List<User>(...);
So GPT comes up with this:
declare const __unspecified: unique symbol;
type __ = typeof __unspecified;
// Your arity-anchored delegates exist elsewhere:
// import("internal/System").Action_0
// import("internal/System").Action_1<T1>
// import("internal/System").Action_2<T1, T2>
// import("internal/System").Action_3<T1, T2, T3>
// ... up to 17
export type Action<
  T1 = __, T2 = __, T3 = __, // ... continue through T17 = __
> =
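  // Tuple wrapping ([T1] extends [__]) keeps the conditional non-distributive,
  // so union type arguments are matched as a whole rather than member-by-member.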
  [T1] extends [__] ? import("internal/System").Action_0 :
  [T2] extends [__] ? import("internal/System").Action_1<T1> :
  [T3] extends [__] ? import("internal/System").Action_2<T1, T2> :
  /* next lines follow the same pattern … */
  import("internal/System").Action_3<T1, T2, T3>;
This lets me write:
const a: Action<number> = (n) => {}; // OK (void)
const f: Func<string, number> = (s) => 20; // OK (string -> number)
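For completeness, the matching Func alias follows the same shape - my sketch, not GPT's output, and it assumes delegates named Func_1<TResult>, Func_2<T1, TResult>, etc. are generated alongside the Action ones (in .NET's Func the last type parameter is always the return type, and there is no Func_0):

export type Func<
  T1 = __, T2 = __, T3 = __, // ... continue through T17 = __
> =
  [T2] extends [__] ? import("internal/System").Func_1<T1> :
  [T3] extends [__] ? import("internal/System").Func_2<T1, T2> :
  /* next lines follow the same pattern … */
  import("internal/System").Func_3<T1, T2, T3>;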
A human could come up with this, of course. But doing this at scale (there are many such problems that crop up) would take a lot of effort. Btw I'm using Claude for the grunt work (because it's faster), but GPT-5 is doing all the architecture/thinking/planning/review.

It is just poor at designing a generic solution despite repeated requests to follow the design of existing alternatives (present in the same repo). It tended to plug holes in a broken architecture it came up with on its own instead of redesigning, or simplifying its code enough to keep it in its own head. TBH I suspect this might be limited purely by context length.
It produced fine(-ish) initial bits so a few tests would pass, but it dug itself into a hole by introducing provenance and could not keep track of it properly. You can see it: https://github.com/lostmsu/ILGPU/tree/Vulkan-GPT-5-Stuck
TBH2: this was a huge request. But also there are already other backends it could just mirror.
Are you saying ternary chains using sentinels for arity inference are pretty common? I would disagree.
> since it’s the main control flow
Perhaps you're saying ternary chains are common in TS code? That's a very different thing, though - the code above is not for runtime behavior.
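To spell out the difference with a trivial sketch: a runtime ternary chain branches on values while the program executes, whereas the conditional types above branch on types and are erased by the compiler - they emit no JavaScript at all:

declare const n: number;
// Runtime ternary chain: evaluated when the code runs
const label = n < 0 ? "negative" : n === 0 ? "zero" : "positive";
// Type-level conditional: resolved at compile time, produces no output
type Sign<N extends number> = N extends 0 ? "zero" : "nonzero";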
Why would Range need a sentinel?
My point is that using a sentinel to bridge TypeScript's lack of generic arity-based specialization is a non-trivial problem. After you mentioned it, I looked for examples on Google and couldn't find anything that matches precisely.
I'm not claiming humans can't solve this, or that GPT-5 invented something fundamentally new. My original point was about productivity at scale: having a model apply the right solution across dozens of similar problems, rather than me manually figuring out each one.
Grok's latest update made it far worse than the version right after the Grok-4 release. It makes outright mistakes now. Copilot has cut corners long ago. Google "AI" was always horrible.
The whole "AI" experiment was an outrageously expensive IP laundering parlor trick that is meeting economic realities now.
> Claude Code is reportedly close to generating $1 billion in annualized revenue, up from about $400 million in July.
https://techcrunch.com/2025/11/04/anthropic-expects-b2b-dema...
As soon as users are confronted with their true API cost, the appearance of this being a good business falls apart. At the end of the day, there is no moat around large language models - OpenAI, Anthropic, Google, DeepSeek, Alibaba, Moonshot... any company can make a SOTA model if they wish, so in the long run it's guaranteed to be a race to the bottom where nobody can turn a profit.
Where are you getting that number from?
Anthropic added quite strict usage limits - visible via the /usage command inside Claude Code. I would be surprised if those limits still result in heavy losses for them.
My theory is this:
- we know from benchmarks that the capabilities of open-weight models like DeepSeek R1 and Kimi K2 are not far behind SOTA GPT/Claude
- open-weight API pricing (e.g. on OpenRouter) is roughly 1/10~1/5 that of GPT/Claude
- users can more or less choose to hook their agent CLI/IDEs to either closed or open models
If these points are true, then the only reason people are primarily on CC & Codex plans is that those plans are subsidized by at least 5~10x. When confronted with true costs, users will quickly switch to the lowest-cost inference vendor, and we get perfect competition + zero margin for all vendors.
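For what it's worth, the switching cost really is that low: most of these vendors expose OpenAI-compatible chat endpoints, so repointing a client is roughly a base-URL and model-name change. A hand-wavy sketch (the URL and model id are illustrative, not a recommendation):

// The same request shape works against any OpenAI-compatible endpoint;
// only the base URL, API key and model name change between vendors.
const BASE_URL = "https://openrouter.ai/api/v1"; // or a closed vendor's endpoint
const MODEL = "deepseek/deepseek-r1";            // or a GPT/Claude model id

const res = await fetch(`${BASE_URL}/chat/completions`, {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: MODEL,
    messages: [{ role: "user", content: "Refactor this function..." }],
  }),
});
const data = await res.json();
console.log(data.choices[0].message.content);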
That sounds reasonable given that 10% of software developers are talkers who need someone to output something that looks like a deliverable.
We were however talking profits here, not revenue.
That $1bn number was in a paywalled Information article which was then re-reported by TechCrunch, so the actual source of the number isn't clear. I'm assuming someone leaked it to The Information; they appear to have some very useful sources.
I doubt this is just US developers - they've boasted about how successful they are in Europe recently too:
> Businesses across Europe are trusting Claude with their most important work. As a result, EMEA has become our fastest-growing region, with a run-rate revenue that has grown more than 9x in the past year.
https://www.anthropic.com/news/new-offices-in-paris-and-muni...
I feel the same way about Simon Willison. He's a treasure!
Here's a few of those EF Hutton commercials for your viewing pleasure--
Yep, that's how I feel when Simon Willison speaks!
LLMs are advertised for serious applications. I don't recall that CPUs generally hallucinate except for the FDIV bug. Or that AirBnB rents you apartments that don't exist in 30% of all cases. Or that Uber cars drive into a river during 20% of all rides.
"CPUs don't hallucinate" would be a reasonable argument if CPUs were an alternative to LLMs, which they aren't, so I'm not really sure what argument you're making there.
Seems like you're saying "a calculator makes fewer mistakes than an accountant", which is true, but I still pay an accountant to do my taxes, and not a calculator.
Thinking ...
- The user is asking about the connection between CPU bugs and price dumping in order to capture market share.
- The user appears to have missed the original thread starter, which mentioned cutting corners in models after the subsidy phase is over.
- The mentions of CPUs, AirBnB and Uber appear to be examples where certain quality standards were usually kept even after the subsidy phase.
Generating response ...
- set temp to 0
- be more specific
But I'd argue that if your LLM isn't hallucinating, then it's useless.
I agree that many new model versions are worse than the previous ones. But it is also related to the base rules of the model - they try to please you and manipulate you into liking them, way too much.
> GPT-5-Codex-Mini allows roughly 4x more usage than GPT-5-Codex, at a slight capability tradeoff due to the more compact model.
> Available in the CLI and IDE extension when you sign in with ChatGPT, with API support coming soon.