One thing that's for sure is that while Claude is currently taking the #1 spot in mentions, it carries a lot of negative sentiment due to API pricing policies and frequent server downtime. On the other hand, the runner-up, GPT-5.5, actually seems to get more positive feedback.
Personally, my experience with Codex wasn't as good as with Claude Code (Codex freezes on Windows more often than you'd expect), so this is a bit surprising. That said, to be fair to GPT, it is definitely better in terms of sheer code-writing capability. However, GPT actually has quite a few issues with text corruption when generating in Korean or Chinese, something English-speaking users probably don't notice. In terms of model capabilities, when given the same agent.md (CLAUDE.md) file, I think GPT is better at writing code, while Claude is better at writing text during code reviews.
Looking at the bottom right, Qwen and DeepSeek are open-source, so they are largely mentioned in the context of guarding against vendor lock-in, which drives positive sentiment. Considering that Hacker News occasionally shows negative sentiment toward China, the fact that they are viewed this positively—unlike US models—shows that being open-source is a massive advantage in itself.
Anyway, one thing that's for sure is that Gemini is pretty much unusable.
They are cheaper! All signals point to them staying cheaper because they are built more sustainably. Also, some of the latest entries can run on a single GPU! Literally available at your desk, where there can be no service interruptions, not even network latency. People are one- and few-shotting little games for zero dollars because they bought a GPU to play video games this year. To me that's unbeatable value. Once the tooling catches up and a few more model releases land, it could change everything completely.
Of course, when I tried it on something else, it rewrote every line in the file for no good reason, applied changes directly when I told it just to plan, and so on.
So maybe it has one strength.
Ha! I find that Gemini is quite useful - if only because I'm forced to use it on my personal projects, since it's the only one that has unlimited interaction for "free".
It has its limitations, yes, but so does Claude (which I am leaning on too heavily at work at the moment).
I am upset because now Anthropic, OpenAI, Meta, etc. will continue their smear campaigns here. But I am also happy because it will make HN less useful when they do.
Everything is a give and take, I guess. Excited to see where the equilibrium sits.
What I want is more fully open models where everything is shared: data, training algorithms, weights. That way we can figure out if we should trust them.
I think it's also unfair to say their success is solely due to stealing data. They are contributing a lot of advances to the literature about what they are doing. The proof is in the results: we have 27B models you can vibe code with, not 1T+ ones.
It's murky, sure. But there are smear campaigns about how people can't trust China, too. There's some truth to that, but we can't trust the US either, so local models are an interesting way for China to offer us some level of sovereignty.
The context would be really nice to have, but reading the comments myself, it often just isn't very clear what exactly users are building or which programming language they are using.
I think analyzing more comments is promising. If you get enough data, you can generalize across use cases and get more meaningful ratings. The obvious lever is including more posts, although it might hit diminishing returns. I'll play around with it.
For the context, I want to try giving Gemini a "scratch pad" where it can note down strengths and weaknesses per model that it finds in the comments, something like "some users say that model x is good for writing tests". Then on each run, I'd let it update the scratch pad and publish the results as more of a qualitative analysis.
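In code, I'm imagining something along these lines (a rough sketch only, assuming the google-generativeai package; the scratch_pad.md file name, model id, and prompt wording are placeholders, not the real setup):

```python
import pathlib

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")            # assumption: key supplied via env/config
model = genai.GenerativeModel("gemini-1.5-flash")  # placeholder model id

SCRATCH_PAD = pathlib.Path("scratch_pad.md")       # hypothetical file name


def update_scratch_pad(comments: list[str]) -> str:
    """Feed a batch of comments plus the current scratch pad to the model and
    persist the revised per-model list of strengths and weaknesses."""
    current = SCRATCH_PAD.read_text() if SCRATCH_PAD.exists() else "(empty)"
    prompt = (
        "You maintain a scratch pad of strengths and weaknesses per coding model, "
        "with entries like 'some users say that model x is good for writing tests'.\n\n"
        f"Current scratch pad:\n{current}\n\n"
        "New comments:\n" + "\n---\n".join(comments) + "\n\n"
        "Return the updated scratch pad as markdown bullets grouped by model."
    )
    updated = model.generate_content(prompt).text
    SCRATCH_PAD.write_text(updated)
    return updated
```

Each run would start from the previous pad, so recurring observations accumulate instead of being recomputed from scratch.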
For the wording, I'd like to keep a certain amount of clickbait, sorry ;)
It's way too important a piece of information not to have it visible.
And it's probably a good idea to create a list of model release dates, so older comments can't accidentally map to models that weren't released yet.
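That check could be as simple as something like this (a sketch; it assumes each comment comes with a timestamp, and the dates below are placeholders to be replaced with the actual release dates):

```python
from datetime import date

# Placeholder dates only; fill in the real release date per model.
MODEL_RELEASE_DATES = {
    "model-a": date(2025, 1, 1),
    "model-b": date(2025, 3, 1),
}


def plausible_mentions(comment_date: date, mentioned_models: list[str]) -> list[str]:
    """Drop mentions that predate a model's release, so an old comment can't
    accidentally be mapped to a model that didn't exist yet."""
    return [
        m for m in mentioned_models
        if m in MODEL_RELEASE_DATES and comment_date >= MODEL_RELEASE_DATES[m]
    ]
```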
I saw you're using Gemini for the sentiment rating (which I guess you picked because it's not often mentioned and thus "neutral"? lol)
But it would be interesting to get more details overall.
The technical abilities and usage are derived from the commenters' usage reflections.
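In spirit, that per-comment extraction is something like the sketch below (not the actual code; the JSON fields and prompt wording are illustrative, and it assumes the same google-generativeai setup and placeholder model id as above):

```python
import json

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")            # assumption: key supplied via env/config
model = genai.GenerativeModel("gemini-1.5-flash")  # placeholder model id


def rate_comment(comment: str) -> dict:
    """Ask the model which coding models a comment mentions, the sentiment toward
    each, and what the commenter was actually doing with them."""
    prompt = (
        "Read this Hacker News comment and return JSON with the fields 'models' "
        "(list of model names), 'sentiment' (positive/neutral/negative per model), "
        "and 'usage' (what the commenter was building or doing, if stated).\n\n"
        + comment
    )
    response = model.generate_content(
        prompt,
        generation_config={"response_mime_type": "application/json"},
    )
    return json.loads(response.text)
```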