* > ...*
* > Gemini 2.5 Flash Lite Preview 06-17 and Claude 3 Opus: 78.2%*
As someone who has tried to use many of these models for writing assistance, you're very wrong here. It really matters whether the model can get what I'm trying to communicate well enough to be helpful, or else I'll just write it myself. If you actually play with them a bit it's very clear these models are not substitutes. This goes for many on your list!
(There was a blind test in Wine Enthusiast magazine - even sommeliers couldn't distinguish expensive wines from cheaper alternatives.)
But ofc if you get perfect results in one shot from an expensive model, it is cheaper than wrangling with a cheap model for an hour… (just an example).
But what I find hard is navigating so many models - HuggingFace has 2,769,687 models listed…
So every comparison like this one, or at models.dev or arena.ai, is helpful.
How the hell are companies and individuals not taking reputational hits for saying blatantly wrong things in AI-voice, under their name?
You can see who likely (post)trained/distilled their models or borrowed parameters from each other. I do wonder if the 32 dimensions were chosen/named from principal components or pre-selected and designed, but the tool seems like an effective discriminator in any case.
Were the prompts similarly selected for orthogonality? I've wondered how the different LLMs would respond from iterative zero-shot prompt_n generation by summary from a previous response_n to generate zero-shot response_n+1. Would it statistically converge to a more distinguishable prompt for that LLM?
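The loop described above can be sketched concretely. This is a minimal, hypothetical illustration of the iteration structure only: `call_llm` and `summarize` here are deterministic toy stubs standing in for a real model call and a real summarization step, which the comment leaves unspecified.

```python
def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM call.
    return f"response to [{prompt}]"

def summarize(text: str) -> str:
    # Hypothetical stand-in for a summarization step;
    # simple truncation, purely for illustration.
    return text[:40]

def iterate_prompts(seed_prompt: str, n_rounds: int) -> list[str]:
    """Generate prompt_1 .. prompt_n, where prompt_{k+1} is derived
    from a summary of response_k, as in the procedure described above."""
    prompts = [seed_prompt]
    for _ in range(n_rounds):
        response = call_llm(prompts[-1])      # zero-shot response_k
        prompts.append(summarize(response))   # becomes prompt_{k+1}
    return prompts

chain = iterate_prompts("describe your writing style", 3)
```

Whether such a chain statistically converges to a prompt distinctive for a given LLM would depend entirely on the real model and summarizer; with these stubs the loop is just a fixed transformation.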
I expected it to be an analysis of AI-generated writing styles. Not full of them.
;)
our community is shooting itself in the foot if it continues to upvote complete self-promotion. I try to upvote as many cool projects as I find here, but the numbers game is definitely frustrating
Are we entirely sure that this person hasn't used an AI bot to upvote their post to the front page? I'd much rather believe that than believe people upvoted it, especially when most if not all comments are about how it feels like extreme AI slop.
maybe such forms of (rage/click bait?) content truly sell in that regard, and HN isn't as invulnerable as (I) we think it is.
- seem interesting but aren’t getting much traction
- are from users who actively participate in the community
Low quality post imo.
*Generated I assume.
Is this article AI slop?
Many of those numbers do not really match what I've seen in the wild, and without a clear explanation of how you arrived at a number, it's not a helpful number.
I hate to say it, but Gemini lies less frequently than paid models from OpenAI and Anthropic (OpenAI is worst in my use cases).
My guess is that Google has better training data (and uses less synthetic data, which might be creating training feedback loops in other models) and trains more for "be calibrated" than "be helpful", but it could just be that they lean more on RAG than on model weights.
But I really shouldn't speculate on the "why", as I'm out of my domain. Just curious if others use all the models they can and compare outputs as much as I do.