69 points by nuancedev 6 hours ago | 14 comments
  • jefftk 6 hours ago
    > "Models with >75% writing similarity but massive price gaps. The cheap model writes the same way. You are paying for the brand.
    >
    > ...
    >
    > Gemini 2.5 Flash Lite Preview 06-17 and Claude 3 Opus: 78.2%"

    As someone who has tried to use many of these models for writing assistance, you're very wrong here. It really matters whether the model can get what I'm trying to communicate well enough to be helpful, or else I'll just write it myself. If you actually play with them a bit it's very clear these models are not substitutes. This goes for many on your list!
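For context on what a "writing similarity" percentage like the one quoted could even mean: one common approach is cosine similarity over surface style features. The article does not publish its method, so this is a rough illustrative sketch; the four features and the sample texts below are invented, not the tool's actual 32 dimensions.

```python
import math
import re

def style_features(text):
    """Crude style-feature vector: words per sentence, mean word length,
    type-token ratio, and comma rate. Stand-ins for whatever the tool
    actually measures."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)
    return [
        n_words / max(len(sentences), 1),   # words per sentence
        sum(map(len, words)) / n_words,     # mean word length
        len(set(words)) / n_words,          # type-token ratio
        text.count(",") / n_words,          # comma rate
    ]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

model_a = "The cat sat on the mat. It was a sunny day, and the cat was happy."
model_b = "A dog ran in the park. The weather was warm, and the dog seemed pleased."
print(cosine_similarity(style_features(model_a), style_features(model_b)))
```

Note that two texts can score very high on surface features like these while differing completely in substance, which is exactly the objection above.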

    • sixtyj 4 minutes ago
      Models have their “personalities” for sure, but the sense that the expensive model is better may just be confirmation bias.

      (There was a blind test in Wine Enthusiast magazine: even sommeliers couldn't tell expensive wines from cheaper alternatives.)

      But of course, if you get perfect results in one shot from the expensive model, that's cheaper than wrangling with a cheap model for an hour… (just an example).

      What I do find hard is navigating the sheer number of available models: HuggingFace has 2,769,687 models listed…

      So every comparison like this one, or the ones at models.dev or arena.ai, is welcome.

    • lubujackson 6 hours ago
      It makes sense. The cheaper models are often distilled versions, so they may ape language but miss the connective tissue that makes the entire output coherent.
    • rogerrogerr 6 hours ago
      I'd bet this whole thing is vibe'd out of nothingness and no human actually thought about whether saying "you are paying for the brand" makes any sense at all.

      How the hell are companies and individuals not taking reputational hits for saying blatantly wrong things in AI-voice, under their name?

      • anonzzzies 5 hours ago
        Also, aren't Gemini and Opus both big brands? If it were some small AI shop vs. Opus, then sure. So it does indeed seem to make little sense.
    • Netcob 3 hours ago
      Also, is it "paying for the brand" or "paying for the training"?
  • kurthr 5 hours ago
    It would be shocking to me if the large model trainers didn't have tools like this to analyze their outputs, but this is interesting work!

    You can see who likely (post)trained/distilled their models or borrowed parameters from each other. I do wonder if the 32 dimensions were chosen/named from principal components or pre-selected and designed, but the tool seems like an effective discriminator in any case.

    Were the prompts similarly selected for orthogonality? I've also wondered how the different LLMs would respond to iterated zero-shot prompting: summarize response_n to generate prompt_n+1, then use that to generate zero-shot response_n+1. Would this statistically converge to a prompt that is more distinguishable for that LLM?
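One way to probe the question above, i.e. whether hand-picked style dimensions are actually orthogonal, is to run PCA over per-response scores and look at how evenly variance spreads across components. A minimal sketch with invented data (the real tool's 32 dimensions and scoring are not published; the correlation injected below is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 100 responses scored on 5 hand-picked style dimensions.
scores = rng.normal(size=(100, 5))
scores[:, 1] += 0.8 * scores[:, 0]  # inject correlation between two dimensions

# PCA via eigen-decomposition of the covariance matrix.
centered = scores - scores.mean(axis=0)
cov = np.cov(centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns ascending order
explained = eigvals[::-1] / eigvals.sum()    # descending explained-variance ratio
print(explained)
```

If the hand-picked dimensions were truly orthogonal, the explained-variance ratios would be roughly uniform; correlated dimensions concentrate variance in the leading components.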

  • leonidasv 6 hours ago
    I've always wondered if the "typical" AI writing style is just an unavoidable RL artifact or a deliberate fingerprint to prevent model collapse as low-effort AI-generated text floods the training data pool (the web).
  • qaid 5 hours ago
    Ugh, the subheadings were a major turn-off.

    I expected it to be an analysis of AI-generated writing styles. Not full of them.

    ;)

    • add-sub-mul-div 5 hours ago
      It's a spam account; like nearly all submissions about AI, the account is just a self-promotion history. Together with the subject matter, I'd expect nothing but lazy bullshit.
      • Imustaskforhelp 5 hours ago
        One thing I don't understand, then, is why Hacker News upvotes these posts when other Show HN posts are sometimes so much nicer and more thoughtful, like (https://news.ycombinator.com/item?id=47589735).

        Our community is shooting itself in the foot if it continues to upvote pure self-promotion. I try to upvote as many cool projects as I find here, but the numbers game is definitely frustrating.

        Are we entirely sure this person hasn't used an AI bot to upvote this to the front page? I'd much rather believe that than believe people upvoted it, especially when most if not all comments are about how it feels like AI slop.

        Maybe such forms of (rage/click?) bait truly sell, and HN isn't as invulnerable as (I) we think it is.

        • vunderba 2 hours ago
          At least once a week, I take a few minutes out of my day to look through `/shownew` to comment on and upvote posts that:

          - seem interesting but aren’t getting much traction

          - are from users who actively participate in the community

  • sensarts 3 hours ago
    This is really cool. That's the good stuff. Did you notice any pattern in why models cluster? Shared training data or just similar architecture choices?
  • emaro 4 hours ago
    No mention of any linguistic theory, some arbitrary (?*) metrics mixed together and even more arbitrary thresholds. Why does 75% "similarity" mean "writes the same"?

    Low quality post imo.

    *Generated I assume.

  • docheinestages 5 hours ago
    The muted colors on a dark background make everything hard to read.
    • vunderba 3 hours ago
      This is the stock CC vibe-coded aesthetic - it loves to give the middle finger to even the most basic of web accessibility requirements like contrast ratios.
  • a960206 4 hours ago
    Amazing. Last time I had GPT guess who wrote some Claude-generated content, it guessed GPT made it.
  • redox99 6 hours ago
    Besides the claim that Opus and Gemini Flash share 99% of their style being suspicious, the point that you're wasting money on the expensive model is nonsensical. You pay primarily for the intelligence, not the writing style.

    Is this article AI slop?

  • groby_b 5 hours ago
    Without showing the prompts and responses, it's yet another meaningless AI benchmark.

    Many of those numbers don't really match what I've seen in the wild, and without a clear illustration of how you arrived at a number, it's not a helpful number.

  • glaslong 5 hours ago
    I'm curious about the sort of user who cares about style but will either one-shot with the default style, providing no samples or direction, or who even chooses models on that style rather than, you know, substance.
  • apercu 5 hours ago
    Has anyone else used LLMs to fact-check other LLMs?

    I hate to say it, but Gemini lies less frequently than the paid models from OpenAI and Anthropic (OpenAI is worst in my use cases).

    My guess is that Google has better training data (and uses less synthetic data, which might be creating training feedback loops in other models) and tunes more toward "be calibrated" than "be helpful", but it could also just be that they lean more on RAG than on raw weights.

    But, I really shouldn't speculate the "why" as I'm out of my domain. Just curious if others use all the models they can and compare outputs as much as I do.

  • agomezc01 5 hours ago
    [dead]
  • rpdaiml 6 hours ago
    [dead]