12 points by nsoonhui 6 hours ago | 8 comments
  • pllbnk 2 hours ago
    Depends on what a ‘generation’ is for LLMs. It would be weird to build a model which is a generation behind. My guess is that, like all models, it will be considered the best until the novelty factor wears off, and then it will be more or less the same as all modern LLMs: better in some domains, worse in others.

    Edit: and it will probably also lead in most major benchmarks, which says next to nothing about the quality.

  • fastThinking 6 hours ago
    Being ahead of Google is less about raw model quality and more about shipping usable products fast. Anthropic’s advantage seems organizational as much as technical. If Sonnet 5 really halves inference cost while improving reasoning, that’s more disruptive than any benchmark win.
  • thomasfromcdnjs 4 hours ago
    I keep trying to use Codex CLI, but I love using claude --dangerously-skip-permissions, and this seems impossible to do in Codex: it just asks me to approve every command per session. Am I taking crazy pills, or is there a way to make Codex just run in yolo mode?
    • lostmsu 3 hours ago
      --yolo

      You can find it in --help.

  • touwer 4 hours ago
    The article itself seems to be written using an LLM from 1950.
  • RivieraKid 3 hours ago
    Aren't people worried about their jobs? I'm surprised that this aspect is almost entirely missing in threads like this.
    • keyle 3 hours ago
      Say, you can vibe design your next house.

      Would you want that?

      Isn't a house so personal that you'd want a professional architect with experience to design it and sign off on it? Even if they used advanced tools like CAD and copy-pasted 8/10 of it?

      Sure, you can probably one-shot notepad.exe, but it has no meaning. Meaningful work isn't going anywhere, because meaningful work is made, and lives on, by people for people.

      No one wants a vibe-designed car, unless you are one of those psychos who has no taste and doesn't care about anything.

      • thedevilslawyer 2 hours ago
        Have you worked with a professional architect? Costs add up fast, and you get 1-2 iterations.

        I'd love to vibecode the house to my full liking, assuming that the agent harness will take care of all the nonfunctional things (stable design, zoning, etc.). Same for a car: if I could customize it, I would.

        (I definitely don't like the ramifications of it on the economy/jobs, but the above are pure consumer wins, no doubt)

    • zhshsha 2 hours ago
      Have you actually used these tools?

      My CTO is pushing 30k-line PRs, and when asked “how do you know it works?” all he can say is “I’m not sure, but it probably does. Our customers can QA.” Meanwhile I’m cleaning up half-vibed messes from my coworkers that demo’d well.

      They’re very powerful, but I think their marketing departments are even more powerful. I do wonder how many of these comments are real people.

      • RivieraKid 41 minutes ago
        Not much, only the non-paid non-agent stuff. It's pretty impressive but my estimate is at best a 2x productivity jump for general use.

        My worry is that the agentic stuff is reportedly a significant improvement and getting better quickly.

    • spants 3 hours ago
      Control AI or be controlled by it.

      Learn everything that you can about AI and you will be a great resource. Otherwise, learn a trade. Electricians will be required...

      • RivieraKid 3 hours ago
        The percentage of people employed in agriculture dropped from 80% to 2%. The market will be full of people who are willing to learn everything about AI in order to have a comfortable and highly paid job.

        Becoming an electrician would be a downgrade or even impossible for some people.

        For the record, I think AI replacing highly paid "sitting behind a computer" jobs would be good for society, but probably not for most of the people who have these jobs.

  • Havoc 4 hours ago
    It’s definitely going to be a busy month in model land. Loads of new stuff is scheduled to drop.

    I think it’s premature to say what’s going to beat what, though.

  • tajd 4 hours ago
    What are the key references for this article? There was a tweet, but also a screenshot of an error code in Vertex AI, right?
  • solumunus 5 hours ago
    How long is a generation with LLMs, 6 months?
    • column 4 hours ago
      In 2022, Midjourney's CEO said anything they release now would be obsolete in 6 months' time. That seemed wild, but he was right.