37 points by imichael 9 days ago | 9 comments
  • nmca 9 days ago
    The linked USAMO math results are in an exam that requires proofs. The same authors, on the same website, ran AIME 2025 shortly after it happened and found it was totally consistent with the o1 announcement numbers; the difference being that the AIME requires only short answers and no proof.

    If you are a skilled mathematician, it is quite easy to verify that (as of 7th April) models excel at novel calculations on held-out problems but mostly shit the bed when asked for proofs.

    Gary cites these USAMO results as evidence of contamination influencing benchmark results, but that view is not consistent with the models' strong performance on clearly held-out tasks (ARC test, AIME 25, HMMT 25, etc etc).

    If you really care, you can test this by inventing problems! It is a very very verifiable claim about the world.
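
    For a concrete starting point, here is a minimal Python sketch of that test: invent a problem on the spot, compute the ground truth locally, then compare it with the model's short answer. (The model name and the OPENAI_API_KEY environment variable are placeholders for illustration; swap in whatever you want to test.)

      import random
      from openai import OpenAI  # pip install openai

      client = OpenAI()  # reads OPENAI_API_KEY from the environment

      # Invent a problem that cannot be in any training set, and compute
      # the ground truth locally.
      a, b = random.randrange(10**8, 10**9), random.randrange(10**8, 10**9)
      truth = sum(int(d) for d in str(a * b))

      resp = client.chat.completions.create(
          model="gpt-4o",  # placeholder: use whichever model you want to test
          messages=[{
              "role": "user",
              "content": f"What is the sum of the decimal digits of {a} * {b}? "
                         "Reply with the number only.",
          }],
      )
      print("model:", resp.choices[0].message.content.strip(), "| truth:", truth)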

    In any case, this is not the pundit you want. There are many ways to make a bear case that are much saner than this.

  • tptacek 9 days ago
    Does any of this matter if you're a person who thinks "AGI" is a silly concept and just uses these tools for what they're good at currently?

    I'm not trying to be snarky, I'm just wondering why I would care that a tech giant has failed to cross the "GPT-5" threshold. What's the significance of that to an ordinary user?

    • faizshah 9 days ago
      Yes, the quality of the models is increasing at a slower rate and the race will transition to performance and efficiency.

      This is good for self hosters and devs who will be able to run near SOTA models like QwQ locally. I’m near the point where I’m going to cancel my ChatGPT Plus and Claude subscription.

      If you're not already self-hosting, building your own local agents, and building your own MCPs/tools, I would encourage you to try it (simple stack: ollama, pydanticAI, fastmcp, QwQ 32B, Llama 3.2 3B). If you don't have a fancy GPU or an M1+ Mac, try out QwQ on Groq or Flash 2.0 Lite via the Gemini API; it's super cheap and fast, and they are basically equivalent to (if not better than) the ChatGPT you were paying for 16 months ago.
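
      To make "self host" concrete, here is a minimal sketch of talking to a local QwQ through Ollama's OpenAI-compatible endpoint. It assumes Ollama is already running and you have pulled the qwq model; the port and model tag are the Ollama defaults, and for real agents/tools you would layer pydanticAI and fastmcp on top of this.

        from openai import OpenAI  # pip install openai

        # Ollama exposes an OpenAI-compatible API on localhost:11434 by default.
        client = OpenAI(
            base_url="http://localhost:11434/v1",
            api_key="ollama",  # the client requires a key; Ollama ignores it
        )

        resp = client.chat.completions.create(
            model="qwq",  # local QwQ 32B pulled via Ollama
            messages=[{"role": "user", "content": "Summarize MCP in two sentences."}],
        )
        print(resp.choices[0].message.content)

      The same few lines should also work against Groq's or Gemini's OpenAI-compatible endpoints if you change base_url and api_key, so you can start in the cloud and move local later.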

    • refulgentis 9 days ago
      If you're interested in it as a tool, you can skip this stuff.

      This is more for those curious about whether AI is tulip bulbs.

      There are unintentional ideological camps on AI; one is mad about a financial/interest bubble*, and it is maintained by content like this every 4 to 8 weeks. (Randomly selected representative comment in this same discussion: https://news.ycombinator.com/item?id=43618256)

      * reminiscent, to me, of how I felt about Uber for years and years and years until I sort of moved on when it survived COVID.

    • apparent 9 days ago
      If people think that LLMs will get very good at tons of things, they will invest quite a bit of time figuring out how to work with them (even if they are not that great at useful things right now). If those people then learn that LLMs will never get very good at so many things, they will tend to invest less time in studying how to best use them.

      I know I write off some of the time I spend working with LLMs as an investment in the future. If someone told me this is as good as they'll get, I would definitely invest less time working with them.

  • clauderoux 9 days ago
    As I said many times, I have been in the game for 30 years. I started doing AI with rules as early as the beginning of the '90s, and I never... never expected to see anything like LLMs in my lifetime. When I read Marcus once again saying that yes, this time LLMs have reached their limit, which he has been saying for 2 years in a row, I'm really getting tired of his tune. The idea that LLMs are a dead end, a failing technology, is pretty weird. Compared to what??? I use LLMs every day in my work, to write summaries, to make translations, to generate some code or to get explanations about a given code... And I even use them as research sparring partners to see how I could improve my work... Gary Marcus has been involved in the domain for 30 years as well... Where is his technology that would match or surpass LLMs???
  • mdonaj 9 days ago
    One of the comments in the article says: "I don't see how it's not a net negative tech," to which Marcus replies: "That’s my current tentative conclusion, yes."

    What is the negative effect I'm not seeing? Bad code? Economic waste in datacenter investment? Wasted effort of researchers who could be solving other problems?

    I've been writing software for over a decade, and I’ve never been as productive as I am now. Jumping into a new codebase - even in unfamiliar areas like a React frontend - is so much easier. I’m routinely contributing to frontend projects, which I never did before.

    There is some discipline required to avoid the temptation to just push AI-generated code, but otherwise, it works like magic.

    • efitz 9 days ago
      I’ve been playing a lot with Claude Code recently and also making my first significant foray into front-end development, and I think LLMs are the tech that has finally made front-end development broadly accessible.
    • fouc 9 days ago
      I assumed it was a reference to the nonsensical memes about AI compute resources damaging the environment, using up water and electricity, etc.
  • jmweast 9 days ago
    Really just a matter of when the bubble pops now, isn't it? There's just too much evidence pointing to the fact that AI is simply not going to be the product the big players say it will be.
    • bitwize 9 days ago
      Yeah, no shit. Winter is coming again for AI:

      https://en.wikipedia.org/wiki/AI_winter

      That said, the techniques that have come out of machine-learning research in recent years are indeed powerful, for certain, constrained purposes. That was true of other AI technologies in the past, but we don't call them AI anymore; we call them things like "rules engines". These days, you could take a current-year press release about AI and use a cloud-to-butt-style filter to replace all occurrences of "AI" with "statistics" and see virtually zero drop in factuality (though a considerable drop in market sizzle).

  • fouc 9 days ago
    GPT-1 to GPT-2: June 2018 to February 2019 = 8 months.

    GPT-2 to GPT-3: February 2019 to June 2020 = 16 months.

    GPT-3 to GPT-4: June 2020 to March 2023 = 33 months.

    Looks like the time to get to the next level is doubling. So we can expect GPT-5 sometime around September 2028 (roughly 66 months after GPT-4).
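
    If you want to sanity-check that extrapolation, here is a quick stdlib-only Python sketch (the release months are the ones listed above; the "gap doubles each generation" assumption is just the pattern in those three numbers, not a law):

      from datetime import date

      releases = {
          "GPT-1": date(2018, 6, 1),
          "GPT-2": date(2019, 2, 1),
          "GPT-3": date(2020, 6, 1),
          "GPT-4": date(2023, 3, 1),
      }

      def months_between(a: date, b: date) -> int:
          return (b.year - a.year) * 12 + (b.month - a.month)

      names = list(releases)
      gaps = [months_between(releases[names[i]], releases[names[i + 1]])
              for i in range(len(names) - 1)]
      print(gaps)  # [8, 16, 33] -- roughly doubling per generation

      # Double the last gap and add it to the GPT-4 date.
      last, next_gap = releases["GPT-4"], gaps[-1] * 2  # 66 months
      total = last.month - 1 + next_gap
      print(date(last.year + total // 12, total % 12 + 1, 1))  # 2028-09-01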

    Feels like people are being premature about claiming AI winter or that it is somehow a scandal that we don't already have GPT-5.

    It's going to take time. We need some more patience in this industry.

    • crmi 9 days ago
      This is overlooking the massive capital that's been invested over the latter part of those 33 months... which alters the doubling timeline significantly.
    • kartoffelmos 9 days ago
      Considering the capital burn rate here, I suspect that waiting years between iterations will be a hard pill to swallow for investors.
  • ninetyninenine 9 days ago
    The technology is just a couple years old, and this article is derived from a couple months of evidence.

    We can't yet say what the future holds. The naysayers who were so confident that LLMs were stochastic parrots are now embarrassingly wrong. This article sounds like that. Whether we are actually at a dead end or not is unknown. Why are people talking with such utter conviction when nobody truly understands what's going on internally with LLMs?

    • frizlab 9 days ago
      According to AI, the technology dates back to 1763.
      • ninetyninenine 8 days ago
        The technology that allows AI to lie and hallucinate is only a couple years old at best.
        • frizlab 6 days ago
          AI does not lie. AI does not hallucinate. AI does not do anything. It’s just an algorithm. We, as humans, tend to anthropomorphize it, but it has no will of its own.
          • ninetyninenine 3 days ago
            If you haven’t realized that the human brain is just an algorithm too then you don’t know what you’re talking about.

            The meanings of the words "lie" and "hallucinate" can be assigned to algorithms other than the one that powers the human brain. If something says something untrue or fabricates details, then it is by definition lying or hallucinating.

            It has nothing to do with anthropomorphization.

            Additionally don’t think of what the human brain does as some higher level thing. We are algorithms as much as the LLM is an algorithm.

  • coolThingsFirst 9 days ago
    I know it, it was a scary period for programmers. The tide is turning. Meatsuits are back in the game.
  • bigyabai 9 days ago
    > The reality, reported or otherwise, is that large language models are no longer living up to expectations, and its purveyors appear to be making dodgy choices to keep that fact from becoming obvious.

    What else is new?