41 points by _____k a day ago | 8 comments
  • anonzzzies 21 hours ago
    > LLMs basically are a dead end

    Not sure if anyone who works in the foundation model space, and who doesn't directly depend on LLMs 'making it' for VC money, would claim differently. It is rather obvious at this point, but some companies are too far in and not cash-rich enough, so they have to keep the LLM dream alive.

    • D-Machine 19 hours ago
      > Not sure if anyone who works in the foundational model space and who doesn't directly depend on LLMs 'making it' for VC money would claim differently

      This is the problem. The vast majority of people over-hyping LLMs don't even have the most basic understanding of how simple LLMs are at core (manifold-fitting the semantic space of the internet), and so can't understand why they are necessarily dead ends, theoretically. This really isn't debatable for anyone with a full understanding of the training + basic dynamics of what these models do.

      But, practically, it remains to be seen where the dead end with LLMs lies. I think we are clearly approaching plateaus in both academic research and in practice (people forget or are unaware how much benchmarks are being gamed as well), but even small practical gains remain game-changers in this space, and much of the progress / tradeoffs we actually care about can't be measured accurately yet (e.g. rapid development vs. "technical debt" from fast but not-understood / weakly-reviewed LLM code).

      LLMs are IMO indisputably a theoretical dead end, and for that reason, a practical dead end too. But we haven't hit that practical dead end yet.

      • tim333 3 hours ago
        I'm not sure about the dead-end thing, because you may be able to add things on to them.

        In human terms LLMs seem similar to talking without thinking, but we can also think as a separate activity from just waffling on.

        In AI research terms, DeepMind have done some interesting things with Mind Evolution and AlphaEvolve, the latter being the one that came up with a more efficient matrix multiplication algorithm.

        (https://deepmind.google/research/publications/122391/

        https://deepmind.google/blog/alphaevolve-a-gemini-powered-co...)

      • refulgentis 18 hours ago
        Why are LLMs a theoretical dead end? I understand the "manifold-fitting the semantic space of the internet", but I don't understand "why they are necessarily dead ends, theoretically."

        If I had to steelman a counterargument, I'd handwave about RL and environments creating something greater than the semantic space of the internet, and then highlight the part you mention where we haven't reached a practical dead end. Maybe link out to the Anthropic interp work on them planning in advance, via poking at activations when working on a rhyming poem.

        • D-Machine 18 hours ago
          I should clarify that LLMs trained on the internet are necessarily a dead end, theoretically, because the internet both (1) lacks specialist knowledge and knowledge that cannot be encoded in text / language, and (2) is polluted with not just false, but irrelevant knowledge for general tasks. LLMs (or rather, transformers and deep models tuned by gradient descent) trained on synthetic data or more curated / highly-specific data where there are actual costs / losses we can properly model (e.g. AlphaFold) could still have tremendous potential. But "LLMs" in the usual, everyday sense in which people use this label are very limited.

          A good example would be trying to make an LLM trained on the entire internet do math proofs. Almost everything in its dataset tells it that the word "orthogonal" means "unrelated to", because this is how it is used colloquially. Only in the tiny fraction of math forums / resources it digested does the word actually mean something about the dot product, so clearly an LLM that does math well only does so by ignoring the majority of the space it is trained on. Similar considerations apply to attempting to use e.g. vision-language models trained on "pop" images to facilitate the analysis of, say, MRI scans, or LIDAR data. That we can make some progress in these domains tells us there is some substantial overlap in the semantics, but it is obvious there are limits to this.

          There is no reason to believe these (often irrelevant or incorrect) semantics learned from the entire web are going to be helpful for the LLM to produce deeply useful math / MRI analysis / LIDAR interpretation. Broadly, not all semantics useful in one domain are useful in another, and, even more clearly, linguistic semantics have limited relevance to much of what we consider intelligence (which includes visual, auditory, proprioceptive/kinaesthetic, and, arguably, mathematical abstractions). But it could well be that curve-fitting huge amounts of data from the relevant semantic space (e.g. feeding transformers enough Lean / MRI / LIDAR data) is in fact all we need, so that e.g. transformers are "good enough" for achieving most basic AI aims. It is just clearly the case that the internet can't provide all that data for all / most domains.

          EDIT: Also, Anthropic's writeups are basically fraud if you actually understand the math: there is no "thinking ahead" or "planning in advance" in any sense. If you head down certain paths due to pre-training then, yes, of course, you can "already see" weight activations of future tokens: this is just what curve-fitting in N-D looks like, there is nowhere else for the model to go. Actual thinking ahead means things like backtracking / backspace tokens, i.e. actually retracing your path, which current LLMs simply cannot do.

          • grogers 18 hours ago
            > so clearly an LLM that does math well only does so by ignoring the majority of the space it is trained on

            There are probably good reasons why LLMs are not the "ultimate solution", but this argument seems wrong. Humans have to ignore the majority of their "training dataset" in tons of situations, and we seem to do it just fine.

            • D-Machine 17 hours ago
              It isn't wrong; just think about how weights are updated via (mini-)batches, and how tokenization works, and you will understand that LLMs can't ignore poisoning / outliers like humans do. A classic recent example (https://arxiv.org/abs/2510.07192): IMO this happens because the standard (non-robust) loss functions allow for anchor points.
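
              A minimal toy sketch of the loss-function point (my own example, not the setup from that paper): with a plain squared-error loss a single poisoned / outlier point drags the fit arbitrarily far, while a robust, Huber-style loss caps its gradient contribution. The analogy to token-level cross-entropy is loose, but the mechanism is the same.

                  # Toy example: non-robust vs robust loss under a single "poisoned" point.
                  import numpy as np

                  data = np.array([1.0, 1.1, 0.9, 1.05, 0.95, 50.0])  # last value is the outlier

                  def fit(residual_grad, steps=2000, lr=0.01):
                      theta = 0.0
                      for _ in range(steps):
                          theta -= lr * np.mean(residual_grad(theta - data))
                      return theta

                  sq_grad = lambda r: r                      # d/dr of 0.5*r^2: the outlier's pull is unbounded
                  huber_grad = lambda r: np.clip(r, -1, 1)   # Huber-style: the outlier's pull is capped at 1

                  print("squared-error fit:", fit(sq_grad))    # ~9.2, dragged toward the outlier
                  print("huber fit:", fit(huber_grad))         # ~1.2, stays near the clean cluster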
    • aspenmartin 19 hours ago
      I agree, but to be fair there's an open question of just how much more we can get from scaling / tricks. I would assume there's agreement that e.g. continual learning just won't be solved without a radical departure from the current stack. But even with all of the baggage we have right now, if you believe the extrapolations, we have ~2 GPT-4->5-sized leaps before everyone has to get out of the pool.
  • fooker 20 hours ago
    Being early is often the same as being wrong.

    LLMs could be a dead end, but we aren't anywhere close to saturating the technology yet.

  • songodongo 20 hours ago
    Well, he sure is confident in himself with quotes like “you certainly don’t tell a researcher like me what to do” and “I’m a visionary”. Best of luck.
    • D-Machine 20 hours ago
      He's right that LLMs are a dead end, but yeah, those quotes were cringe as hell. Hubris.
  • alyxya 18 hours ago
    I don't get the anti-LLM sentiment, because plenty of trends continue to show steady progress with LLMs over time. Sure, you can poke at some dumb things LLMs do as evidence of some fundamental issue, but the frontier capabilities continue to amaze people. I suspect the anti-LLM sentiment comes from people who haven't given LLMs a serious chance and seen for themselves all the things they're capable of. I used to be skeptical, but I've changed my mind quite a bit over the past year, and there are many others who've changed their stance towards LLMs as well.
    • D-Machine 16 hours ago
      Or, people who've actually trained and used models in domains where "stuff on the internet" is of no relevance to what you are actually doing realize the profound limitations of what these LLMs actually do. They are amazing, don't get me wrong, but not so amazing in many specific contexts.
    • nutjob2 18 hours ago
      People who think that "steady progress" will continue forever have no basis for their assumption.

      You have an ad-hominem attack and your own personal anecdote, which are not an argument for LLMs.

      • alyxya 18 hours ago
        It'll steadily continue the same way Moore's law has continued for a while. I don't think people question the general trend in Moore's law, besides the point where it's nearing the limits of physics. It's a lot harder to make the universal claim that LLMs don't work, whereas claiming something is possible for LLMs only needs some evidence.
        • nutjob2 13 hours ago
          Yes, LLMs will continue to progress until they hit the limits of LLMs.

          The idea that LLMs will reach AGI is entirely speculative, not least because AGI is undefined and speculative.

      • stevenhuang 15 hours ago
        LeCun has already been proven wrong countless times over the years regarding his predictions of what LLMs can or cannot do. While LLMs continue to improve, he has yet to produce anything of practical value from his research. The salt is palpable, and he's memed for a reason.
  • websiteapi 20 hours ago
    Human beings are estimated to use roughly 50 to 100W when idle (up to maybe 1000-2000W when exerting ourselves physically), and I think it's fair to say we're generally intelligent.

    Something is fundamentally missing with LLMs w.r.t. intelligence per watt. Assuming GPT-4 is around human intelligence, that needs 2-4 H100s, so roughly the same as a human at full exertion, and that doesn't include the rest of the computer.

    That being said, we're willing to brute force our way to a solution to some extent, so maybe it doesn't matter, but I'd say the fact that we don't use that much energy is proof enough we haven't perfected the architecture yet.
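
    A rough back-of-envelope for that comparison (the ~700W per-H100 figure is my assumption for the SXM board power; it ignores CPUs, cooling, etc.):

        # Back-of-envelope watts comparison (all numbers are rough assumptions)
        human_idle_w = (50, 100)          # human resting estimate from above
        human_exertion_w = (1000, 2000)   # human peak physical exertion estimate from above
        h100_w = 700                      # assumed board power per H100 (SXM)
        gpus = (2, 4)                     # the 2-4 H100s assumed for a GPT-4-class model

        llm_w = (gpus[0] * h100_w, gpus[1] * h100_w)
        print(f"LLM setup: {llm_w[0]}-{llm_w[1]} W vs human idle: {human_idle_w[0]}-{human_idle_w[1]} W")
        # -> 1400-2800 W, i.e. in the range of a human at full physical exertion, not at rest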

    • LogicFailsMe 19 hours ago
      At 5 cents or less per kWh these days, 10 kW is 50 cents per hour, well below minimum wage. LLMs aren't AGI and I'm not convinced we're anywhere close to AGI, but they are useful. That the people deploying them have the same product instincts as Microsoft executives seems to be the core issue.
    • boroboro4 17 hours ago
      That being said, in this setup of 2-4 H100s you'll be able to generate with a batch size of somewhere around 128, i.e. it's serving 128 humans and not one. And just like that, the difference in efficiency isn't that high anymore.
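
      Rough arithmetic on that (again assuming ~700W per H100 and a batch of ~128 concurrent streams; both are assumptions, not measurements):

          h100_w = 700     # assumed board power per H100
          gpus = 4         # upper end of the 2-4 H100 setup above
          batch = 128      # assumed number of concurrent generation streams
          print(f"~{gpus * h100_w / batch:.0f} W per stream")  # ~22 W, the same ballpark as a brain's ~20 W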
    • nutjob2 18 hours ago
      Brains use approximately 20W.
    • dangus 19 hours ago
      I think the more important point to bring up is “you can hire a human for minimum wage.”

      The median monthly salary in Bangladesh is cheaper than the Cursor Ultra plan. And Cursor loses money.

      An experienced developer in India makes around $20k.

    • leoh 20 hours ago
      Not for inference, right?
      • websiteapi 20 hours ago
        Correct - an H100 can do something like 100 tokens per second on a GPT-4-like model, but you'd need to account for regular fine-tuning to accurately compare to a person, hence 4 or so. Of course the whole comparison is inane since computers and humans are obviously so different ha...
  • labrador a day ago
    "I'm sure there's a lot of people at Meta, including perhaps Alex, who would like me to not tell the world that LLMs basically are a dead end when it comes to superintelligence" - Yann LeCun

    I've been following Yann for years and in my opinion he's been consistently right. He's been saying something like this for a long time while Elon Musk and others breathlessly broadcast that scaling up would soon get us to AGI and beyond. Mark Zuckerberg bought into Musk's idea. We'll see, but it's increasingly looking like LeCun is right.

    • aspenmartin a day ago
      More like Yann had a long time to prove out his ideas and he did not deliver, meanwhile the industry passed Meta/Facebook by due to the sort of product-averse, comfortable academic bubble that FAIR lived in. It wasn't Zuckerberg getting swindled; it was giving up on ever seeing Yann deliver anything other than LinkedIn posts and small-scale tests. You do not want to bank on Yann for a big payoff. His ideas may or may not be right (joint predictive architectures, world modeling, etc), but you'd better not have him at the helm of something you expect to turn a profit on.

      Also, almost everyone agrees the current architecture and paradigm, where you have a finite context (or a badly compressed one in Mamba / SSM), is not sufficient. That plus lots of other issues. That said, scaling has delivered a LOT and it's hard to argue against demonstrated progress.

      • labrador 21 hours ago
        As I said in my cousin comment, it depends on how you define AGI and ASI. Claude Opus 4.5 tells me "[Yann LeCun] thinks the phrase AGI should be retired and replaced by 'human-level AI'", which supports my cousin comment.
      • CharlieDigital 20 hours ago

            > ...but you’d better not have him at the helm of something you expect to turn a profit on
        
        I don't understand this distinction. Is anyone (besides NVDA) turning a profit on inference at this point?
        • aspenmartin 19 hours ago
          I don't know, I assume not, but everyone has a product that could easily be profitable; it would just be dumb to do it because you will lose out to everyone else running at a loss to capture market share. I just mean the guy seems to have an aversion to business sensibility generally. I think he's really in it for the love of the research. He's of course rightly lauded for everything he's done, he's extremely brilliant, and in person (at a distance) very kind and reasonable (something that is very different from his LinkedIn personality, which is basically a daily pissing contest). But I would not give him one cent of investment personally.
    • throw310822 a day ago
      > He's been saying something like this for a long time [...] it's increasingly looking like LeCun is right.

      No? LLMs are getting smarter and smarter; only three years have passed since ChatGPT was released and we have models generating whole apps, competently working on complex features, solving math problems at a level only reached by a small percentage of the population, and much more. The progress is constant and the results are stunning. Really, it makes me wonder what sort of denial those who think this has been proven to be a dead end are in.

      • labrador 21 hours ago
        If you call that AGI, as many do, or ASI, then we are not talking about the same thing. I'm talking about conversing with AI and being unable to tell if it's human or not, in a kind of Turing Plus test. Turing Plus 9 would be 90% of humans can't tell if it's human or not. We're at Turing Plus 1. I can easily tell Claude Opus 4.5 is a machine by the mistakes it made. It's dumb as a box of rocks. That's how I define AGI and beyond to ASI.
        • rvz 21 hours ago
          This goes for code too: any experienced senior SWE with sharp attention to detail can easily tell if an AI wrote a project or not.

          Right now the definition of AGI has been hijacked so much that it can mean absolutely anything.

          • nutjob2 18 hours ago
            No one has even given a rigorous definition of AGI.

            A prime environment for snake oil salesmen like Altman and Musk.

            • dragonwriter 16 hours ago
              > No one has even given a rigorous definition of AGI.

              No one has even given a rigorous definition of the I, much less the G qualifier.

        • nutjob2 18 hours ago
        Nothing you say proves or indicates that progress will continue indefinitely.

        Your argument says we should have flying cars by now, because they kept on getting better.

        LeCun says LLMs do text processing and so won't scale to AGI, just like a faster car can never fly (controllably).

          • throw310822 12 hours ago
          When I look at these cars, I don't see them going faster, I see them hovering higher and higher above the ground. They're already flying.
    • skybrian 21 hours ago
      It's too soon to say anything like that is proven. Sure, AGI hasn't been reached yet. I suspect there's some new trick that's needed. But the work going into LLMs might be part of the eventual solution.
    • rvz 21 hours ago
      We are due for many more optimizations and new deep learning architectures, rather than throwing more compute + RAM + money + GPUs + data at the problem, which you can only do for so long until a bottleneck occurs.

      Given that we have seen research from DeepSeek and Google on optimizing parts of the lower layers of deep neural networks, it's clear that a new form of AI needs to be created and I agree that LeCun will be proven right.

      Instead of borrowing tens of trillions to scale to a false "AGI".

      • labrador 21 hours ago
        This seems so obvious to me that scaling advocates appear to be exhibiting a form of magical thinking.
        • djmips 17 hours ago
          I can see why it's so intoxicating, though; it seemed magical that scaling got us as far as it did.
    • catigula 21 hours ago
      > but it's increasingly looking like LeCun is right.

      This is an absolutely crazy statement vis-a-vis reality and the fact that it’s so upvoted is an indictment of the type of wishful thinking that has grown deep roots here.

      • D-Machine 19 hours ago
        If you pay attention to actual research and guarded benchmarks, and understand how benchmarks are being gamed, I would say there is plenty of evidence we are approaching a clear plateau / that Karpathy's march-of-nines thesis is basically correct long-term. Short-term, it remains to be seen how much more we can do with the current tech.
        • gbnwl 19 hours ago
          Can you point me to some of the actual research you're talking about? I'd love to read.
          • D-Machine 17 hours ago
            Your best bet would be to look deeply into fully-private test set performance on ARC-AGI (e.g. https://arcprize.org/blog/arc-prize-2025-results-analysis), and think carefully about the discrepancies there, or just to broadly read any academic research on classic benchmarks and note the plateaus on classic datasets.

            It is very clear, when you look at academic papers actually targeting problems specific to reasoning / intelligence (e.g. rotation invariance in images, adversarial robustness), that all the big companies are doing is just fitting more data / spending more resources on human raters and other things to boost performance on (open) metrics, and that whatever actual gains in genuine intelligence are being made come only from milking what we know very well to be a limited approach. I.e. there are trivially-basic problems that cannot be solved by curve-fitting models, which makes it clear most current advances are indeed coming from curve (manifold) fitting. It just isn't clear how far we can exploit these current approaches, and in what domains this kind of exploitation is more than good enough.

            EDIT: Are people unaware Google Scholar is a thing? It is trivial to find modern AI papers that can be read without requiring access to a research institution. And e.g. HuggingFace collects trending papers (https://huggingface.co/papers/trending), etc.

            • jk2444 17 hours ago
              At present it's only SWEs that are benefiting from a productivity standpoint. I know a lot of people in finance (from accounting to portfolio management) and they scoff at the outputs of LLMs in their day-to-day jobs.

              But the bizarre thing is, even though the productivity of SWEs is increasing, I don't believe there will be many layoffs, because there isn't complete trust in LLMs; I don't see this changing either. In which case the LLM producers will need to figure out a way to increase the value of LLMs and get users to pay more.

              • Ianjit 16 hours ago
                Are SWEs really experiencing a productivity uplift? When studies attempt to measure the productivity impact of AI in software, the results I have seen are underwhelming compared to the frontier labs' marketing.
                • D-Machine 16 hours ago
                  This too should be questioned: there are at least a couple of studies at this point suggesting many developers feel like they are going faster with AI when, by some metrics, they are going slower (e.g. https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...), and there are major CEOs publicly admitting that e.g. Copilot doesn't "really work" (https://ppc.land/microsoft-ceo-admits-copilot-integrations-d...).

                  And, again, this is ignoring all the technical debt of produced code that is poorly understood, weakly-reviewed, and of questionable quality overall.

                  I still think this all has serious potential for net benefit, and does now in certain cases. But we need to be clearer about spelling out where that is (webshit, boilerplate, language-to-language translation, etc) and where it maybe isn't (research code, legacy code, large codebases, niche/expert domains).

              • D-Machine 17 hours ago
                Yup, most progress is also confined to SWEs doing webshit / writing boilerplate code. For anything specialized, LLMs are rarely useful, and this is all ignoring the future technical debt of debugging LLM code.

                I am hopeful about LLMs for SWE, but the progress is currently contextual.

                • jk2444 17 hours ago
                  Agreed.

                  Even if LLMs could write great code with no human oversight, the world would not change overnight. Human creativity is necessary to figure out what stuff to produce that will yield incremental benefits to what already exists.

                  The humans who possess such capability stand to win long-term; said humans tend to be those from the humanities and liberal arts.

        • catigula 13 hours ago
          You're going to be eating so much crow shortly.
    • stevenhuang 15 hours ago
      > I've been following Yann for years and in my opinion he's been consistently right

      Lol. This is the complete opposite of reality. You realize LeCun is memed for all his failed assertions of what LLMs cannot do? Look it up. You clearly have not been following closely, at all.

      • labrador 13 hours ago
        I meant he's held the line against those warning us about superintelligent AI soon destroying us all.
        • stevenhuang 13 hours ago
          Sure, and that is fair. Extreme viewpoints are seldom the likely scenarios anyway, but my disagreement with him stems from his unwarranted confidence in his own ability to predict the future when he's already been wrong about LLMs.

          He has zero epistemic humility.

          We don't know the nature of intelligence. His difficulties in scaling up his research are a testament to this fact. This means we really have no theoretical basis on which to rest the claim that superintelligence cannot in principle emerge from LLM-adjacent architectures: how can we make such a statement when we don't even know what such a thing looks like?

          We could be staring at an imperative definition of superintelligence and not know it, never mind that approximations to such a function could in principle be learned by LLMs (universal approximation theorem). It sounds exceedingly unlikely, but would you rather be comforted by false confidence, or be told the honest truth of what our current understanding of the sciences can tell us?
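
          For reference, the classical universal approximation statement (Cybenko / Hornik) being invoked, roughly:

              \forall f \in C(K),\ K \subset \mathbb{R}^n \text{ compact},\ \forall \varepsilon > 0,\ \exists N, v_i, w_i, b_i :
              \quad \sup_{x \in K} \Big| f(x) - \sum_{i=1}^{N} v_i \, \sigma(w_i^\top x + b_i) \Big| < \varepsilon

          for a suitable non-polynomial activation \sigma. Note this only asserts that an approximator exists on a compact set; it says nothing about whether it is learnable from finite data by gradient descent.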

  • djmips 17 hours ago
    LeCun is ageist - he said a 65-year-old man is too old to be a CEO.
  • htrp 18 hours ago
    Reminder that it's in LeCun's interest to talk up AMI and to explain why they're going to win when they didn't do so at FAIR.
    • mbac32768 16 hours ago
      Yann joins Ilya, Karpathy, Sutton + Carmack when he says LLMs are a dead end, though.

      Karpathy is probably the most careful not to write off LLMs entirely, but he seems pretty skeptical.