I'm not sure anyone who works in the foundation-model space, and who doesn't directly depend on LLMs 'making it' for VC money, would claim differently. It is rather obvious at this point, but some companies are in too deep and not cash-rich enough, so they have to keep the LLM dream alive.
This is the problem. The vast majority of people over-hyping LLMs don't have even the most basic understanding of how simple LLMs are at their core (manifold-fitting the semantic space of the internet), and so can't understand why they are, theoretically, necessarily dead ends. This really isn't debatable for anyone with a full understanding of the training and basic dynamics of what these models do.
But, practically, it remains to be seen where the dead end with LLMs lies. I think we are clearly approaching plateaus both in academic research and in practice (people forget, or are unaware, how much benchmarks are being gamed), but even small practical gains remain game-changers in this space, and much of the progress and tradeoffs we actually care about can't be measured accurately yet (e.g. rapid development vs. "technical debt" from fast but not-understood, weakly-reviewed LLM code).
LLMs are, IMO, indisputably a theoretical dead end, and for that reason a practical dead end too. But we haven't hit that practical dead end yet.
In human terms, LLMs seem similar to talking without thinking, whereas we can also think as an activity separate from waffling on.
In AI research terms, DeepMind have done some interesting things with Mind Evolution and AlphaEvolve, the latter being the one that came up with a more efficient matrix multiplication algorithm.
(https://deepmind.google/research/publications/122391/
https://deepmind.google/blog/alphaevolve-a-gemini-powered-co...)
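(For anyone wondering what "a more efficient matrix multiplication algorithm" means in practice, here is a minimal sketch of the classic Strassen 2x2 scheme, which uses 7 multiplications instead of 8. This is not AlphaEvolve's algorithm, just an older result in the same spirit.)

    # Illustrative only: classic Strassen 2x2 scheme (7 multiplications instead of 8).
    # This is NOT AlphaEvolve's algorithm, just the kind of saving such work targets.
    import numpy as np

    def strassen_2x2(A, B):
        (a, b), (c, d) = A
        (e, f), (g, h) = B
        m1 = (a + d) * (e + h)
        m2 = (c + d) * e
        m3 = a * (f - h)
        m4 = d * (g - e)
        m5 = (a + b) * h
        m6 = (c - a) * (e + f)
        m7 = (b - d) * (g + h)
        return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                         [m2 + m4, m1 - m2 + m3 + m6]])

    A, B = np.random.rand(2, 2), np.random.rand(2, 2)
    assert np.allclose(strassen_2x2(A, B), A @ B)  # matches ordinary matmul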
If I had to steelman a counterargument, I'd handwave about RL and environments creating something greater than the semantic space of the internet, and then highlight the part you mention where we haven't reached a practical dead end. Maybe link out to the Anthropic interpretability work on models planning in advance, via poking at activations while working on a rhyming poem.
A good example would be trying to make an LLM trained on the entire internet do math proofs. Almost everything in its dataset tells it that the word "orthogonal" means "unrelated to", because that is how it is used colloquially. Only in the tiny fraction of math forums and resources it digested does the word actually mean something about the dot product, so clearly an LLM that does math well only does so by ignoring the majority of the space it was trained on. Similar considerations apply to using, e.g., vision-language models trained on "pop" images to facilitate the analysis of, say, MRI scans or LIDAR data. That we can make some progress in these domains tells us there is substantial overlap in the semantics, but it is obvious there are limits to this.
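(To spell out the contrast: the mathematical sense of "orthogonal" is about a zero dot product, nothing like the colloquial sense. A trivial check, with made-up example vectors:)

    # "Orthogonal" in the mathematical sense: the dot product equals zero.
    # (Colloquially the word just means "unrelated to".)
    import numpy as np

    u = np.array([1.0, 2.0])
    v = np.array([-2.0, 1.0])
    w = np.array([3.0, 1.0])
    print(np.dot(u, v))  # 0.0 -> u and v are orthogonal
    print(np.dot(u, w))  # 5.0 -> u and w are not orthogonal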
There is no reason to believe these (often irrelevant or incorrect) semantics learned from the entire web are going to help the LLM produce deeply useful math, MRI analysis, or LIDAR interpretation. Broadly, not all semantics useful in one domain are useful in another, and, more to the point, linguistic semantics clearly have limited relevance to much of what we consider intelligence (which includes visual, auditory, proprioceptive/kinaesthetic, and, arguably, mathematical abstractions). But it could well be that curve-fitting huge amounts of data from the relevant semantic space (e.g. feeding transformers enough Lean / MRI / LIDAR data) is in fact all we need, so that transformers are "good enough" for achieving most basic AI aims. It just is clearly the case that the internet can't provide all that data for all, or even most, domains.
EDIT: Also, Anthropic's write-ups are basically fraud if you actually understand the math. There is no "thinking ahead" or "planning in advance" in any sense: if pre-training sends you down certain paths, then yes, of course you can "already see" activations associated with future tokens; this is just what curve-fitting in N dimensions looks like, because there is nowhere else for the model to go. Actual thinking ahead means things like backtracking / backspace tokens, i.e. actually retracing your path, which current LLMs simply cannot do.
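(To make "backtracking / backspace tokens" concrete, here is a hypothetical sketch of a decoding loop with a genuine undo step. The "<bks>" token and the model.sample_next interface are invented for illustration and are not any real model's API.)

    # Hypothetical sketch of decoding with genuine backtracking.
    # BACKSPACE and model.sample_next are assumptions, not a real API.
    BACKSPACE = "<bks>"

    def decode_with_backtracking(model, prompt_tokens, max_steps=100):
        seq = list(prompt_tokens)
        for _ in range(max_steps):
            tok = model.sample_next(seq)  # assumed interface: sample one next token
            if tok == BACKSPACE and len(seq) > len(prompt_tokens):
                seq.pop()  # retrace: undo the previously generated token
            else:
                seq.append(tok)
        return seq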
There are probably good reasons why LLMs are not the "ultimate solution", but this argument seems wrong. Humans have to ignore the majority of their "training dataset" in tons of situations, and we seem to do it just fine.
LLMs could be a dead end, but aren't anywhere close to saturating the technology yet.
You have an ad hominem attack and your own personal anecdote, which are not an argument for LLMs.
The idea that LLMs will reach AGI is entirely speculative, not least because AGI is undefined and speculative.
Something is fundamentally missing with LLMs w.r.t. intelligence per watt. Assuming GPT-4 is around human-level intelligence, running it takes 2-4 H100s, so roughly the same intelligence delivered on kilowatts of silicon, and that doesn't include the rest of the computer.
That being said, we're willing to brute-force our way to a solution to some extent, so maybe it doesn't matter, but I'd say the fact that we humans don't use anywhere near that much energy is proof enough that we haven't perfected the architecture yet.
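(Rough back-of-the-envelope numbers, assuming ~700 W TDP per H100 and ~20 W for a human brain; both figures are approximate.)

    # Back-of-the-envelope: power draw of 2 H100s vs. a human brain.
    h100_tdp_w = 700   # approximate TDP per H100 (assumption)
    brain_w = 20       # commonly cited rough figure for a human brain
    n_gpus = 2         # low end of the 2-4 H100 estimate above
    ratio = (n_gpus * h100_tdp_w) / brain_w
    print(f"~{ratio:.0f}x the power of a brain, excluding the rest of the machine")  # ~70x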
The median monthly salary in Bangladesh is cheaper than the Cursor Ultra plan. And Cursor loses money.
An experienced developer in India makes around $20k.
I've been following Yann for years and in my opinion he's been consistently right. He's been saying something like this for a long time while Elon Musk and others breathlessly broadcast that scaling up would soon get us to AGI and beyond. Mark Zuckerberg bought into Musk's idea. We'll see, but it's increasingly looking like LeCun is right.
Also, almost everyone agrees the current architecture and paradigm, where you have a finite context (or a badly compressed one in Mamba / SSMs), is not sufficient. That plus lots of other issues. That said, scaling has delivered a LOT, and it's hard to argue against demonstrated progress.
> ...but you’d better not have him at the helm of something you expect to turn a profit on
I don't understand this distinction. Is anyone (besides NVDA) turning a profit on inference at this point?

No? LLMs are getting smarter and smarter. Only three years have passed since ChatGPT was released, and we already have models generating whole apps, competently working on complex features, solving math problems at a level only reached by a small percentage of the population, and much more. The progress is constant and the results are stunning. Really, it makes me wonder what sort of denial those who think this has been proven to be a dead end are in.
Right now the definition of AGI has been hijacked so much that it can mean absolutely anything.
A prime environment for snake oil salesmen like Altman and Musk.
No one has even given a rigorous definition of the I, much less the G qualifier.
Your argument says we should have flying cars by now, because cars kept on getting better.
LeCun says LLMs do text processing and so won't scale to AGI, just like a faster car can never fly (controllably).
Given that we have seen research from DeepSeek and Google on optimizing parts of the lower layers of deep neural networks, it's clear that a new form of AI needs to be created, and I agree that LeCun will be proven right.
Better to build that than to borrow tens of trillions to scale toward a false "AGI".
This is an absolutely crazy statement vis-a-vis reality and the fact that it’s so upvoted is an indictment of the type of wishful thinking that has grown deep roots here.
It is very clear, when you look at academic papers actually targeting problems specific to reasoning and intelligence (e.g. rotation invariance in images, adversarial robustness), that all the big companies are doing is fitting more data and spending more resources on human raters and the like to boost performance on (open) metrics, and that whatever actual gains are being made come only from milking what we know very well to be a limited approach. I.e. there are trivially basic problems that cannot be solved by curve-fitting models, which makes it clear that most current advances are indeed coming from curve (manifold) fitting. It just isn't clear how far we can exploit these current approaches, and in which domains that kind of exploitation is more than good enough.
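(A toy illustration of the curve-fitting point, my own and not from any cited paper: a fit that is near-perfect in-distribution can be wildly wrong just outside it.)

    # Toy example: degree-9 polynomial fit to sin(3x) on [-1, 1].
    # Excellent in-distribution, useless outside the training range.
    import numpy as np

    rng = np.random.default_rng(0)
    x_train = rng.uniform(-1, 1, 200)
    coeffs = np.polyfit(x_train, np.sin(3 * x_train), deg=9)

    x_in = np.linspace(-1, 1, 50)
    x_out = np.linspace(2, 3, 50)  # outside the training range
    print(np.abs(np.polyval(coeffs, x_in) - np.sin(3 * x_in)).max())    # tiny error
    print(np.abs(np.polyval(coeffs, x_out) - np.sin(3 * x_out)).max())  # huge error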
EDIT: Are people unaware Google Scholar is a thing? It is trivial to find modern AI papers that can be read without requiring access to a research institution. And HuggingFace, for example, collects trending papers (https://huggingface.co/papers/trending), and so on.
But the bizarre thing is, even though the productivity of SWEs is increasing, I don't believe there will be much movement on layoffs, because there isn't complete trust in LLMs; I don't see this changing either. In which case the LLM producers will need to figure out a way to increase the value of LLMs and get users to pay more.
And, again, this is ignoring all the technical debt of produced code that is poorly understood, weakly-reviewed, and of questionable quality overall.
I still think this all has serious potential for net benefit, and does now in certain cases. But we need to be clearer about spelling out where that is (webshit, boilerplate, language-to-language translation, etc) and where it maybe isn't (research code, legacy code, large codebases, niche/expert domains).
I am hopeful about LLMs for SWE, but the progress is currently contextual.
Even if LLMs could write great code with no human oversight, the world would not change overnight. Human creativity is still necessary to figure out what to build that will yield incremental benefits over what already exists.
The humans who possess such capability stand to win long-term; said humans tend to be those from the humanities and liberal arts.
Lol. This is the complete opposite of reality. You realize LeCun is memed for all his failed assertions about what LLMs cannot do? Look it up. You clearly have not been following closely, at all.
He has zero epistemic humility.
We don't know the nature of intelligence. His difficulties in scaling up his own research are a testament to this fact. This means we really have no theoretical basis on which to rest the claim that superintelligence cannot, in principle, emerge from LLM-adjacent architectures; how can we make such a statement when we don't even know what such a thing looks like?
We could be staring at an imperative definition of superintelligence and not know it, never mind that approximations to such a function could in principle be learned by LLMs (universal approximation theorem). It sounds exceedingly unlikely, but would you rather be comforted by false confidence, or be told the honest truth of what our current understanding of the sciences can tell us?
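(For reference, the universal approximation theorem being invoked here, due to Cybenko 1989 / Hornik 1991, says, informally:)

    % Universal approximation theorem, informal statement:
    % for any continuous $f$ on a compact $K \subset \mathbb{R}^n$ and any
    % $\varepsilon > 0$, there exists a one-hidden-layer network $g$ with a
    % non-polynomial activation such that
    \[ \sup_{x \in K} \lvert f(x) - g(x) \rvert < \varepsilon . \]

Note this is an existence result about approximation, not a statement about what can be learned from finite data.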
Karpathy is probably the most careful not to write off LLMs entirely, but he seems pretty skeptical.