Watch the original Sutskever interview: https://www.youtube.com/watch?v=aR20FWCCjAs
And LeCun: https://www.youtube.com/watch?v=4__gg83s_Do
That said, I use AI summaries for a lot of stuff that I don't really care about. For me, this topic is important enough to spend two hours of my life on, soaking up every detail.
As for being on par with typical surface-level journalism: I think we might be further into the dead internet than most people realize: https://en.wikipedia.org/wiki/Dead_Internet_theory
This is basically all of it.
Kind of like how word processors solved the "writing is tedious" struggle and search solved the "can't curate the internet" struggle.
And from my experience there are lots and lots of jobs that are just "clicking the right buttons".
Suppose training becomes so efficient that you can train state-of-the-art AGI on a few GPUs. If it's better than current LLMs, there will be more demand and more inference, which will require more GPUs, and we're back to the same "add more GPUs".
I find it hard to believe that we, as humanity, will hit the wall of "we don't need more compute", no matter what the algorithms are.
Ilya did also acknowledge that these houses will still generate gobs of revenue, despite being at a dead end, so I'm not sure what the criticism is, exactly.
Everyone knows another breakthrough is required for AGI to arrive; sama explicitly said this. Do you wait and sit on your hands until that breakthrough arrives? Or make a lot of money while skating to where the puck will be?
But what we're seeing at the moment is a deceleration, not an acceleration.
Maybe they lose relevance. Maybe they miss the breakthrough. That becomes the reason. So perplexity? Sure. Anthropic, even? Yep. Google? OpenAI? Nah.
Regardless, looking at the unit economics, there are very clear sight lines to profitability if they want it. Just like with Amazon, Tesla, Apple, etc., when you want to grow, hoarding cash is a bad play.
As for Nvidia: if OpenAI has less leverage, that necessitates a different AI company having more. Who would it be?
There are countless examples of companies that strove for profit too early, only to die.
They still cost billions to pre-train
They cost billions to train right now because people are willing to throw billions away to get to the goal first. Given more time, cheaper and more clever training methods will be found.
Everyone at Anthropic is saying ASI is imminent…
Who exactly is saying this, other than C-level people?
> Models look god-tier on paper:
> they pass exams
> solve benchmark coding tasks
> reach crazy scores on reasoning evals
Models don't look "god-tier" from benchmarks. Surely an 80% is not godlike. I would really like more human comparisons for these benchmarks to get a good idea of what an 80% means, though. I would not say that any model shows a "crazy" score on ARC-AGI.
I have broadly seen incremental improvements in benchmarks since 2020, mostly at a level I would believe to be below average human reasoning, but above average human knowledge. No one would call GPT-3 godlike, and it is quite similar to modern models on benchmarks; it is not a difference of, say, 1% vs 90%. I think most people would consider GPT-3 to be closer to Opus 4.5 than Opus 4.5 is to a human.
Though I do not fully know where the boundary between "a model prompted to iterate and use tools" and "a model trained to be more iterative by design" is. How meaningful is that distinction?
But the people who don't get this are the less-technical/less-hands-on VPs, CEOs, etc., who are deciding on layoffs, upcoming headcount, and "replace our customer service or engineering staff with AI" moves. A lot of those moves are going to look either really silly or really genius depending on exactly how "AGI-like" the plateau turns out to be. And that affects a LOT of people's jobs and livelihoods, so it's good to see the hype machine start to slow down and get more realistic about the near-term future.
Tooling vs model is a false dichotomy in this case. The massive improvements in tooling are directly traceable back to massive improvements in the models.
If you took the same tooling and scaffolding and stuck GPT-3 or even GPT-4 in it, they would fail miserably and from the outside the tooling would look abysmal, because all of the affordances of current tooling come directly from model capability.
All of the tooling approaches of modern systems were proposed and prototypes were made back in 2020 and 2021 with GPT-3. They just sucked because the models sucked.
The massive leap in tooling quality directly reflects a concomitant leap in model quality.
My stance hasn't changed, his has.
There's a big problem in that we reward those who hype, not those with merit. When the "era of scaling" happened there was a split between those who claimed "scale is all you need" and those who claimed "scale is not enough". The former won, and I even seem to remember a bunch of people at NeurIPS around that time wearing "scale is all you need" T-shirts.
So why are we now rewarding those same people when they change their tune? Their bet lost, sorry. I'm happy we tried scale and I'm glad we made progress, but at the same time many of us have been working outside the SIAYN paradigm and struggled to get papers through review[0]. Scaling efforts led to lots of publications and citations, but you got far less by working outside that domain. And FFS, the reason most of you know Gary Marcus is that he was a vocal opponent of SIAYN and had enough initial clout. So as this tune changes, does the money shift towards us? Of course not.
I don't care about being vindicated, I care about trying to do research[1]. I don't care about the money, I care about trying to make AGI. Even Sutton has said that the Bitter Lesson was not about SIAYN!
So the reason I'm annoyed is that it seems we're going to reward those who made big claims and fell short rather than those who correctly predicted the result. Why do we reward those who chase hype more than we reward those who got it right?
[0] A common criticism being "but does it work at scale?" or "needs more experiments". While these critiques/questions are legitimate, they are out of place. Let us publish the small-scale results first so that we can use them as evidence for our requests for more scale. Do you expect us to start at large scale first?
[1] I'm anonymous here; I don't care about the internet points. For the sake of this comment I might as well be any one of those Debbie Downers who pushed back against SIAYN and talked about the limits and how we shouldn't put all our eggs in one basket. There are thousands of us.
I like Dario's view on this: we've seen this story before with deep learning. Then we progressively got better regularization, initialization, and activations.
I'm sure this will follow suit; the graph of improvement is still linear, up and to the right.
> The industry is already operating at insane scale.
Sounds a lot like "640K ought to be enough for anybody", or "the market can stay irrational longer than you can stay solvent".
I don't doubt this person knows how things should go but I also don't doubt this will get bigger before it gets smaller.
Possibly... but also a lot of the foundational AI advancements were actually done in skunkworks-like environments and with pure research rather than iterating in front of the public.
It's not 100% clear to me if the ultimate path to the end is iteration or something completely new.
Social contagion is astonishingly potent around ideas like this one, and this probably explains why the zeitgeist seems to be immovable for a time and then suddenly change.
It's charlatans like sama that muddy the waters by promising the sky to get money for their empire building.
LLMs can make, and are, great products. But it's sneaky salesmen who are saying scaling is the path to AGI. The reality is that they're just aiming for economies of scale to make their business viable.
What high quality data sources are not already tapped?
Where does the next 1000x flops come from?
Stick a microphone and camera outside on a robot and you can get unlimited data of perfect quality (because it by definition is the real world, not synthetic). Maybe the "AGI needs to be embodied" people will be right, because that's the only way to get enough coherent multimodal data to do things like long-range planning, navigation, game-playing, and visual tasks.
Be careful with mistaking data for information.
You are getting digital (maybe lossily compressed) samples of photons and sound waves. It is not unlimited: a camera pointed at a building at night is going to have very little new information from second to second. A microphone outside is going to have very little new information from second to second unless something audible is happening close by.
You can max out your storage capacity by adding twenty high-megapixel cameras recording frames as TIFF files, but you gain little new useful information for every camera you add.
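A crude way to see the data-vs-information gap: compress a nearly static scene frame by frame versus as one frame plus frame-to-frame diffs. This is only a sketch with synthetic frames and zlib as a rough proxy for information content; the scene, resolution, and numbers are made up for illustration.

    import zlib
    import numpy as np

    rng = np.random.default_rng(0)

    # Mostly static scene: a fixed background with only a few pixels changing per
    # frame -- a made-up stand-in for "a camera pointed at a building at night".
    background = np.tile((np.arange(640) % 256).astype(np.uint8), (480, 1))
    frames = []
    for _ in range(10):
        f = background.copy()
        idx = rng.integers(0, f.size, size=200)                  # 200 pixels change per frame
        f.flat[idx] = rng.integers(0, 256, size=200, dtype=np.uint8)
        frames.append(f)

    raw_bytes = sum(f.nbytes for f in frames)
    frame_by_frame = sum(len(zlib.compress(f.tobytes())) for f in frames)
    first_plus_diffs = len(zlib.compress(frames[0].tobytes())) + sum(
        len(zlib.compress((b.astype(np.int16) - a.astype(np.int16)).tobytes()))
        for a, b in zip(frames, frames[1:]))

    print(f"raw data:                  {raw_bytes:>9} bytes")
    print(f"compressed frame by frame: {frame_by_frame:>9} bytes")
    print(f"first frame + diffs:       {first_plus_diffs:>9} bytes")

Each additional frame adds a fixed amount of data but almost no new information, and the same goes for each additional camera pointed at the same scene.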
Some people don't seem to realize how critical the "eval" function is for machine learning.
Raw data is not much more useful than noise for the current recipes of model training.
Human produced data on the internet (text, images, etc.) is highly structured and the eval function can easily be built.
Chess or Go has rules and the eval function is more or less derived or discovered from them.
But the real world?
For driving you can more or less build a computer vision system able to follow a road in a week, because the eval function is so simple. But for all the complex parts, the eval function is basically one bit (you crashed/not crashed) that you have to sip very slowly, and it is very inefficient to train such a complex system with such a minimal reward, even in simulations.
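A tiny sketch of that feedback gap, using a made-up 1-D "stay on the road" environment (the dynamics, policy, and thresholds are purely illustrative): the lane-following part gets a per-step error signal, while "crashed / not crashed" yields a single bit per episode.

    import random

    def run_episode(policy, steps=1000):
        """Toy 1-D 'car' that should stay near the lane centre (pos == 0)."""
        pos, dense_signals = 0.0, []
        for _ in range(steps):
            pos += policy(pos) + random.gauss(0.0, 0.1)   # noisy dynamics
            dense_signals.append(-abs(pos))                # per-step lane-keeping signal
            if abs(pos) > 3.0:                             # drove off the road
                return dense_signals, 0                    # sparse signal: crashed
        return dense_signals, 1                            # sparse signal: survived

    dense, survived = run_episode(lambda p: -0.5 * p)      # simple proportional controller
    print(f"dense feedback:  {len(dense)} scalar signals in one episode")
    print(f"sparse feedback: 1 bit in one episode (survived={survived})")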
I don't see how this is any less structured than the CLM objective of LLMs, there's a bunch of rich information there.
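For comparison, a minimal numpy sketch of what the causal-LM objective supplies (the toy vocabulary and the bigram-style "model" here are made up, not anything from the comments above): every position in a sequence is its own supervised target, so even a short document yields dozens of dense training signals rather than one bit at the end.

    import numpy as np

    rng = np.random.default_rng(0)
    vocab_size, seq_len, d = 100, 32, 16

    tokens = rng.integers(0, vocab_size, size=seq_len)     # one toy "document"
    embed = rng.normal(size=(vocab_size, d))                # stand-in parameters
    unembed = rng.normal(size=(d, vocab_size))

    logits = embed[tokens[:-1]] @ unembed                   # predict token t+1 from token t
    logits -= logits.max(axis=-1, keepdims=True)            # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))

    nll = -log_probs[np.arange(seq_len - 1), tokens[1:]]    # a loss term at every position
    print(f"supervised targets from one {seq_len}-token sequence: {nll.size}")
    print(f"mean next-token loss: {nll.mean():.3f}")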
There is at least one missing piece to the puzzle, and some say 5-6 more breakthroughs are necessary.
> Where does the next 1000x flops come from?

Even with Moore's law dead, we can easily build 1,000x more computers. And as for arguments about lack of power: we have the sun.
Even still, we need evolutions in model architecture to get to the next level. Data is not enough.
LLMs can't do jack shit with ciphertext (sans key).
Another way is CPU + fast memory, like Apple does. It's limited but power-efficient.
Looks like, as the ecosystem develops, we need the whole spectrum: big models plus tools running in datacenters, smaller models running locally, and even smaller ones on mobile devices and robots.
This does not mean he's not an accomplished and very talented researcher.
LeCun was sacked from Meta.
Not sure if it's wise to listen to their advice ...
Earlier:
Ilya Sutskever: We're moving from the age of scaling to the age of research
https://news.ycombinator.com/item?id=46048125
And one of the recent LeCun discussions:
But I don't recall him actually saying that the current ideas won't lead to AGI.
Then, he starts to talk about the other ideas but his lawyers / investors prevent him from going into detail: https://youtu.be/aR20FWCCjAs?t=1939
The worrisome thing is that he openly talks about whether to release AGI to the public. So, there could be a world in which some superpower has access to wildly different tech than the public.
To take Hinton's analogy of AGI to extraterrestrial intelligence, this would be akin to a government having made contact but withholding the discovery and the technology from the public: https://youtu.be/e1Hf-o1SzL4?t=30
It's a wild time to be alive.
Ilya appears to have shifted closer to Yann's position, though: he's been on the "scaling LLMs will fail to reach AGI" beat for a long time.
Yeah, the actual video with transcripts (YouTube link at the bottom of TFA):
https://www.dwarkesh.com/p/ilya-sutskever-2
Ed: TFA is basically a dupe of
Every LLM is easily misaligned, "deceived to deceive" and whatnot, and they want to focus on adding MORE ATTACK SURFACE???
and throw more CPU at it?
This is glorious.
time to invest in the pen & paper industry!
While the tech is useful, the massive amounts of money being shoveled into AI have more to do with the ever-escaping mirage of a promised land where there will be an infinite amount of 'more'. For some people that means post-scarcity, for others it means a world-dominating AGI that achieves escape velocity against the current gridlock of geopolitics, for still others it means ejecting the pesky labour class and replacing all labour needs with AI and robots. Varied needs, but all perceived as urgent and inescapable by their vested interests.
I am somewhat relieved that we're not headed into the singularity just yet, I see it as way too risky given the current balance of power and stability across the planet. The outcome of ever accelerating tech progress at the expense of all other dimensions of wellbeing is not good for the majority of life here.
When talking with non-tech people around me, it’s really not about “rational minds”, it’s that people really don’t understand how all this works and as such don’t see the limitations of it.
Combine that with a whole lot of FOMO which happens often with investors and you have a whole pile of money being invested.
From what I hear, most companies like Google and Meta have a lot of money to burn, and their official position towards investors is “chances of reaching ASI/AGI are very low, but if we do and we miss out on it, it will mean a huge opportunity loss so it’s worth the investment right now”.
What are the limits? We know the limits of naked LLMs. Less so for LLMs + current tools. Even less for LLMs + future tools. And we can only guess about LLMs + other models + future tools. I mean, moving forward likely requires complexity, research, and engineering. We don't know the limits of this approach even without any major breakthrough. We can't predict; if a breakthrough happens it will all be different, but better than what we can foresee today.