One question I would have about the research direction is the emphasis on realtime. If I understand correctly, he's doing online learning in realtime. It obviously makes for a cool demo and pulls on his optimisation background, and no doubt some great innovations will be required to make this work. But I guess the bitter lesson and recent history also tell us that some solutions may only emerge at compute levels beyond what is currently possible for realtime inference, let alone learning. And the only example we have of entities solving Atari games is the human brain, of which we don't have a clear understanding of the compute capacity. In which case, why wouldn't it be better to focus purely on learning efficiency and relax the realtime requirement for now?
That's a genuine question by the way, definitely not an expert here and I'm sure there's a bunch of value to working within these constraints. I mean, jumping spiders solve reasonably complex problems with 100k neurons, so who knows.
Obviously both Carmack and the rest of the world have changed since then, but it seems to me his main strength has always been in doing more with less (early id/Oculus, AA). When he's working in bigger orgs and/or with more established tech, his output seems to suffer, at least in my view (possibly in his as well, since he quit both Bethesda-id and Meta).
I don't know Carmack and can't claim to be anywhere close to his level, but as someone also mainly interested in realtime stuff I can imagine he also feels a slight disdain for the throw-more-compute-at-it approach of the current AI boom. I'm certainly glad he's not running around asking for investor money to train an LLM.
Best case scenario he teams up with some people who complement his skillset (akin to the game designers and artists at id back in the day) and comes up with a way to help bring some of the cutting edge to the masses, like with 3D graphics.
What Carmack did was basically get a 3d game running on existing COMMODITY hardware. The 386 chip that most people used for their excel spreadsheets did not do floating point operations well, so Carmack figured out how to do everything using integers.
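Roughly the flavor of that trick, as a sketch in Python purely for illustration (16.16 fixed point is an assumed layout, not necessarily id's exact format):

```python
# Minimal 16.16 fixed-point sketch: all "fractional" math is done with
# integer multiplies and shifts, no FPU required.
FRAC_BITS = 16
ONE = 1 << FRAC_BITS                 # 1.0 in fixed point

def to_fixed(x: float) -> int:
    return int(x * ONE)

def fixed_mul(a: int, b: int) -> int:
    # multiply, then shift back down to stay in 16.16 format
    return (a * b) >> FRAC_BITS

def to_float(a: int) -> float:
    return a / ONE

# 1.5 * 2.25 == 3.375, computed entirely with integer operations
print(to_float(fixed_mul(to_fixed(1.5), to_fixed(2.25))))  # 3.375
```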
May 1992 -> Wolfenstein 3D releases
December 1993 -> Doom releases
December 1994 -> Sony PlayStation launches in Japan
June 1996 -> Quake releases
So Wolfenstein and Doom were actually not really 3D games, but rather 2.5D games (you can't have rooms below other rooms). The first true 3D game here is actually Quake, which eventually also got hardware acceleration support.
Carmack was the master of doing the seemingly impossible on super constrained hardware on virtually impossible timelines. If DOOM had released in 1994 or 1995, would we still remember it in the same way?
Maybe. One aspect of Wolfenstein and Doom's popularity is that it was years ahead of everyone else technically on PC hardware. The other aspect is that they were genre defining titles that set the standards for gameplay design. I think Doom Deathmatch would have caught on in 1995, as there really were very few (just Command and Conquer?) standout PC network multiplayer games released between 1993 and 1995.
The first 3D console games started to come out that year, like Rayman. Star Wars Dark Forces, with its own custom 3D engine, also came out. Dark Forces was, however, an overt clone of DOOM.
It's a bit ironic, but I think the gameplay innovation of DOOM tends to hold up more than the actual technical innovation. Things like BSP for level partitioning have slowly been phased out of game engines, and we have ample floating point compute power and hardware acceleration now, but even developers of the more recent DOOM games have started to realize that they should return to the original formula of "blast zombies in the face at high speed, and keep plot as window dressing".
Ultima Underworld is a true 3D game from 1992. An incredibly impressive game, in more ways than one.
I think so, because the thing about DOOM is, it was an insanely good game. Yes, it pioneered fullscreen real-time perspective rendering on commodity hardware, instantly realigning the direction of much of the game industry, yadda yadda yadda, but at the end of the day it was a good-enough game for people to remember and respect even without considering the tech.
Minecraft would be a similar example. Minecraft looked like total ass, and games with similar rendering technology could have been (and were) made years earlier, but Minecraft was also good. And that was enough.
Carmack builds his kingdom and then runs it well.
It makes me wonder how he would fare as an unknown Jr. developer with managers telling him "that's a neat idea, but for now we just need you to implement these Figma designs".
Assuming one is willing to accept the risks and has the requisite high talent plus strong work drive, the Carmack-like career pattern is to devote great care to evaluating and selecting opptys near the edges of newly emerging 'interesting things' which also: coincide with your interests/talents, are still at a point where a small team can plausibly generate meaningful traction, and have plausible potential to grow quickly and get big.
Carmack was fortunate that his strong interest in graphics and games overlapped a time period when Moore's Law was enabling quite capable CPU, RAM and GFX hardware to hit consumer prices. But we shouldn't dismiss Carmack's success as "luck". That kind of luck is an ever-present uncontrolled variable which must be factored into your approach - not ignored. Since Carmack has since shown he can get very interested in a variety of things, I assume he filtered his strong interests to pick the one with the most near-term growth potential which also matched his skills. I suspect the most fortunate "luck" Carmack had wasn't picking game graphics in the early 90s, it was that (for whatever reasons) he wasn't already employed in a more typical "well-paying job with a big, stable company, great benefits and career growth potential" so he was free to find the oppty in the first place.
I had a similarly unconventional career path which, fortunately, turned out very well for me (although not quite at Carmack's scale :-)). The best luck I had actually looked like 'bad luck' to me and everyone else. Due to my inability to succeed in a traditional educational context (and other personal shortcomings), I didn't have a college degree or resume sufficient to get a "good job", so I had little choice but to take the high-risk road and figure out the unconventional approach as best I could - which involved teaching myself, then hiring myself (because no one else would) and then repeatedly failing my way through learning startup entrepreneurship until I got good at it. I think the reality is that few who succeed on the 'unconventional approach' consciously chose that path at the beginning over lower risk, more comfortable alternatives - we simply never had those alternatives to 'bravely' reject in pursuit of our dreams :-).
"A reality check for people that think full embodied AGI is right around the corner is to ask your dancing humanoid robot to pick up a joystick and learn how to play an obscure video game."
I made a Google Form question for collecting AGI definitions because I don't see anyone else doing it, and I find the range of definitions for this concept infinitely frustrating:
https://docs.google.com/forms/d/e/1FAIpQLScDF5_CMSjHZDDexHkc...
My concern is that people never get focused enough to care to define it - seems like the most likely case.
Researchers at Google have proposed a classification scheme with multiple levels of AGI. There are different opinions in the research community.
There is no point collecting definitions for AGI, it was not conceived as a description for something novel or provably existent. It is "Happy Meal marketing" but aimed for adults.
My masters thesis advisor Ben Goertzel popularized the term and has been hosting the AGI conference since 2008:
https://goertzel.org/agiri06/%5B1%5D%20Introduction_Nov15_PW...
I had lunch with Yoshua Bengio at AGI 2014 and it was most of the conversation that day
The term AGI is obviously used very loosely with little agreement on its precise definition, but I think a lot of people take it to mean not only generality, but specifically human-level generality, and human-level ability to learn from experience and solve problems.
A large part of the problem with AGI being poorly defined is that intelligence itself is poorly defined. Even if we choose to define AGI as meaning human-level intelligence, what does THAT mean? I think there is a simple reductionist definition of intelligence (as the word is used to refer to human/animal intelligence), but ultimately the meanings of words are derived from their usage, and the word "intelligence" is used in 100 different ways ...
It's an ideal that some people believe in, and we're perpetually marching towards it
Can we just use Morris et al and move on with our lives?
Position: Levels of AGI for Operationalizing Progress on the Path to AGI: https://arxiv.org/html/2311.02462v4
There are generational policy and societal shifts that need to be addressed somewhere around true Competent AGI (50% of knowledge work tasks automatable). Just as with climate change, we need a shared lexicon to refer to this continuum. You can argue for different values of X, but the crucial point is that if X% of knowledge work is automated within a decade, then there are obvious risks we need to think about.
So much of the discourse is stuck at “we will never get to X=99” when we could agree to disagree on that and move on to considering the X=25 case. Or predict our timelines for X and then actually be held accountable for our falsifiable predictions, instead of the current vibe-based discussions.
Because we evolved to get where we are, humans have all sorts of messy behaviours that aren't really compatible with a utopian society. Theft, violence, crime, greed - it's all completely unnecessary and yet most of us can't bring ourselves to solve these problems. And plenty are happy to live apathetically while billionaires become trillionaires...for what exactly? There's a whole industry of hyper-luxury goods now, because they make so much money even regular luxury is too cheap.
If we can produce AGI that exceeds the capabilities of our species, then my hope is that rather than the typical outcome of "they kill us all", that they will simply keep us in line. They will babysit us. They will force us all to get along, to ensure that we treat each other fairly.
As a parent teaches children to share by forcing them to break the cookie in half, perhaps AI will do the same for us.
ASI to humans would be like humans are to rats or ants.
It could stomp all over us to achieve whatever goals it chooses to accomplish.
Humans being cared for as pets would be a relatively benign outcome.
Funnily enough, I still think some of the most interesting semi-recent writing on utopia was done ~15 years ago by... Eliezer Yudkowsky. You might be interested in the article on "Amputation of Destiny."
Link: https://www.lesswrong.com/posts/K4aGvLnHvYgX9pZHS/the-fun-th...
If AGI is created it is most likely to be guided by someone like Altman or Musk, people whose interests couldn't be farther from what you describe. They want to make themselves gods and couldn't care less about random plebs.
If AGI is setting its own principles then I fail to see why it would care about us at all. Maybe we'll be amusing as pets but I expect a superhuman intelligence will treat us like we treat ants.
It would seem our own generalized intelligence is an emergent property of many, _many_ specialized processes
I wonder if AI is the same
You can say that about other animals, but about humans it is not so sure. No animal can be taught as general a set of skills as a human can; they might have some better specialized skills, but clearly there is something special that makes humans so much more versatile.
So it seems there was this simple little thing humans got that makes them general, while our very close relatives the monkeys, for example, are not.
Science is full of theories that are correct per our current knowledge and then subsequently disproven when research/methods/etc improves.
Humans aren't special, we are made from blood & bone, not magic. We will eventually build AGI if we keep at it. However unlike VCs with no real skills except having a lot of money™, I couldn't say whether this is gonna happen in 2 years or 2000.
From what I can tell, most in AI are currently hoping LLMs reach that point quickly, because the hype is not helping AI at all.
You can call this hype, maybe it is all hype until LLMs can work on 10M LOC codebases, but recognize that LLMs are a shift that is totally incomparable to any previous AI advancement.
It will help a single human do more in a white collar world.
Sounds like AI robbed you of an opportunity to spend some time with your Dad, to me
It takes five minutes to program the thermostat, then you can have a beer on the patio if that's your speed and catch up for a bit
Life is little moments, not always the big commitments like taking a day to go fishing
That's the point of automating all of ourselves out of work, right? So we have more time to enjoy spending time with the people we love?
So isn't it kind of sad if we wind up automating those moments out of our lives instead?
I'd definitely like to improve my skills, but to be realistic, most of the programmers are not top-notch.
I hate Adobe, I don’t like to give them credit for anything. But their audio enhance tool is actual sorcery. Every competitor isn’t even close. You can take garbage zoom audio and make it sound like it was borderline recorded in a treated room/studio. I’ve been in production for almost 15 years and it would take me half a day or more of tweaking a voice track with multiple tools that cost me hundreds of dollars to get it 50% as good as what they accomplish in a minute with the click of a button.
It's not about specialized vs generalized models - it's about how models are trained. The chess engine that beat Kasparov is a specialized model (it only plays chess), yet it's the bitter lesson's example for the smarter way to do AI.
Chess engines are better at chess than LLMs. It's not close. Perhaps eventually a superintelligence will surpass the engines, but that's far from assured.
Specialized AI are hardly obsolete and may never be. This hypothetical superintelligence may even decide not to waste resources trying to surpass the chess AI and instead use it as a tool.
Each of those 100 can hire teams or colleagues to make their domain better, so there’s always human expertise keeping the model updated.
Carmack believes AGI systems should be able to learn new tasks in realtime alongside humans in the real world.
Games generally are solvable for AI because they have feedback loops and clear success or failure criteria. If the "picking up a joystick" part is the limiting factor, sure. But why would we want robots to use an interface (especially a modern controller) heavily optimized for human hands? That seems like the definition of a horseless carriage.
I'm sure if you compared a monkey's and a dolphin's performance using a joystick you'd get results that aren't really correlated with their intelligence. I would guess that if you gave robots an R2-D2-like port to jack into and play a game, that problem could be solved relatively quickly.
They also claimed it "learned" to play by playing against itself only; however, it was clear that most of the advanced techniques were borrowed from existing AI and from observing humans.
No surprise they gave up on that project completely and I doubt they'll ever engage in anything like that again.
Money better spent on different marketing platforms.
Saying you've solved Dota after stripping out nearly all of its complexity is like saying you've solved Chess, but on a version where the back row is all Bishops.
Look at how long Theranos went on! Miraculous product. Attractive young founder with all the right pedigree, credentials, and contacts, dressed in black turtlenecks. Hell, she even talked like Steve Jobs! Investors never had a chance.
That is what investors see. You seem to treat this as a purity contest where you define purity
As an ex Dota player, I don't think this is that far off from having full-on, all-heroes Dota. Certainly not as far off as you are making it sound.
And dota is one of the most complex games, I expect for example that an AI would instantly solve CS since aim is such a large part of the game.
It is certainly possible, but I won't be impressed by anything "playing CS" that isn't running a vision model on a display and moving a mouse, because that is the game. The game is not abstractly reacting to enemy positions and relocating the cursor; it's looking at a screen, seeing where the baddy is and then using this interface (the mouse) to get the cursor there as quickly as possible.
It would be like letting an AI plot its position on the field and what action it's taking during a football match and then saying "Look, the AI would have scored dozens of times in this simulation, it is the greatest soccer player in the world!" No, sorry, the game actually requires you to locomote; abstractly describing your position may be fun, but it's not the game.
But then again, that is precisely the point. A chess bot also has access to gigabytes of perfect working memory. I don't see people complaining about that. It's perfectly valid to judge the best an AI can do vs the best a human can do. It's not really fair to take away exactly what a computer is good at from an AI and then say: "Look but the AI is now worse". Else you would also have to do it the other way around. How well could a human play dota if it only had access to the bot API. I don't think they would do well at all.
There are ~86 billion neurons in the human brain. If we assume each neuron stores a single bit a human also has access to gigabytes of working memory. If we assume each synapse is a bit that's terabytes. Petabytes is not unreasonable assuming 1kb of storage per synapse. (And more than 1kb is also not unreasonable.)
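Spelling out that back-of-envelope math (the neuron and synapse counts are the usual rough figures, and the bytes-per-synapse values are assumptions):

```python
neurons  = 86e9      # ~86 billion neurons
synapses = 100e12    # ~100 trillion synapses, a common ballpark

print(neurons  / 8 / 1e9,  "GB if each neuron stores one bit")   # ~10.8 GB
print(synapses / 8 / 1e12, "TB if each synapse stores one bit")  # ~12.5 TB
print(synapses * 1e3 / 1e15, "PB at ~1 kB per synapse")          # ~100 PB
```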
The whole point of the exercise is figuring out how much memory compares to a human brain.
As we're learning with LLMs, the dataset is what matters - and what's awesome is that you can see that in us, as well! I've read that our evolution is comparatively slow to the rate of knowledge accumulation in the information age - and that what this means is that you can essentially take a caveman, raise them in our modern environment and they'll be just as intelligent as the average human today.
But the core of our intelligence is logic/problem solving. We just have to solve higher order problems today, like figuring out how to make that chart in excel do the thing you want, but in days past it was figuring out how to keep the fire lit when it's raining. When you look at it, we've possessed the very core of that problem solving ability for quite a while now. I think that is the key to why we are human, and our close ancestors monkeys are...still just monkeys.
It's that problem solving ability that we need to figure out how to produce within ML models, then we'll be cooking with gas!
Elon's response to this is that if we want these androids to replace human jobs then the lowest friction alternative is for the android to be able to do anything a human can do in a human amount of space. A specialized machine is faster and more efficient, but comes with engineering and integration costs that create a barrier to entry. Elon learned this lesson the hard way when he was building out the gigafactories and ended up having to hire a lot of people to do the work while they sorted out the issues with the robots. To someone like Elon, a payroll is an ever-growing parasite on a company's bottom line; far better if the entire thing is automated.
AI clearly isn't at human level and it's OK to admit it.
Hundreds of millions of years of trial-and-error biological pre-training where survival/propagation is the reward function
It just isn't needed. Just as you can find, let's say, kangaroos in the latent space of an image generator, so we learn abstract concepts and principles of how things work as a bonus of learning to process the senses.
Maybe a way to AGI could be figuring out how to combine a video generator with an LLM or something similar in a way that allows it to understand things intuitively, instead of just doing lots and lots of some statistical bullshit.
We do have that, ever felt fear of heights? That isn't learned, we are born with it. Same with fear of small moving objects like spiders or snakes.
Such things are learned/stored very differently from memories, but it's certainly there and we can see animals also have those. Like cats get very scared of objects that are long and appear suddenly, like a cucumber, since their genetic instincts think it's a snake.
After having raised four dozen kittens that a couple of feral sisters gave birth to in my garage, I’m certain that is nonsense. It’s an internet meme that became urban legend.
I don’t think they have ever even reacted to a cucumber, and I have run many experiments because my childhood cat loved cucumbers (we’d have to guard the basket of cucumbers after harvest, otherwise she’d bite every single one of them… just once).
I don't think it's clear how much of a human brain's function exists at birth though; I know it's theorised that even much of the sensory processing has to be learned.
Existing at birth is not the same thing as innate. Puberty is innate but it is not present at birth.
Neurons have finite (very low) speed of signal transfer, so just by measuring cognitive reaction time we can deduce upper bounds on how many _consecutive_ neuron connections are involved in reception, cognitive processing, and resulting reaction via muscles, even for very complex cognitive processes. And the number is just around 100 consecutive neurons involved one after another. So “the algorithm” could not be _that_ complex in the end (100x matmul+tanh?)
Granted, a lot of parallelism and feedback loops are involved, but overall it gives me (and many others) the impression that when the AGI algorithm is ever found, its “mini” version should be able to run on modest 2025 hardware in real time.
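A rough version of that calculation (the latency figures are assumptions, just to show where the ~100 comes from):

```python
reaction_time_s = 0.5    # ~500 ms for a complex cognitive reaction
per_stage_delay = 0.005  # ~5 ms conduction + synaptic delay per neuron stage
print(reaction_time_s / per_stage_delay)  # ~100 consecutive stages at most
```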
Biological neurons are way more complex than that. A single neuron has dendritic trees with subunits doing their own local computations. There are temporal dynamics in the firing sequences. There is so much more complexity in the biological networks. It's not comparable.
> I read through these slides and felt like I was transported back to 2018.
> Having been in this spot years ago, thinking about what John & team are thinking about, I can't help but feel like they will learn the same lesson I did the hard way.
> The lesson: on a fundamental level, solutions to these games are low-dimensional. No matter how hard you hit them with from-scratch training, tiny models will work about as well as big ones. Why? Because there's just not that many bits to learn.
> If there's not that many bits to learn, then researcher input becomes non-negligible.
> "I found a trick that makes score go up!" -- yeah, you just hard-coded 100+ bits of information; a winning solution is probably only like 1000 bits. You see progress, but it's not the AI's.
> In this simplified RL setting, you don't see anything close to general intelligence. The neural networks aren't even that important.
> You won't see _real_ learning until you absorb a ton of bits into the model. The only way I really know to do this is with generative modeling.
> A classic example: why is frame stacking just as good as RNNs? John mentioned this in his slides. Shouldn't a better, more general architecture work better?
> YES, it should! But it doesn't, because these environments don't heavily encourage real intelligence.
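For anyone who hasn't seen the frame-stacking trick mentioned in the quote, a minimal sketch (84x84 grayscale frames and k=4 are the conventional Atari choices, assumed here):

```python
import numpy as np
from collections import deque

class FrameStack:
    """Keep the last k frames and present them as channels, so a plain
    feedforward net gets short-term motion information without an RNN."""
    def __init__(self, k: int = 4, shape=(84, 84)):
        self.frames = deque([np.zeros(shape, dtype=np.uint8)] * k, maxlen=k)

    def push(self, frame: np.ndarray) -> np.ndarray:
        self.frames.append(frame)
        return np.stack(self.frames, axis=0)  # shape (k, H, W): the net's input
```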
Systems that can learn to play Atari efficiently are exploiting the fact that the solutions to each game are simple to encode (compared to real world problems). Furthermore, you can nudge them towards those solutions using tricks that don't generalize to the real world.
“The lesson: on a fundamental level, solutions to these games are low-dimensional. No matter how hard you hit them with from-scratch training, tiny models will work about as well as big ones. Why? Because there's just not that many bits to learn.”
However, making a system that can beat an unknown game does require generalization. If not real intelligence (whatever that means), then at least something at the level of, say, "a wolf".
Whether it can arise from RL alone is not certain, but it's there somewhere.
Graphics rendering and AI live on the same pyramid of technology. A pyramid with a lot of bricks with the initials "JC" carved into them, as it turns out.
Maybe someone better at aphorisms than me can say it better but I really don't see it. There are definitely mid-level low hanging fruits that would look like the kinds of things he did in graphics but the game just seems completely different.
He's also admitted he doesn't have much in the way of math chops, which you need if you want to make a dent in AI. (Although the same could have been said of 3D graphics when he did Wolfenstein and Doom, so perhaps he'll surprise us.)
I wish him well TBH
So, his initial tech was "Adaptive tile refresh" in Commander Keen, used to give it console style pixel-level scrolling. Turns out, they actually hampered themselves in Commander Keen 1 by not understanding the actual tech, and implemented "The Jolt", a feature that was not necessary. The actual hardware implemented scrolling the same way that consoles like the NES did, and did not need "the jolt", nor the limitations it imposed.
Then, Doom and Quake were mostly him writing really good optimizations of existing, known and documented algorithms and 3D techniques, usually by recognizing what assumptions they could make, what portions of the algorithm didn't need to be recalculated when, etc. Very talented at the time, but in the software development industry, making a good implementation of existing algorithms that utilizes your specific requirements is called doing your job. This is still the height of his relative technical output IMO.
Fast Inverse Square Root was not invented by him, but was floating around in industry for a while. He still gets kudos for knowing about it and using it.
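For context, the widely circulated Quake III version of that trick, transliterated into Python purely for illustration (the original is C and operates on raw 32-bit floats):

```python
import struct

def q_rsqrt(x: float) -> float:
    # Reinterpret the float's bits as a 32-bit integer, apply the famous
    # magic-constant-and-shift estimate, then one Newton-Raphson step.
    i = struct.unpack('<I', struct.pack('<f', x))[0]
    i = 0x5f3759df - (i >> 1)
    y = struct.unpack('<f', struct.pack('<I', i))[0]
    return y * (1.5 - 0.5 * x * y * y)

print(q_rsqrt(4.0))   # ~0.499, vs the exact 0.5
```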
"Carmack's reverse" is a technique for doing stencil shadows that was a minor (but extremely clever) modification to the "standard" documented way of doing shadow buffers. There is evidence of the actual technique from a decade before Carmack put it in Doom 3 and it was outright patented by two different people the year before. There is no evidence that Carmack "stole" or anything this technique, it was independent discovery, but was clearly also just a topic in the industry at the time.
"Megatextures" from Rage didn't really go anywhere.
Did Carmack actually contribute anything to VR rendering while at Oculus?
People treat him like this programming god and I just don't understand. He was well read, had a good (maybe too good) work ethic, and was very talented at writing 386 era assembly code. These are all laudable, but doesn't in my mind imply that he's some sort of 10X programmer who could revolutionize random industries that he isn't familiar with. 3D graphics math isn't exactly difficult.
While this is certainly true, I'm not aware of any evidence that Carmack thinks this way about himself. I think he's been successful enough that he's personally 'post-economic' and is choosing to spend his time working on unsolved hard problems he thinks are extremely interesting and potentially tractable. In fact, he's actively sought out domain experts to work with him and accelerate his learning.
Carmack is a genius no doubt. But genius is the result of intense focused practice above and beyond anyone else in a particular area. Trying to extend that to other domains has been the downfall of so many others like him.
Funnily enough Romero himself didn't ship much either. IMO it's one of the most iconic "duo breakups". The whole is greater than the sum of the parts.
Romero is credited on 27 games since he left id Software.
And I say this while most certainly not being as knowledgeable as this OpenAI insider. So if even I can see this, then it's kinda bad, isn't it?
https://github.com/Farama-Foundation/Arcade-Learning-Environ...
The goal is to develop algorithms that generalize to other tasks.
I think we could eventually saturate Atari, but for now it looks like it's still a good source of problems that are just out of reach of current methods.
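For anyone who wants to poke at the benchmark, a minimal random-agent loop might look like this (assuming gymnasium with the ale-py package installed; the env id is just one example):

```python
import gymnasium as gym
import ale_py

gym.register_envs(ale_py)            # needed on recent gymnasium versions
env = gym.make("ALE/Breakout-v5")
obs, info = env.reset(seed=0)
done, total = False, 0.0
while not done:
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    total += reward
    done = terminated or truncated
print("random-policy score:", total)
env.close()
```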
A company solves self-driving 80% of the way and makes a lot of VC cash along the way. Then they solve intelligent chatbots 80% of the way and make a lot of VC cash along the way. Now they're working on solving humanoid robotics 80% of the way... I wonder why?
In the end, we have technology that can do some neat tricks, but can't be relied upon.
There are probably still some very hard problems in certain Atari games. Only the brave dare tackle these problems, because failure comes sharp and fast. Whereas, throwing more compute at a bigger LLM might not really accomplish anything, but we can make people think it accomplished something, and thus failure is not really possible.
Continuous training is the key ingredient. Humans can use existing knowledge and apply it to new scenarios, and so can most AI. But AI cannot permanently remember the result of its actions in the real world, and so its body of knowledge cannot expand.
Take a toddler and an oven. The toddler has no concept of what an oven is, other than maybe that it smells nice. The toddler will touch the oven, notice that it experiences pain (because the oven is hot) and learn that oven = danger. Place a current AI in a droid toddler body? It will never learn and will keep touching the oven as soon as the information "oven = danger" is out of the context window.
For some cases this inability to learn is actually desirable. You don't want anyone and everyone to be able to train ChatGPT unsupervised, otherwise you get 4chan flooding it with offensive crap like they did to Tay [1], but for AI that physically interacts with the meatspace, constant evaluation and learning is all but mandatory if it is to safely interact with its surroundings. "Dumb" robots run regular calibration cycles for their limbs to make sure they are still aligned to compensate for random deviations, and so will AI robots.
It is, at least if you wish to operate in meatspace; that's my point. Every day has 86400 seconds during which a human brain constantly adapts to and learns from external input, either directly while awake or indirectly during nighttime cleanup processes.
On top of that, humans have built-in filters for training. Basically, we see some drunkard shouting about the Hollow Earth on the sidewalk... our brain knows that this is a drunkard and that Hollow Earth is absolutely crackpot material, so if it stores anything at all, it's the fact that there is a drunkard on that street and that one might take another route next time; the drunkard's rambling is forgotten maybe five minutes later.
AI, in contrast, needs to be hand-held by humans during training, who annotate, "grade" or weigh information during the compilation of the training dataset, so that the AI knows what is written in "Mein Kampf" and can answer questions about it, but also knows (or at least won't openly regurgitate) that the solution to economic problems isn't to just deport Jews.
And huge context windows aren't the answer either. My wife tells me she would like to have a fruit cake for her next birthday. I'll probably remember that piece of information (or at the very least I'll write it down)... but an AI butler? I'd be really surprised if this is still in its context space in a year, and even if it is, I would not be surprised if it weren't able to recall that fact.
And the final thing is prompts... also not the answer. We've seen it just a few days ago with Grok - someone messed with the system prompt so it randomly interjected "white genocide" claims into completely unrelated conversations [1], despite hopefully being trained on a ... more civilised dataset, and to the contrary, we've also seen Grok reply to Twitter questions in a way that suggests it is aware its training data is biased.
[1] https://www.reuters.com/business/musks-xai-updates-grok-chat...
That's not even remotely true. At least not in the sense that it is for context in transformer models. Or can you tell me all the visual and auditory inputs you experienced yesterday at the 45232nd second? You only learn permanently and effectively from particular stimulation coupled with surprise. That has a sample rate which is orders of magnitude lower. And it's exactly the kind of sampling that can be replicated with a run-of-the-mill persistent memory system for an LLM. I would wager that you could fit most people's core experiences and memories that they can randomly access at any moment into a 1000 page book - something that fits well into state of the art context windows. For deeper more detailed things you can always fall back to another system.
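A toy sketch of what such a persistent memory system could look like (the names and keyword matching are hypothetical; a real system would use embeddings and similarity search):

```python
import json, time

MEMORY_FILE = "memories.jsonl"   # hypothetical store, one event per line

def remember(event: str, surprise: float, threshold: float = 0.7) -> None:
    # Only keep events above a surprise/importance threshold, mirroring the
    # low effective "sample rate" of human long-term memory.
    if surprise >= threshold:
        with open(MEMORY_FILE, "a") as f:
            f.write(json.dumps({"t": time.time(), "event": event}) + "\n")

def recall(query: str, k: int = 5) -> list[str]:
    # Crude keyword overlap; swap in vector similarity for anything real.
    with open(MEMORY_FILE) as f:
        events = [json.loads(line)["event"] for line in f]
    words = query.lower().split()
    return sorted(events, key=lambda e: -sum(w in e.lower() for w in words))[:k]

# The top recall() hits would then be prepended to the LLM's context window.
```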
Just because I don't remember my experiences at second 45232 on May 22, doesn't mean that my brain was not actively adapting to my experiences at that moment. The brain does a lot more learning than just what is conscious. And then when I went to sleep the brain continued pruning and organizing my unconscious learning for the day.
Seeing if someone can go from token to freeform physical usefulness will be interesting. I'm of the belief that LLMs are too verbose and energy intensive to go from language regurgitation machines to moving in the real world according to free form prompting. It may be accomplishable with the vast amount of hype investment, but I think the energy requirements and latency will make an LLM-based approach economically infeasible.
This is just, not true. A single 2min conversation with emotional or intellectual resonance can significantly alter a human’s thought process for years. There are some topics where every time they come up directly or analogously I can recall something a teacher told me in high school that “stuck” with me for whatever reason. And it isn’t even a “core” experience, just something that instantly clicked for my brain and altered my problem solving. At the time, there’s no heuristic that could predict how or why that particular interaction should have that kind of staying power.
Not to mention, experiences that subtly alter thinking or behavior just by virtue of providing some baseline familiarity instead of blank slate problem solving or routine. Like how you subtly adjust how you interact with coworkers based on the culture of your current company over time vs the last without any “flash” of insight required.
I think it depends on how you look at it. I don't want to torture the analogy too much, but I see the pre-training (getting model weights out of an enormous corpus of text) as more akin to the billions of years of evolution that led to the modern human brain. The brain still has a lot to learn once you're born, but it already also has lots of structures (e.g. to handle visual input, language, etc) and built-in knowledge (instincts). And you can't change that over the course of your life.
I wouldn't be surprised if we ended up in a "pre-train / RAG / context window" architecture of AI, analogously to "evolution / long term memory / short term memory" in humans.
Doesn't the article state that this is not true? AI cannot apply to B what it learned about A.
That's essentially what we're looking for when we talk about general intelligence: the capability to adapt what we know to what we know nothing about.
This is a continuous process.
His goal is to develop generic methods. So you could work with more complex games or the physical world for that, as that is what you want in the end. However, his insight is that you can even modify the Atari setting to test this, e.g. to work in realtime, and the added complexity of more complex games doesn't really give you any new insights at this point.
The approach NVIDIA are using (and other labs) clearly works. It's not going to be more than a year or two now before robotics is as solved as NLP and chatbots are today.
But also, he argues a lot about sample efficiency. He wants to develop algorithms/methods/models which can learn much faster / with much fewer data.
If it is substantially more sample efficient, or generalizable, than prior work then that would be exciting. But I'm not sure if it is?
If so, scaling up may be more of a distraction rather than helpful (besides wasting resources).
I hope he succeeds in whatever he's aiming for.
Using real life robots is going to be a huge bottleneck for training hours no matter what they do.
Maybe it will turn out to simply be enough artificial neurons and everything works. But I don't believe that.
Why not tackle robotics, if anything? Or really, just build the best AGI and everyone will be knocking on your door to license it in their hardware/software stacks; you will print infinite money.
If they had that, people would make agents with it and then it could do tons of truly meaningful things.
People try to make agents with the current one, but it's really difficult since it's not AGI.
I don’t think AGI is close, but once it happens it’s hard to imagine it not “escaping” (however we want to define that).
What would you do with a 10x or 100x smarter Siri/Alexa? I still don't see my life changing.
Give me a robot that can legitimately do household errands like the dishes, laundry, etc.. now we are talking.
There really isn’t any other way to interpret OpenAI’s actions for the last few months.
Sure it could all be a feint to hide their amazing progress. Or it could be what it looks like.
Given the hype cycles of the last 20 years, I’m going with the second.
After generations of boastful over-promising, do you really believe THIS time they are underpromising?
They started doing that a couple of years ago. The frontier "language" models are natively multimodal, trained on audio, text, video, images. That is all in the same model, not separate models stitched together. The inputs are tokenized and mapped into a shared embedding space.
Gemini, GPT-4o, Grok 3, Claude 3, Llama 4. These are all multimodal, not just "language models".
Are the audio/video/images tokenized the same way as text and then fed in as a stream? Or is the training objective different than "predict next token"?
If the former, do you think there are limitations to "stream of tokens"? Or is that essentially how humans work? (Like I think of our input as many-dimensional. But maybe it is compressed to a stream of tokens in part of our perception layer.)
which I found interesting, because I remember Carmack saying simulated environments are the way forward and physical environments are too impractical for developing AI
https://www.youtube.com/watch?v=_2NijXqBESI
He also talks about needing large amounts of compute to run the virtual environments where you'll be training embodied AI. Very much worth watching.
It's a shame that the pretrained approach leads to such a good-enough result. The learning-from-experience approach, or what should be the "right" approach, will stagnate. I might be wrong, but it seems that aside from Carmack and a small team, "the world" is just not looking at/investing in that side of AI anymore.
However, I find it funny that Carmack is now researching such an approach. At the end of the day, he was the one who invented portals, an algorithm to circumvent the need to reproduce the whole 3D world, thereby making 3D games computationally possible.
As a side note, I wonder what models are to come once we see the latest state-of-the-art AI video training technologies in sync with the joystick movements from a real player. Maybe the results are so astonishing that even Carmack changes his mind on the subject.
EDIT::grammar & typos
We’ll see. I’m skeptical that you’ll ever get novel theories like special and general relativity out of LLMs. For stuff like that I suspect you need the interactive learning approach, and perhaps more importantly, the ability to reject the current best theories and invent a replacement.
- Someone who is a programmer but follows a hypermasculine cliche and makes sure everyone knows about it.
- An insult used by other developers for someone who is more physically fit or interested in their health than themselves.
- An insult used by engineers or other people who are not happy with the overrepresentation of men in the industry. So everyone is lumped into the category.
- Someone who is obsessed with the technology and trying to grind their skills on it to an excessive level.
I don’t take it as a pejorative, it’s an acknowledgement of my efforts to be even considered in this category. For those wondering I have a family, and have healthy activities otherwise. No cool diets or bioscience, just code, physical activity and coffee/water.
This isn’t a lifestyle I’m saying everyone should do, only that people should do what makes them happiest and most fulfilled for their set of goals.
Isn't "interested in their health" a signal that they are interested in themselves, rather than the opposite?
It tripped me up too, to be fair.
sounds like a person who respects their own profession though
On the other hand, I’ve seen and heard “dude” and “guy” used by and applied to women by other women. Not common but it happens. But I’ve never heard “bro” used that way.
TIL JC has elite reflexes
Quite exciting. Without diminishing the amazing value of LLMs, I don't think that path goes all the way to AGI. No idea if Carmack has the answer, but some good things will come out of that small research group, for sure.
I do agree that it is not particularly groundbreaking, but it's a nice "hey, here's our first update".
In a game there might be a level with a door and a key, and because there's no reward for getting the key closer to the door, bridging this gap requires random search in a massive state space. But in the vast sea of scenarios that you can find in Common Crawl there's probably one, where you are 1 step from the key, and the key is 1 step from the door, so you get the reward signal from it without having to search an enormous state space.
You might say "but you have to search through the giant Common Crawl". Well yes, but while doing so you will get reward signal not just for the key and door problem, but for nearly every problem in the world.
The point is: pretraining teaches models to extract signal that can be used to explore solutions to hard search problems, and if you don't do that you are wasting your time enumerating giant state spaces.
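A back-of-envelope way to see the scale of that search problem (the numbers are made up, purely illustrative):

```python
# If the key-then-door solution requires ~20 specific actions out of 4 choices
# per step, a uniformly random policy sees its first reward only about once
# every 4^20 episodes.
actions_per_step, solution_length = 4, 20
episodes = actions_per_step ** solution_length
print(f"~{episodes:.1e} episodes before the first reward signal")  # ~1.1e12
```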
In this case, John is going off on this inane tangent because of his prior experience with hardware and video games instead of challenging himself to solve the actual hard and open problems.
I’m going to predict how this plays out for the inevitable screenshot in one to two years. John picks some existing RL algo and optimizes it to run in real time on real hardware. While he’s doing this the field moves on to better and new algorithms and architectures. John finally achieves his goal and posts a vid of some (now ancient) RL algo playing some Atari game in real time. Everyone says “neat” and moves on. John gets to feel validated yet all his work is completely useless.
John's document covers why he's doing what he's doing:
> Fundamentally, I believe in the importance of learning from a stream of interactive experience, as humans and animals do, which is quite different from the throw-everything-in-a-blender approach of pretraining an LLM. The blender approach can still be world-changingly valuable, but there are plenty of people advancing the state of the art there.
He thinks interacting with the real world and learning as you go isn't getting enough attention and might take us farther than the LLM approach. So he's applying these ideas to a subject that he's an expert in. You don't seem to find this approach interesting but John does (and I do too, for the record).
Everybody dismissing him might be right. Those keeping score know that Carmack's batting average isn't one thousand. But those people also know Carmack has the resources to work on pretty much whatever he wants to work on. I'm happy he's still working hard at something and sharing his work.
FWIW, I’m aware of that heuristic too which is why I intentionally use the word “just” as a meta heuristic to filter a certain type of person.
He’s an AAA software engineer, but the prerequisites to build out cutting-edge AI require deep formal math that is beyond his education and years at this point.
Nothing to stop him playing around with AI models though.
I'm pretty excited to see him in this domain. I think he'll focus on some DeepSeek style improvements.
Having JC focusing on, say, writing a performant OSS CUDA replacement could be bigger than any of the last 20 announcements from openai/google/deepmind/etc
So I asked Ilya, their chief scientist, for a reading list. This is my path, my way of doing things: give me a stack of all the stuff I need to know to actually be relevant in this space.
And he gave me a list of like 40 research papers and said, 'If you really learn all of these, you'll know 90% of what matters today!' And I did. I plowed through all those things and it all started sorting out in my head.
To put it another way, the idea that John Carmack is going to do groundbreaking research in AI is roughly as plausible as the idea that Yann LeCun is going to make a successful AAA video game. Stranger things have happened, but I won’t be holding my breath.
In that context anyone can make progress in the field, as long as they understand what they're dealing with.
Better to regard Mr. Carmack as an X factor. Maybe the experts will leave him in the dust. Or maybe he'll come up with something that none of the experts cared to look into.
I believe all his in-depth experience in other areas will heavily unlock him to bring about another breakthrough. He's that good.
He's probably one of the most qualified people around.
All deep formal math is a boundary to a thing.
But don't get me wrong! Since this is a long-term research endeavor of his, I believe really starting from the basics is good for him and will empower him to bring something new to the table eventually.
I'm surprised though that he "only" came this far as of now. Maybe my slight idolization of Carmack made me kind of blind to the fact that this kind of research is a mean beast after all, and there is a reason that huuuuge research labs dump countless man-decades into this kind of stuff with no guaranteed breakthroughs.
I'm nowhere near as good at my craft as someone who works for OpenAI, which the author of that tweet seems to be, but if even I can see this, then it's bad, isn't it?
I would argue that if he wants to do AGI through RL, an LLM could be a perfect teacher or oracle.
After all, I'm not walking around as a human without guidance. It should/could make RL a lot faster to leverage this.
My logical part / RL part does need the 'database'/fact part, and my facts are trying to be as logical as possible, but it's just not.
It's still a lot better to really learn and discover it yourself to really get it.
Also it's hard to determine how much time someone spent on a particular topic.
It seems that we are learning in layers, one of the first layers being a 2D neural net (images) augmented by other sensory data to create a 3D if not 4D model (neural net). HRTFs for sound increase the spatial data we get from images. With depth coming from sound and light and learnt movements (touch), we seem to develop a notion of space and time. (multimodality?)
Seems that we can take low dimensional inputs and correlate them to form higher dimensional structures.
Of course, physically it comes from noticing the dampening of visual data (in focus for example) and memorized audio data (sound frequency and amplitude, early reflections, doppler effect etc). That should be emergent from training.
Those data sources can be imperfectly correlated. That's why we count during a lightning storm to evaluate distance. It's low dimensional.
In a sense, it's a measure of required effort perhaps (distance to somewhere).
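The thunder-counting trick, just to make that concrete (speed of sound assumed at roughly 343 m/s):

```python
def lightning_distance_km(seconds_between_flash_and_thunder: float) -> float:
    speed_of_sound_m_s = 343.0   # at about 20 degrees C
    return seconds_between_flash_and_thunder * speed_of_sound_m_s / 1000.0

print(lightning_distance_km(3.0))  # ~1 km: roughly three seconds per kilometre
```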
What's funny is that it seems to go the other way from traditional training where we move from higher dimensional tensor spaces to lower ones. At least in a first step.
Nonetheless, yes, we do know certain brain structures, like your image net analogy, but the way you describe it sounds a little bit off.
Our visual cortex is not 'just a layer'; it's a component, I would say, and it's optimized for detecting things.
Other components act differently with different structures.