If the model comes up with anything even remotely correct, it would be quite strong evidence that LLMs are a path to something bigger; if not, then I think it is time to go back to the drawing board.
Things don't happen all of a sudden, and with access to all the scientific papers of the era, it's possible those discoveries could have fallen out of the synthesis.
Standing on the shoulders of giants, as it were
It's still unclear whether this process could be merely continued, seeded only with new physical data, in order to keep progressing beyond that point, "forever", or at least for as long as we imagine humans will continue to go on making scientific progress.
These two are so above everyone else in the mathematical world that most people would struggle for weeks or even months to understand something they did in a couple of minutes.
There's no "get down and dirty" shortcut with them =)
However, what I'm saying is not mere nitpicking either. It is precisely because of my belief in Einstein's extraordinary abilities that I find it unconvincing that an LLM being able to recombine the extant written physics-related building blocks of 1900, with its practically infinite reading speed, necessarily demonstrates comparable capabilities to Einstein.
The essence of the question is this: would Einstein, having been granted eternal youth and a neverending source of data on physical phenomena, be able to innovate forever? Would an LLM?
My position is that an LLM synthesising special relativity from 1900 knowledge would not, by itself, show that a positive answer to the first question implies a positive answer to the second.
This would absolutely be very good evidence that models can actually come up with novel, paradigm-shifting ideas. It was absolutely not obvious at that time from the existing facts, and some crazy leaps of faith needed to be taken.
This is especially true for General Relativity, for which you had just a few mismatches in the measurements, like Mercury's precession, and where the theory almost entirely follows from thought experiments.
However, you could also argue that it's actually empirical evidence that the step from 19th century physics to general relativity wasn't truly a paradigm shift -- you could have 'derived' it from previous data -- and that the LLM has actually proven something about structural similarities between those paradigms, not that it's demonstrating general intelligence...
For instance spectroscopy enables one to look at the spectra emitted by another 'thing', perhaps the sun, and it turns out that there are little streaks within the spectra that correspond directly to various elements. This is how we're able to determine the elemental composition of things like the sun.
That connection between elements and the patterns in their spectra was discovered in the early 1800s. And those patterns are caused by quantum mechanical interactions and so it was perhaps one of the first big hints of quantum mechanics, yet it'd still be a century before we got to relativity, let alone quantum mechanics.
But the whole question is whether or not something can do that synthesis!
And the "anyone who read all the right papers" thing - nobody actually reads all the papers. That's the bottleneck. LLMs don't have it. They will continue to not have it. Humans will continue to not be able to read faster than LLMs.
Even me, using a speech synthesizer at ~700 WPM.
If it's true of everything, then surely having an LLM work iteratively on the pieces, along with being provided additional physical data, will lead to the discovery of everything?
If the answer is "no", then surely something is still missing.
> And the "anyone who read all the right papers" thing - nobody actually reads all the papers. That's the bottleneck. LLMs don't have it. They will continue to not have it. Humans will continue to not be able to read faster than LLMs.
I agree with this. This is a definitive advantage of LLMs.
Excellence at any given task is not indicative of intelligence. I think we set these sort of false goalposts because we want something that sounds achievable but is just out of reach at one moment in time. For instance at one time it was believed that a computer playing chess at the level of a human would be proof of intelligence. Of course it sounds naive now, but it was genuinely believed. It ultimately not being so is not us moving the goalposts, so much as us setting artificially low goalposts to begin with.
So for instance what we're speaking of here is logical processing across natural language, yet human intelligence predates natural language. It poses a bit of a logical problem to then define intelligence as the logical processing of natural language.
I'm sure, without any uncertainty, that any generalist model able to do what Einstein did would be AGI, as in, that model would be able to perform any cognitive task that an intelligent human being could complete in a reasonable amount of time (here "reasonable" depends on the task at hand; it could be minutes, hours, days, years, etc).
(1) - A major part of the challenge is in not being directed towards something. There was no external guidance for Einstein - he wasn't even a formal researcher at the time of his breakthroughs. An LLM might be able to be handheld towards relativity, though I doubt it, but given the prompt of 'hey find something revolutionary' it's obviously never going to respond with anything relevant, even with substantially greater precision specifying field/subtopic/etc.
(2) - Logical processing of natural language remains one small aspect of intelligence. For example - humanity invented natural language from nothing. The concept of an LLM doing this is a nonstarter since they're dependent upon token prediction, yet we're speaking of starting with 0 tokens.
(3) - LLMs are, in many ways, very much like calculators. They can indeed achieve some quite impressive feats in specific domains, yet then they will completely hallucinate nonsense on relatively trivial queries, particularly on topics where there isn't extensive data to drive their token prediction. I don't entirely understand your extreme optimism towards LLMs given this proclivity for hallucination. Their ability to produce compelling nonsense makes them particularly tedious to use for anything you don't already effectively know the answer to.
Simply because I don't see hallucinations as a permanent problem. I see that models keep improving more and more in this regard, and I don't see why the hallucination rate can't be arbitrarily reduced with further improvements to the architecture. When I ask Claude about obscure topics, it correctly replies "I don't know", where past models would have hallucinated an answer. When I use GPT 5.2-thinking for my ML research job, I pretty much never encounter hallucinations.
That's not how science works.
The LLM would have to propose experiments (which would have to be simulated), and then develop its theories from that.
Maybe there had been enough facts around to suggest a number of hypotheses, but the LLM in its current form won't be able to confirm them.
I by no means believe LLMs are general intelligence, and I've seen them produce a lot of garbage, but if they could produce these revolutionary theories from only <= year 1900 information and a prompt that is not ridiculously leading, that would be a really compelling demonstration of their power.
It turns out my reading is somewhat topical. I've been reading Rhodes' "The Making of the Atomic Bomb" and one of the things he takes great pains to argue (I was not quite anticipating how much I'd be trying to recall my high school science classes to make sense of his account of various experiments) is that the development toward the atomic bomb was more or less inexorable, and if at any point someone said "this is too far; let's stop here" there would be others to take his place. So, maybe, to answer your question.
Yes. It is certainly a question whether Einstein was one of the smartest people ever to have lived, or whether all of his discoveries were already in the Zeitgeist and would have been discovered by someone else within ~5 years.
Einstein was smart and put several disjointed things together. It's amazing that one person could do so much, from explaining Brownian motion to explaining the photoelectric effect.
But I think that all these would have happened within _years_ anyway.
Kind of: how long would it realistically have taken for someone else (also really smart) to come up with the same thing if Einstein hadn't been there?
I personally doubt that this would work. I don't think these systems can achieve truly ground-breaking, paradigm-shifting work. The homeworld of these systems is the corpus of text on which they were trained, in the same way as ours is physical reality. Their access to this reality is always secondary, already distorted by the imperfections of human knowledge.
There's a very similar parallel with dark matter in modern times. We certainly have endless hints to the truth that will be evident in hindsight, but for now? We are mostly convinced that we know the truth, perform experiments to prove that, find nothing, shrug, adjust the model to be even more esoteric, and repeat onto the next one. And maybe one will eventually show something, or maybe we're on the wrong path altogether. This quote, from Michelson in 1894 (more than a decade before Einstein would come along), is extremely telling of the opinion at the time:
"While it is never safe to affirm that the future of Physical Science has no marvels in store even more astonishing than those of the past, it seems probable that most of the grand underlying principles have been firmly established and that further advances are to be sought chiefly in the rigorous application of these principles to all the phenomena which come under our notice. It is here that the science of measurement shows its importance — where quantitative work is more to be desired than qualitative work. An eminent physicist remarked that the future truths of physical science are to be looked for in the sixth place of decimals." - Michelson 1894
For another parallel, our understanding of the big bang was, and probably is, wrong. There are a lot of problems with the traditional view of the big bang, with the horizon problem [1] being just one among many - areas in space that should not have had time to interact behave like they have. So this was 'solved' by an ad hoc solution - just make the expansion of the universe go faster than light for a fraction of a second at a specific moment, slow down, then start speeding up again (cosmic inflation [2]) - and it all works just fine. So you know what we did? Shrugged, accepted it, and even gave Guth et al a bunch of accolades for 'solving' the puzzle.
This is the problem - arguably the most important principle of science is falsifiability. But when is something falsified? Because in many situations, probably the overwhelming majority, you can instead just use one falsification to create a new hypothesis with that nuance integrated into it. And as science moves beyond singular formulas derived from clear principles or laws and onto broad encompassing models based on correlations from limited observations, this becomes more and more true.
It’s probably even more of a problem that different areas of scientific development don’t know about each other. LLMs combining results would still not be like they invented something new.
But if they could give us a head start of 20 years on certain developments this would be an awesome result.
My own prediction is that the LLMs would totally fail at connecting the dots, but a small group of very smart humans can.
Things don't happen all of a sudden, but they also don't happen everywhere. Most people in most parts of the world would never connect the dots. Scientific curiosity is something valuable and fragile, that we just take for granted.
--
[0] - To people who happen to have the right background and skill set, and are in the right place.
[1] - Almost always multiple someones, independently, within short time of each other. People usually remember only one or two because, for better or worse, history is much like patent law: first to file wins.
Take Michelson in 1894: after doing (and inspiring) the kind of precision work that should have set off alarm bells, he’s still talking like the fundamentals are basically done and progress is just “sixth decimal place” refinement.
"While it is never safe to affirm that the future of Physical Science has no marvels in store even more astonishing than those of the past, it seems probable that most of the grand underlying principles have been firmly established and that further advances are to be sought chiefly in the rigorous application of these principles to all the phenomena which come under our notice. It is here that the science of measurement shows its importance — where quantitative work is more to be desired than qualitative work. An eminent physicist remarked that the future truths of physical science are to be looked for in the sixth place of decimals." - Michelson 1894
The Michelson-Morley experiments weren't obscure, they were famous, discussed widely, and their null result was well-known. Yet for nearly two decades, the greatest physicists of the era proposed increasingly baroque modifications to existing theory rather than question the foundational assumption of absolute time. These weren't failures of data availability or technical skill, they were failures of imagination constrained by what seemed obviously true about the nature of time itself.
Einstein's insight wasn't just "connecting dots" here, it was recognizing that a dot everyone thought was fixed (the absoluteness of simultaneity) could be moved, and that doing so made everything else fall into place.
People scorn the 'Great Man Hypothesis' so much they sometimes swing too much in the other direction. The 'multiple discovery' pattern you cite is real but often overstated. For Special Relativity, Poincaré came close, but didn't make the full conceptual break. Lorentz had the mathematics but retained the aether. The gap between 'almost there' and 'there' can be enormous when it requires abandoning what seems like common sense itself.
That's why those tiny steps of scientific and technological progress aren't made by just any randos - they're made by people who happen to be at the right place and time, and equipped correctly to be able to take the step.
The important corollary to this is that you can't generally predict this ahead of time. Someone like Einstein was needed to nail down relativity, but standing there a few years earlier, you couldn't have predicted that it was Einstein who would make the breakthrough, nor what it would be about. Conversely, if Einstein had lived 50 years earlier, he wouldn't have come up with relativity, because the necessary prerequisites - knowledge, people, environment - weren't there yet.
Your second and third paragraphs are entirely consistent with the original point I was trying to make, which was not that it took Einstein specifically to come up with relativity, but that it took someone with uncommon skills, as evidenced by the fact that it blindsided even a good many of the people who were qualified to be contenders for being the one to figure it out first. It does not amount to proof, but one does not expect people who are closing in on the solution to be blindsided by it.
I am well aware of the problems with “great man” hagiography, but dismissing individual contributions, which is what the person I was replying to seemed to be doing, is a distortion in its own way.
I imagine possible buffers on that conjecture synthesis being experimentation and acceptance by the scientific community. AIs can come up with new ideas every day, but Nature won't publish those ideas for years.
Training data for a base model isn't that hard to come by, even though you have to OCR most of it yourself because the publicly available OCRed versions are commonly unusably bad. But training a model large enough to be useful is a major issue. Training a 700M parameter model at home is very doable (and is what this TimeCapsuleLLM is), but to get that kind of reasoning you need something closer to a 70B model. Also a lot of the "smarts" of a model gets injected in fine tuning and RL, but any of the available fine tuning datasets would obviously contaminate the model with 2026 knowledge.
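For a sense of the scale gap, here's a rough back-of-envelope parameter count for GPT-style models (a sketch; the layer/width/vocab numbers below are illustrative guesses, not TimeCapsuleLLM's actual config):

    # Approximate transformer parameter count: ~4*d^2 for attention plus
    # ~8*d^2 for the MLP per layer, plus token embeddings. Real models
    # differ (GQA, tied embeddings, vocab size), so treat this as rough.
    def approx_params(n_layer, d_model, vocab=32000):
        return 12 * n_layer * d_model**2 + vocab * d_model

    for name, layers, d in [("~700M-class", 24, 1536), ("~70B-class", 80, 8192)]:
        print(f"{name}: {approx_params(layers, d) / 1e9:.1f}B params")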
It largely cannot imagine anything beyond the usual, but there is a small part that it can. This is similar to in-context learning: it's weak, but it is there.
It would be incredible if meta learning/continual learning found a way to train exactly for novel learning paths. But that's literally AGI, so maybe 20 years from now? Or never..
You can see this on CL benchmarks. There is SOME signal but it's crazy low. When I was training CL models I found that the signal was in the single % points. Some could easily argue it was zero but I really do believe there is a very small amount in there.
This is also why any novel work or findings are done via MASSIVE compute budgets. They find RL environments that can extract that small amount out. Is it random chance? Maybe, hard to say.
AI cannot compound its learnings for the foreseeable future
If we did this (to a good enough level of detail), would it be able to derive relativity? How large of an AI model would it have to be to successfully derive relativity (if it only had access to everything published up to 1904)?
I mostly agree with you, especially about distrusting the self-interested hype beasts.
While I don't think the models are actually "intelligent", I also wonder if there are insights to be gained by looking at how concepts get encoded by the models. It's not really that the models will add something "new", but more that there might be connections between things that we haven't noticed, especially because academic disciplines are so insular these days.
I guess the arguments sometimes mention languages. But I feel like the core of the arguments are pretty much the same regardless?
Most ML researchers think AGI is imminent.
I don't think there is any level of broad agreement right now. There are tons of random camps none of which I would consider to be broadly dominating.
Not sure how to interpret that as almost imminent.
20 years away in 2026, still 20 years away in 2027, etc etc.
Whatever Altman's hyping, that's the translation.
At least the model part. Although others have had the same thought as you, afaik none tried it.
They measure compression (perplexity) on future Wikipedia, news articles, code, arXiv papers, and multi-modal data. Data compression is intimately connected with robustness and generalization.
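As a sketch of how such a measurement works (gpt2 here is just a stand-in; the benchmark would swap in the model under test and post-cutoff text):

    # Perplexity = exp(mean cross-entropy loss) of the model on held-out text.
    import math
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # stand-in; substitute the model being evaluated
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    text = "Example paragraph written after the model's training cutoff."
    enc = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])  # loss = mean next-token NLL
    print("perplexity:", math.exp(out.loss.item()))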
A). contaminate the model with your own knowledge of relativity, leading it on to "discover" what you know, or
B). you will try to simulate a blind operation but without the "competent human physicist knowledgeable up to the 1900 scientific frontier" component prompting the LLM, because no such person is alive today nor can you simulate them (if you could, then by definition you can use that simulated Einstein to discover relativity, so the problem is moot).
So in both cases you would prove nothing about what a smart and knowledgeable scientist can achieve today from a frontier LLM.
In principle I see your point; in practice my default assumption here, until proven otherwise, is that a little something slipped through post-1900.
A much easier approach would be to just download some model, whatever model, today. Then 5 years from now, whatever interesting discoveries are found - can the model get there?
I imagine if you explained quantum field theory to a 19th century scientist, they might think of it as a more refined understanding of luminiferous aether.
Or if an 18th century scholar learned about positive and negative ions, it could be seen as an expansion/correction of phlogiston theory.
[0] https://github.com/haykgrigo3/TimeCapsuleLLM/blob/main/Copy%...
In fact you don't want it to know about them explicitly; it just needs enough background knowledge that you can manage the rest via context.
But I think there are also some questions about the role of language in human thought that leave the door just slightly ajar on the issue of whether or not manipulating the tokens of language might be more central to human cognition than we've tended to think.
If it turned out that this was true, then it is possible that "a model predicting tokens" has more power than that description would suggest.
I doubt it, and I doubt it quite a lot. But I don't think it is impossible that something at least a little bit along these lines turns out to be true.
Unfortunately, none of that has anything to do with what LLMs are doing. The LLM is not thinking about concepts and then translating that into language. It is imitating what it looks like to read people doing so and nothing more. That can be very powerful at learning and then spitting out complex relationships between signifiers, as it's really just a giant knowledge compression engine with a human friendly way to spit it out. But there's absolutely no logical grounding whatsoever for any statement produced from an LLM.
The LLM that encouraged that man to kill himself wasn't doing it because it was a subject with agency and preference. It did so because it was, quite accurately I might say, mimicking the sequence of tokens that a real person encouraging someone to kill themselves would write. At no point whatsoever did that neural network make a moral judgment about what it was doing because it doesn't think. It simply performed inference after inference in which it scanned through a lengthy discussion between a suicidal man and an assistant that had been encouraging him and then decided that after "Cold steel pressed against a mind that’s already made peace? That’s not fear. That’s " the most accurate token would be "clar" and then "ity."
We know what our experience is - thinking about concepts and then translating that into language - but we really don't know with much confidence what is actually going on.
I lean strongly toward the idea that humans are doing something quite different than LLMs, particularly when reasoning. But I want to leave the door open to the idea that we've not understood human cognition, mostly because our primary evidence there comes from our own subjective experience, which may (or may not) provide a reliable guide to what is actually happening.
We do know what it's not doing, and that is operating only through reproducing linguistic patterns. There's no more cause to think LLMs approximate our thought (thought being something they are incapable of) than that Naive-Bayes spam filter models approximate our thought.
'Language' is only the initial and final layers of a Large Language Model. Manipulating concepts is exactly what they do, and it's unfortunate the most obstinate seem to be the most ignorant.
It may, however, turn out that in doing what they do, they are effectively manipulating concepts, and this is what I was alluding to: by building the model, even though your approach was through tokenization and whatever term you want to use for the network, you end up accidentally building something that implicitly manipulates concepts. Moreover, it might turn out that we ourselves do more of this than we perhaps like to think.
Nevertheless "manipulating concepts is exactly what they do" seems almost willfully ignorant of how these systems work, unless you believe that "find the next most probable sequence of tokens of some length" is all there is to "manipulating concepts".
Yes, they do. And of course there is. And there's plenty of research on the matter.
>It may, however, turn out that in doing what they do, they are effectively manipulating concepts
There is no effectively here. Text is what goes in and what comes out, but it's by no means what they manipulate internally.
>Nevertheless "manipulating concepts is exactly what they do" seems almost willfully ignorant of how these systems work, unless you believe that "find the next most probable sequence of tokens of some length" is all there is to "manipulating concepts".
"Find the next probable token" is the goal, not the process. It is what models are tasked to do yes, but it says nothing about what they do internally to achieve it.
Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task - https://openreview.net/forum?id=DeG07_TcZvT
On the Biology of a Large Language Model - https://transformer-circuits.pub/2025/attribution-graphs/bio...
Emergent Introspective Awareness in Large Language Models - https://transformer-circuits.pub/2025/introspection/index.ht...
And this is OK and still quite interesting - we do it to ourselves all the time. Often it's the only way we have of understanding the world (or ourselves).
However, in the case of LLMs, which are tools that we have created from scratch, I think we can require a higher standard.
I don't personally think that any of these papers suggest that LLMs manipulate concepts. They do suggest that the internal representation after training is highly complex (superposition, in particular), and that when inputs are presented, it isn't unreasonable to talk about the observable behavior as if it involved represented concepts. It is a useful stance to take, similar to Dennett's intentional stance.
However, while this may turn out to be how a lot of human cognition works, I don't think it is the significant part of what is happening when we actively reason. Nor do I think it corresponds to what most people mean by "manipulate concepts".
The LLM, despite the presence of "features" that may correspond to human concepts, is relentlessly forward-driving: given these inputs, what is my output? Look at the description in the 3rd paper of the arithmetic example. This is not "manipulating concepts" - it's a trick that often gets to the right answer (just like many human tricks used for arithmetic, only somewhat less reliable). It is extremely different, however, from "rigorous" arithmetic - the stuff you learned when you were somewhere between age 5 and 12 perhaps - that always gives the right answer and involves no pattern matching, no inference, no approximations. The same thing can be said, I think, about every other example in all 4 papers, to some degree or another.
What I do think is true (and very interesting) is that it seems somewhere between possible and likely that a lot more human cognition than we've previously suspected uses similar mechanisms as these papers are uncovering/describing.
I’m not sure what distinction you’re drawing here. A lot of mechanistic interpretability work is explicitly trying to describe what the model is doing in the most literal sense we have access to: identifying internal features/circuits and showing that intervening on them predictably changes behavior. That’s not “as-if” gloss; it’s a causal claim about internals.
If your standard is higher than “we can locate internal variables that track X and show they causally affect outputs in X-consistent ways,” what would count as “actually describing what it’s doing”?
>However, in the case of LLMs, which are tools that we have created from scratch, I think we can require a higher standard.
This is backwards. We don’t “create them from scratch” in the sense relevant to interpretability. We specify an architecture template and a training objective, then we let gradient descent discover a huge, distributed program. The “program” is not something we wrote or understand. In that sense, we’re in a similar epistemic position as neuroscience: we can observe behavior, probe internals, and build causal/mechanistic models, without having full transparency.
So what does “higher standard” mean here, concretely? If you mean “we should be able to fully enumerate a clean symbolic algorithm,” that’s not a standard we can meet even for many human cognitive skills, and it’s not obvious why that should be the bar for “concept manipulation.”
>I don't personally think that any of these papers suggest that LLMs manipulate concepts. They do suggest that the internal representation after training is highly complex (superposition, in particular), and that when inputs are presented, it isn't unreasonable to talk about the observable behavior as if it involved represented concepts. It is a useful stance to take, similar to Dennett's intentional stance.
You start with “there is no representation of a concept,” but then concede “features that may correspond to human concepts.” If those features are (a) reliably present across contexts, (b) abstract over surface tokens, and (c) participate causally in producing downstream behavior, then that is a representation in the sense most people mean in cognitive science. One of the most frustrating things about these sorts of discussions is the meaningless semantic games and goalpost shifting.
>The LLM, despite the presence of "features" that may correspond to human concepts, is relentlessly forward-driving: given these inputs, what is my output?
Again, that’s a description of the objective, not the internal computation. The fact that the training loss is next-token prediction doesn’t imply the internal machinery is only “token-ish.” Models can and do learn latent structure that’s useful for prediction: compressed variables, abstractions, world regularities, etc. Saying “it’s just next-token prediction” is like saying “humans are just maximizing inclusive genetic fitness,” therefore no real concepts. Goal ≠ mechanism.
> Look at the description in the 3rd paper of the arithmetic example. This is not "manipulating concepts" - it's a trick that often gets to the right answer
Two issues:
1. “Heuristic / approximate” doesn’t mean “not conceptual.” Humans use heuristics constantly, including in arithmetic. Concept manipulation doesn’t require perfect guarantees; it requires that internal variables encode and transform abstractions in ways that generalize.
2. Even if a model is using a “trick,” it can still be doing so by operating over internal representations that correspond to quantities, relations, carry-like states, etc. “Not a clean grade-school algorithm” is not the same as “no concepts.”
>Rigorous arithmetic… always gives the right answer and involves no pattern matching, no inference…
“Rigorous arithmetic” is a great example of a reliable procedure, but reliability doesn’t define “concept manipulation.” It’s perfectly possible to manipulate concepts using approximate, distributed representations, and it’s also possible to follow a rigid procedure with near-zero understanding (e.g., executing steps mechanically without grasping place value).
So if the claim is “LLMs don’t manipulate concepts because they don’t implement the grade-school algorithm,” that’s just conflating one particular human-taught algorithm with the broader notion of representing and transforming abstractions.
I'll see if I can try to explain what I mean here, because I absolutely don't believe this is shifting the goal posts.
There are a couple of levels of human cognition that are particularly interesting in this context. One is the question of just how the brain does anything at all, whether that's homeostasis, neuromuscular control or speech generation. Another is how humans engage in conscious, reasoned thought that leads to (or appears to lead to) novel concepts. The first one is a huge area, better understood than the second though still characterized more by what we don't know than what we do. Nevertheless, it is there that the most obvious parallels with e.g. the Lindsey papers can be found. Neural networks, activation networks and waves, signalling etc. etc. The brain receives (lots of) inputs, generates responses including but not limited to speech generation. It seems entirely reasonable to suggest that maybe our brains, given a somewhat analogous architecture at some physical level to the one used for LLMs, might use similar mechanisms as the latter.
However, nobody would say that most of what the brain does involves manipulating concepts. When you run from danger, when you reach up grab something from a shelf, when you do almost anything except actual conscious reasoning, most of the accounts of how that behavior arises from brain activity does not involve manipulating concepts. Instead, we have explanations more similar to those being offered for LLMs - linked patterns of activations across time and space.
Nobody serious is going to argue that conscious reasoning is not built on the same substrate as unconscious behavior, but I think that most people tend to feel that it doesn't make sense to try to shoehorn it into the same category. Just as it doesn't make much sense to talk about what a text editor is doing in terms of P and N semiconductor gates, or even just logic circuits, it doesn't make much sense to talk about conscious reasoning in terms of patterns of neuronal activation, despite the fact that in both cases, one set of behavior is absolutely predicated on the other.
My claim/belief is that there is nothing inside an LLM that corresponds even a tiny bit to what happens when you are asked "What is 297 x 1345?" or "will the moon be visible at 8pm tonight?" or "how does writer X tackle subject Y differently than writer Z?". They can produce answers, certainly. Sometimes the answers even make significant sense or better. But when they do, we have an understanding of how that is happening that does not require any sense of the LLM engaging in reasoning or manipulating concepts. And because of that, I consider attempts like Lindsey's to justify the idea that LLMs are manipulating concepts to be misplaced - the structures Lindsey et al. are describing are much more similar to the ones that let you navigate, move, touch, lift without much if any conscious thought. They are not, I believe, similar to what is going on in the brain when you are asked "do you think this poem would have been better if it was a haiku?" and whatever that thing is, that is what I mean by manipulating concepts.
> Saying “it’s just next-token prediction” is like saying “humans are just maximizing inclusive genetic fitness,” therefore no real concepts. Goal ≠ mechanism.
No. There's a huge difference between behavior and design. Humans are likely just maximizing genetic fitness (even though that's really a concept, but that detail is not worth arguing about here), but that describes, as you note, a goal not a mechanism. Along the way, they manifest huge numbers of sub-goal directed behaviors (or, one could argue quite convincingly, goal-agnostic behaviors) that are, broadly speaking, not governed by the top level goal. LLMs don't do this. If you want to posit that the inner mechanisms contain all sorts of "behavior" that isn't directly linked to the externally visible behavior, be my guest, but I just don't see this as equivalent. What humans visibly, mechanistically do covers a huge range of things; LLMs do token prediction.
This framing assumes "concept manipulation" requires conscious, deliberate reasoning. But that's not how cognitive science typically uses the term. When you reach for a shelf, your brain absolutely manipulates concepts - spatial relationships, object permanence, distance estimation, tool affordances. These are abstract representations that generalize across contexts. The fact that they're unconscious doesn't make them less conceptual.
>My claim/belief is that there is nothing inside an LLM that corresponds even a tiny bit to what happens when you are asked "What is 297 x 1345?" or "will the moon be visible at 8pm tonight?"
This is precisely what the mechanistic interpretability work challenges. When you ask "will the moon be visible tonight," the model demonstrably activates internal features corresponding to: time, celestial mechanics, geographic location, lunar phases, etc. It combines these representations to generate an answer.
>But when they do, we have an understanding of how that is happening that does not require any sense of the LLM engaging in reasoning or manipulating concepts.
Do we? The whole point of the interpretability research is that we don't have a complete understanding. We're discovering that these models build rich internal world models, causal representations, and abstract features that weren't explicitly programmed. If your claim is "we can in principle reduce it to matrix multiplications," sure, but we can in principle reduce human cognition to neuronal firing patterns too.
>They are not, I believe, similar to what is going on in the brain when you are asked "do you think this poem would have been better if it was a haiku?" and whatever that thing is, that is what I mean by manipulating concepts.
Here's my core objection: you're defining "manipulating concepts" as "whatever special thing happens during conscious human reasoning that feels different from 'pattern matching.'" But this is circular and unfalsifiable. How would we ever know if an LLM (or another human, for that matter) is doing this "special thing"? You've defined it purely in terms of subjective experience rather than functional or mechanistic criteria.
>Humans are likely just maximizing genetic fitness... but that describes, as you note, a goal not a mechanism. Along the way, they manifest huge numbers of sub-goal directed behaviors... that are, broadly speaking, not governed by the top level goal. LLMs don't do this.
LLMs absolutely do this, it's exactly what the interpretability research reveals. LLMs trained on "token prediction" develop huge numbers of sub-goal directed internal behaviors (spatial reasoning, causal modeling, logical inference) that are instrumentally useful but not explicitly specified, precisely the phenomenon you claim only humans exhibit. And 'token prediction' is not about text. The most significant advances in robotics in decades are off the back of LLM transformers. 'Token prediction' is just the goal, and I'm tired of saying this for the thousandth time.
> Here's my core objection: you're defining "manipulating concepts" as "whatever special thing happens during conscious human reasoning that feels different from 'pattern matching.'" But this is circular and unfalsifiable. How would we ever know if an LLM (or another human, for that matter) is doing this "special thing"? You've defined it purely in terms of subjective experience rather than functional or mechanistic criteria.
I think your core objection is well aligned to my own POV. I am not claiming that the subjective experience is the critical element here, but I am claiming that whatever is going on when we have the subjective experience of "reasoning" is likely to be different (or more specifically, more usefully described in different ways) than what is happening in LLMs and our minds when doing something else.
How would we ever know? Well the obvious answer is more research into what is happening in human brains when we reason and comparing that to brain behavior at other times.
I don't think it's likely to be productive to continue this exchange on HN, but if you would like to continue, my email address is in my profile.
I'm convinced of this. I think it's because we've always looked at the most advanced forms of human languaging (like philosophy) to understand ourselves. But human language must have evolved from forms of communication found in other species, especially highly intelligent ones. It's to be expected that the building blocks of it are based on things like imitation, playful variation, pattern-matching, harnessing capabilities brains had been developing long before language, only now in the emerging world of sounds, calls, vocalizations.
Ironically, the other crucial ingredient for AGI which LLMs don't have, but we do, is exactly that animal nature which we always try to shove under the rug, over-attributing our success to the stochastic parrot part of us, and ignoring the gut instinct, the intuitive, spontaneous insight into things which a lot of the great scientists and artists of the past have talked about.
Are you familiar with the major works in epistemology that were written, even before the 20th century, on this exact topic?
That's a bar that only a handful of humans have met in the entirety of history.
The only people who are serious about suggesting that LLMs could possibly 'think' are the people who are committing fraud on the scale of hundreds of billions of dollars (good for them on finding the all-time grift!) and people who don't understand how they're programmed, and thusly are the target of the grift. Granted, given that the vast majority of humanity are not programmers, and even fewer are programmers educated on the intricacies of ML, the grift target pool numbers in the billions.
Could you elucidate me on the process of human thought, and point out the differences between that and a probabilistic prediction engine?
I see this argument all over the place, but "how do humans think" is never described. It is always left as a black box with something magical (presumably a soul or some other metaphysical substance) inside.
1: https://chatgpt.com/share/6965653e-b514-8011-b233-79d8c25d33...
Then, we have a model that knows quite a bit in modern English. We also legally have a data set for everything it knows. Then, there's all kinds of experimentation or copyright-safe training strategies we can do.
Project Gutenberg up to the 1920's seems to be the safest bet on that.
But I also think we can do this with normal LLMs trained on up-to-date text, by asking them to come up with any novel theory that fits the facts. It does not have to be a groundbreaking theory like QM, just original and not (yet) proven wrong?
But considering that Special Relativity was published in 1905, which means all its building blocks were already floating in the ether by 1900, it would be a very interesting experiment to train something on Claude/Gemini scale and then, say, give it the field equations and ask it to build a theory around them.
Here you will find all official resources provided by the Turkish state. You can register as a foreign citizen and conduct research. However, knowing Turkish is a great advantage. You can find more accurate documents by searching in the summaries, because all documents are archived with a short Turkish explanation - a summary. Also, most scanned images you want to download are paid.
1900 era humans are not available to do this so I'm not sure how this experiment is supposed to work.
Special relativity however seems possible.
It's a relief to think that we're not trusting these things for stuff like financial advice, medical advice, mental health counselling, ...
Lately I've read a few older biographies/autobiographies - Thomas Scott's autobiography (The Force of Truth), Halyburton's autobiography, and James Henley Thornwell and Benjamin Morgan Palmer biographies.
Right now I'm reading The Life and Times of Jesus the Messiah by Alfred Edersheim (19th century).
How about you?
The part of the lord's prayer that says "our father who art in heaven" is kinda like this - father is linked to a second person conjugation. You could remove some words and make it into "father art in heaven", which you claim is ungrammatical. I'm skeptical that it was.
Conceivably it’s grammatical if Henry is vocative and the pronoun is dropped colloquially, like “Who art [thou], O Henry?” but it’s a stretch.
Today, even ignoring the dated conjugation, "who art in heaven" or "who are in heaven", does not make sense. We would switch it into the third person.
(Well, not 19th century...)
"Art thou" should be translated into modern English as "are you to be", and so works better with things (what are you going to be), or people who are alive, and have a future (who are you going to be?).
Those are probably the contexts you are thinking of.
So yes you can interpret it as "for what reason or why are you going to be" (this thing she will now explain).
I saw TimeCapsuleLLM a few months ago, and I'm a big fan of the concept but I feel like the execution really isn't that great. I wish you:
- Released the full, actual dataset (untokenized, why did you pretokenize the small dataset release?)
- Created a reproducible run script so I can try it out myself
- Actually did data curation to remove artifacts in your dataset
- Post-trained the model so it could have some amount of chat-ability
- Released a web demo so that we could try it out (the model is tiny! Easily can run in the web browser without a server)
I may sit down and roll a better iteration myself.
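For what it's worth, a reproducible run script doesn't need to be fancy. Something like this sketch (fixed seeds, a recorded corpus hash, a deliberately tiny char-level model; the filenames are placeholders, not the project's actual files) would already be a big step:

    import hashlib, random
    import torch
    import torch.nn as nn

    CORPUS = "corpus_pre1875.txt"  # placeholder filename
    SEED = 1875

    # Record exactly which data went in, so runs can be compared.
    raw = open(CORPUS, "rb").read()
    print("corpus sha256:", hashlib.sha256(raw).hexdigest())
    text = raw.decode("utf-8", errors="replace")

    # Determinism knobs.
    random.seed(SEED)
    torch.manual_seed(SEED)

    # Char-level vocabulary.
    chars = sorted(set(text))
    stoi = {c: i for i, c in enumerate(chars)}
    data = torch.tensor([stoi[c] for c in text], dtype=torch.long)

    # Deliberately tiny model: predict the next char from the current one.
    emb = 64
    model = nn.Sequential(nn.Embedding(len(chars), emb), nn.Linear(emb, len(chars)))
    opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
    loss_fn = nn.CrossEntropyLoss()

    block, batch = 128, 32
    for step in range(1000):
        ix = torch.randint(0, len(data) - block - 1, (batch,))
        x = torch.stack([data[i:i + block] for i in ix])
        y = torch.stack([data[i + 1:i + block + 1] for i in ix])
        logits = model(x)  # (batch, block, vocab)
        loss = loss_fn(logits.reshape(-1, len(chars)), y.reshape(-1))
        opt.zero_grad(); loss.backward(); opt.step()
        if step % 100 == 0:
            print(step, round(loss.item(), 3))

    torch.save(model.state_dict(), "tiny_model.pt")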
If you could only give it texts and info and concepts up to Year X, well before Discovery Y, could we then see if it could prompt its way to that discovery?
You'd have to be specific about what you mean by AGI: all three letters mean a different thing to different people, and sometimes the term as a whole means something not present in the letters.
> If you could only give it texts and info and concepts up to Year X, well before Discovery Y, could we then see if it could prompt its way to that discovery?
To a limited degree.
Some developments can come from combining existing ideas and seeing what they imply.
Other things, like everything to do with relativity and quantum mechanics, would have required experiments. I don't think any of the relevant experiments had been done prior to this cut-off date, but I'm not absolutely sure of that.
You might be able to get such an LLM to develop all the maths and geometry for general relativity, and yet find the AI still tells you that the perihelion shift of Mercury is a sign of the planet Vulcan rather than of a curved spacetime: https://en.wikipedia.org/wiki/Vulcan_(hypothetical_planet)
Well, they obviously can't. AGI is not science, it's religion. It has all the trappings of religion: prophets, sacred texts, origin myth, end-of-days myth and most importantly, a means to escape death. Science? Well, the only measure of "general intelligence" would be to compare it to the only one we know, the human one, but we have absolutely no means by which to describe it. We do not know where to start. This is why, when you scrape the surface of any AGI definition, you only find circular definitions.
And no, the "brain is a computer" is not a scientific description, it's a metaphor.
Disagree. A brain is turing complete, no? Isn't that the definition of a computer? Sure, it may be reductive to say "the brain is just a computer".
https://aeon.co/essays/your-brain-does-not-process-informati...
This article seems really hung up on the distinction between digital and analog. It's an important distinction, but glosses over the fact that digital computers are a subset of analog computers. Electrical signals are inherently analog.
This maps somewhat neatly to human cognition. I can take a stream of bits, perform math on it, and output a transformed stream of bits. That is a digital operation. The underlying biological processes involved are a pile of complex probabilistic+analog signaling, true. But in a computer, the underlying processes are also probabilistic and analog. We have designed our electronics to shove those parts down to the lowest possible level so they can be abstracted away, and so the degree to which they influence computation is certainly lower than in the human brain. But I think an effective argument that brains are not computers is going to have to dive in to why that gap matters.
You should look into the physical Church-Turing thesis. If it's false (all known tested physics suggests it's true) then, well, we're probably living in a dualist universe. This means something outside of material reality (souls? hypercomputation via quantum gravity? weird physics? magic?) somehow influences our cognition.
> Turing complete does not apply to the brain
As far as we know, any physically realizable process can be simulated by a Turing machine. And FYI brains do not exist outside of physical reality.. as far as we know. If you have issue with this formulation, go ahead and disprove the physical Church-Turing thesis.
This is odd because the dominant view in neuroscience is that memories are stored by altering synaptic connection strength in a large number of neurons. So it's not clear what his disagreement is, and he just seems to be misrepresenting neuroscientists.
Interestingly, this is also how LLMs store memory during training: by altering the strength of connections between many artificial neurons.
> But here is what we are not born with: information, data, rules, software, knowledge, lexicons, representations, algorithms, programs, models, memories, images, processors, subroutines, encoders, decoders, symbols, or buffers – design elements that allow digital computers to behave somewhat intelligently. Not only are we not born with such things, we also don’t develop them – ever.
Really? Humans don't ever develop memories? Humans don't gain information?
I have trouble comprehending this. What is "computer" to you?
There's certainly plenty of cargo-culting right now on AI.
Sacred texts, I don't recognise. Yudkowsky's writings? He suggests wearing clown shoes to avoid getting a cult of personality disconnected from the quality of the arguments, if anyone finds his works sacred, they've fundamentally misunderstood him:
I have sometimes thought that all professional lectures on rationality should be delivered while wearing a clown suit, to prevent the audience from confusing seriousness with solemnity.
- https://en.wikiquote.org/wiki/Eliezer_Yudkowsky
Prophets forecasting the end-of-days, yes, but this too from climate science, from everyone who was preparing for a pandemic before covid and is still trying to prepare for the next one because the wet markets are still around, from economists trying to forecast growth or collapse and what will change any given prediction of the latter into the former, and from the military forces of the world saying which weapon systems they want to buy. It does not make a religion.
A means to escape death, you can have. But it's on a continuum with life extension and anti-aging medicine, which itself is on a continuum with all other medical interventions. To quote myself:
Taking a living human's heart out without killing them, and replacing it with one you got out a corpse, that isn't the magic of necromancy, neither is it a prayer or ritual to Sekhmet, it's just transplant surgery.
…
Immunity to smallpox isn't a prayer to the Hindu goddess Shitala (of many things but most directly linked with smallpox), and it isn't magic herbs or crystals, it's just vaccines.
- https://benwheatley.github.io/blog/2025/06/22-13.21.36.html
Ways data might leak to the model that come to mind: misfiled/mislabeled documents, footnotes, annotations, document metadata.
Especially for London there is a huge chunk of recorded parliament debates.
More interesting for dialogue would be training on recorded correspondence in the form of letters anyway.
And that corpus script just looks odd to say the least, just oversample by X?
SOTA LLMs show quite a lot of skill, but they only do so after reading a significant fraction of all published writing (and perhaps images and videos, I'm not sure) across all languages, in a world whose population is 5 times higher than the link's cut off date, and the global literacy went from 20% to about 90% since then.
Computers can only make up for this by being really really fast: what would take a human a million or so years to read, a server room can pump through a model's training stage in a matter of months.
When the data isn't there, reading what it does have really quickly isn't enough.
As a thought experiment I find it thrilling.
The fact that tech leaders espouse the brilliance of LLMs and don't use this specific test method is infuriating to me. It is deeply unfortunate that there is little transparency or standardization of the datasets available for training/fine tuning.
Having this advertised will make for more interesting and informative benchmarks. OEM models that are always "breaking" the benchmarks are doing so with improved datasets as well as improved methods. Without holding the datasets fixed, progress on benchmarks is very suspect IMO.
LLMs have neither intelligence nor problem-solving ability (and I won't be relaxing the definition of either so that some AI bro can pretend a glorified chatbot is sentient)
You would, at best, be demonstrating that the sharing of knowledge across multiple disciplines and nations (which is a relatively new concept - at least at the scale of something like the internet) leads to novel ideas.
I think there's a big difference between discoveries through AI-human synergy and discoveries through AI working in isolation.
It probably will be true soon (if it isn't already) that most innovation features some degree of AI input, but still with a human to steer the AI in the right direction.
I think an AI being able to discover something genuinely new all by itself, without any human steering, is a lot further off.
If AIs start producing significant quantities of genuine and useful innovation with minimal human input, maybe the singularitarians are about to be proven right.
So if it can work with whats now past, it will be able to work with the past in the future?
If the prediction is that AI will be able to invent the future. If we give it data from our past without knowledge of the present... what type of future will it invent, what progress will it make, if any at all? And not just having the idea, but how to implement the idea in a way that actually works with the technology of the day, and can build on those things over time.
For example, would AI with 1850 data have figured out the idea of lift to make an airplane and taught us how to make working flying machines and progress them to the jets we have today, or something better? It wouldn't even be starting from 0, so this would be a generous example, as da Vinci was playing with these ideas in the 15th century.
If it can't do it, or what it produces is worse than what humans have done, we shouldn't leave it to AI alone to invent our actual future. Which would mean reevaluating the role these "thought leaders" say it will play, and how we're educating and communicating about AI to the younger generations.
This is your assertion made without any supportive data or sources. It's nice to know your subjective opinion on the issue but your voice doesn't hold much weight making such a bold assertion devoid of any evidence/data.
LLMs trained on data up to a strictly constrained point are our best vehicle for getting a view (however biased) on something detached from its origins, and for escaping a local minimum. The speculation is that such LLMs could help us look at correlational links accepted as truths and help us devise an alternative experimental path or craft arguments for such experiments.
Imagine you have an LLM trained on papers up to some threshold, feed your manuscript with correlational evidence and have an LLM point out uncontrolled confounders or something like that.
Do you have any resources that back up such a big claim?
> relatively small mount of focus & critical thinking on the issue of how LLMs & other categories of “AI” work.
I don't understand this line of thought. Why wouldn't the ability to recognize patterns in existing literature or scientific publications result in potential new understandings? What critical thinking am I not doing?
> postulate “new” scientific ideas
What are your examples of "new" ideas that aren't based on existing ones?
When you say "other categories of AI", you're not including AlphaFold, are you?
I think you might be confusing creation ex nihilo with combinatorial synthesis, which LLMs excel at. The proposed scenario is a fantastic test case for exactly this. This doesn't cover verification of course, but that's not the question here. The question is whether an already known valid postulate can be synthesized.
PROMPT: Charles Darwin
Charles DarwinECCEMACY. Sir, — The following case is interesting to me : — I was in London a fortnight, and was much affected with an attack of rheumatism. The first attack of rheumatism was a week before I saw you, and the second when I saw you, and the third when I saw you, and the third in the same time. The second attack of gout, however, was not accompanied by any febrile symptoms, but was accompanied with an increased flow of urine, and with a more copious discharge of urine. The third attack was an hour after I saw you, and was succeeded by a return of a paroxysm of gout, and a more rapid return of the gout. The fourth attack was also accompanied by a fever, but was not always accompanied by any febrile symptoms. The third attack of gout was a fortnight after you had been ill, and the fourth was followed by a paroxysm of gout. The fourth attack was a fortnight after you were attacked, and was accompanied by a sense
It would be interesting to know how much text was generated per century!
Very cool concept, though it definitely has some bias.
So the OP model will accidentally reflect the biases of its time. Current commercial models intentionally reflect specific biases, except for uncensored models, which accidentally pick up whatever biases the uncensoring set introduces into their training data.
To be frank, though, I think this is a better way than all people's thoughts all of the time.
I think the "crowd" of information makes the end output of an LLM worse rather than better. Specifically in our inability to know really what kind of Bias we're dealing with.
Currently to me it feels really muddy knowing how information is biased, beyond just the hallucination and factual incosistencies.
But as far as I can tell, "correctness of the content aside", sometimes frontier LLMs respond like freshman college students, other times they respond with the rigor of a mathematics PHD canidate, and sometimes like a marketing hit piece.
This dataset has a consistency which I think is actually a really useful feature. I agree that having many perspectives in the dataset is good, but as an end user being able to rely on some level of consistency with an AI model is something I really think is missing.
Maybe more succinctly: I want frontier LLMs to have a known and specific response style and bias that I can rely on, because there is already a lot of noise.
Another thought that just occurred while thinking about readmes and coding LLMs: obviously this model wouldn't have any coding knowledge, but I wonder if it could somehow be combined with a modern LLM so that it does have coding knowledge, yet renders out all the text in the style and knowledge level of the 1800s model.
Offhand I can't think of a non-fine-tuning trick that would achieve this. I'm thinking back to how the old style transfer models used to work, where they would swap layers between models to get different stylistic effects applied. I don't know if that's doable with an LLM.
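One decode-time idea, offered purely as a sketch: blend the next-token distributions of a modern coding model and the period model, so the former supplies the knowledge and the latter pulls the wording toward the 1800s. This only works if the two models share a tokenizer and vocabulary, which is assumed below and is unlikely in practice; the model names are placeholders.

```python
# A sketch of decode-time logit blending (product-of-experts style). It assumes
# both models share a tokenizer and vocabulary, which a time-capsule model and a
# modern LLM almost certainly do not, so treat this as an illustration of the
# idea rather than a working recipe. Model names are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("modern-coder")                # placeholder
modern = AutoModelForCausalLM.from_pretrained("modern-coder")      # placeholder
period = AutoModelForCausalLM.from_pretrained("timecapsule-1875")  # placeholder

alpha = 0.5  # how strongly the period model's style pulls on each token
ids = tok("Explain what a README file is for.", return_tensors="pt").input_ids

for _ in range(120):
    with torch.no_grad():
        lp_modern = modern(ids).logits[:, -1, :].log_softmax(-1)
        lp_period = period(ids).logits[:, -1, :].log_softmax(-1)
    blended = (1 - alpha) * lp_modern + alpha * lp_period
    next_id = blended.argmax(-1, keepdim=True)  # greedy decoding for simplicity
    ids = torch.cat([ids, next_id], dim=-1)
    if next_id.item() == tok.eos_token_id:
        break

print(tok.decode(ids[0]))
```

With mismatched tokenizers you're probably back to fine-tuning, or to a two-pass pipeline where the modern model drafts and then restyles via prompting.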
I still think it's super important. Archive your current models - they'll be great in the future.
But on various data (i.e., a separate model per source): the Bible, Don Quixote, and Franz Kafka. (As well as a (bad!) lyrics generator and translator.)
The model that could come up with the cure based on the limited data of the time wouldn't just impress, it would demonstrate genuine emergent reasoning beyond pattern matching. The challenge isn't recombining existing knowledge (which LLMs excel at), but making conceptual leaps that require something else. Food for thought.
===
You:
I pray you, who is this Master Newton?
timecapsulellm-v2-1800-1875-mlx:
TI offer to pay you the very same fee as you did before. It was not in the power of your master to deliver the letter to your master. He did. I will be with you as soon as I can keep my word. It is not at all clear, whether the letter has been sent or not. It is not at all clear: but it is clear also that it was written by the person who gave it. "No," I said, "I cannot give it to you." There, the letter was sent to me. "The letter is yours, I believe," I said. "But, I hope, you will not refuse to give it to me?
There's a "Use this model" button on that page that can launch it in LM Studio.
This feels like a neat sci-fi short story hook to explain the continuous emergence of God as an artifact of a simulation
Back in the pre-LLM days, someone trained a Markov chain off the King James Bible and a programming book: https://www.tumblr.com/kingjamesprogramming
I'd love to see an LLM equivalent, but I don't think that's enough data to train from scratch. Could a LoRA or similar be used in a way to get speech style to strictly follow a few megabytes worth of training data?
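My guess is yes, at least for surface style: a LoRA adapter can be fit on a corpus that small, though whether it "strictly" follows the style is another matter. A rough sketch with peft/transformers, where the base model, corpus path, and hyperparameters are placeholders:

```python
# A rough LoRA sketch with peft/transformers. The base model ("gpt2"), the
# corpus path, and all hyperparameters are placeholders chosen for illustration,
# not values anyone has validated for this use.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "gpt2"  # placeholder base model
tok = AutoTokenizer.from_pretrained(base)
tok.pad_token = tok.eos_token
model = get_peft_model(
    AutoModelForCausalLM.from_pretrained(base),
    # Low-rank adapters on the attention projections; only these weights train.
    LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"], task_type="CAUSAL_LM"),
)

# "style_corpus.txt" stands in for the few megabytes of KJV-plus-programming text.
ds = load_dataset("text", data_files="style_corpus.txt")["train"]
ds = ds.map(lambda x: tok(x["text"], truncation=True, max_length=512),
            remove_columns=["text"])

Trainer(
    model=model,
    args=TrainingArguments("lora-style", num_train_epochs=3,
                           per_device_train_batch_size=4, learning_rate=2e-4),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
).train()
```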
Somewhat missing the cutoff of 1875 is the revised NT of the KJV. Work on it started in 1870 but likely wasn't used widely before 1881.
Edit: I figured it out
"The Lord of the Rings uding the army under the command of his brother, the Duke of York, and the Duke of Richmond, who fell in the battle on the 7th of April, 1794. The Duke of Ormond had been appointed to the command of the siege of St. Mark's, and had received the victory of the Rings, and was thus commanded to move with his army to the relief of Shenham. The Duke of Ormond was at length despatched to oppose them, and the Duke of Ormond was ordered
There's a disconnect somewhere that I can't quite put my finger on. Am I just lacking imagination?
To get Relativity, the model needs to realize the training data isn't just incomplete, but fundamentally wrong. That requires abductive reasoning (the spark of genius) to jump out of the local minimum. Without that AGI-level spark, a 'pure knowledge pile' will just generate a very eloquent, mathematically rigorous defense of Newtonian physics.
https://news.ycombinator.com/item?id=46319826
I found it particularly thought-inspiring how a model with training from that time period completely lacks context/understanding of what it is itself, but then I realized that we are the same (at least for now).
Now if I trained a foundation model on the documents of the Library of Alexandria, and only texts from that period, I would have a chance of getting a rudimentary insight into what the world was like at that time.
And maybe time-shift even further back.
Congratulations, you've reinvented the history book (just with more energy consumption and less guarantee of accuracy)
This isn't super relevant to us because very few histories from this era survived, but presumably there was sufficient material in the Library of Alexandria to cover events from multiple angles and "zero out" the different personal/political/religious biases coloring the individual accounts.
With the internet and pervasive text communication and audio/video recording, we have the unique ability to make an LLM mimic daily life, but I doubt that would be possible for those days.
That happened in the past, and the "naive" way of doing it is usually easy to spot. There are, however, many ways in which testing data can leak into models, even without data contamination. However this doesn't matter much, as any model that only does well in benchmarks but is bad in real-world usage will be quickly sussed out by people actually using them. There are also lots and lots of weird, not very popular benchmarks out there, and the outliers are quickly identified.
> perform on SWE-bench if trained only on data prior to 2024.
There's a benchmark called swe-REbench that takes issues from real-world repos, published roughly monthly. They run the tests, and you can select the period and check performance. This is foolproof for open models, but a bit of an unknown for API-based models.
That is, the architectures of today are chosen to yield the best results given the textual data around today and the problems we want to solve today.
I'd argue that this lack of bias would need to be researched (if it hasn't been already) before this kind of model has credence.
LLMs aren't my area of expertise but during my PhD we were able to encode a lot of a priori knowledge through the design of neural network architectures.
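For readers outside that area, here is a toy example of what encoding a priori knowledge in the architecture can mean; the particular priors (translation equivariance from convolution, flip invariance from symmetrizing) are my own illustration, not what the commenter's PhD work used:

```python
# A toy PyTorch example of expressing prior knowledge in the wiring itself:
# convolution bakes in translation equivariance, and averaging the network's
# response to an image and its mirror image makes the output invariant to
# horizontal flips by construction. The specific priors are an illustration.
import torch
import torch.nn as nn

class FlipInvariantCNN(nn.Module):
    def __init__(self, n_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # translation equivariance
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                     # global pooling
        )
        self.head = nn.Linear(16, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Symmetrize over horizontal flips so f(flip(x)) == f(x) by design.
        z = 0.5 * (self.features(x) + self.features(torch.flip(x, dims=[-1])))
        return self.head(z.flatten(1))

logits = FlipInvariantCNN()(torch.randn(2, 1, 28, 28))  # toy usage
```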
v0.5: 123M parameters
v1: 700M parameters
v2mini-eval1: 300M parameters
I would not call this an LLM. It's not large. It's just a normal-sized LM, or even a small one.
(It's also not a small LLM.)
Having said that, looking up "small language model" these days returns tons of results calling 7B models small language models.
------
My understanding of small language models is that they're generally intended for specific purposes, like analysis and classification (whatever you'd call the text equivalent of image interrogation with CLIP models), translation, etc.; they're small because they don't need to be big to do their intended functions, not because they're just smaller versions of bigger models.
I think talking to a legit trained LLM from a different era would be rad. But… this seems the opposite of Gemini making black lady popes and Native American Nazis… in that these views wouldn't really be "allowed" (published by anyone that wants AI funding money).
A truly authentic historical model will have some unsavory opinions and very distinctive dialect.