What bothers me about the hot takes is the claim that “all models do is hallucinate.” That collapses the distinction entirely. Yes, models are just predicting the next token—but that doesn’t mean all outputs are hallucinations. If that were true, it’d be pointless to even have the term, and it would ignore the fact that some models hallucinate much less than others because of scale, training, and fine-tuning.
That’s why a careful definition matters: not every generation is a hallucination, and having good definitions lets us talk about the real differences.
That is a problem for "Open"AI, because they want to sell their products and because they want to claim that LLMs will scale to superintelligence. It's not a problem for others.
"Bad" hallucinations come in different forms, and what the article describes is one of them. Not all of them come from complete uncertainty. There are also the cases where the LLM is hallucinating functions in a library, or they reverse cause and effect when summarising a complex article. Stuff like this still happen all the time, even with SOTA models. They do not happen because the model is bad with uncertainty, they have nothing to do with knowledge uncertainty. Esp stuff like producing statements that misinterpret causal relationships within text, imo, reveals exactly the limits of the architectural approach.
What we perceive as "not hallucination" is merely a very broad consensus, supported by education, culture and personal beliefs, and it varies quite a bit. And little in how the model comes into existence gives it the tools to make those distinctions. Quite the opposite.
- From the perspective of LLM research/engineering, saying all LLM generation is hallucination is not particularly useful. It’s meaningless for the problem space.
- From the perspective of AI research/engineering in general (not LLM specific) it can be useful to consider architectures that do not rely on hallucination in the second sense.
'Everything an LLM outputs is a hallucination. It's just that some of those hallucinations are true.'
So instead of being that pedantic, we decided that "hallucination" only applies to when what our brain thinks we see does not match reality, so now hallucination is actually a useful word to use. Equally with LLMs, when people talk about hallucinations part of the definition includes that the output be incorrect in some way. If you just go with your quote's way of thinking about it, then once again the word loses all purpose and we can just scrap it since it now means exactly the same thing as "all LLM output".
Except it's not. People can have hallucinations that are true (dreams), but most perception isn't generated by your brain; it comes from the outside.
We need to establish proper definitions and models for these things before we can begin to argue about them. Otherwise we're just wasting time.
You need the ground truth to be able to make that determination, so using your knowledge does count. If you press the model to answer even when it does not know, you get confabulation. What today's models lack is the ability to measure their confidence, so they know when to abstain.
The reification of counterfactual outputs, which are otherwise etiologically indistinguishable from the remainder of LLM production, is a better candidate for the label "hallucination" IMO.
To me, this seems to be a "US-American" way of thinking about multiple-choice tests. Other common ways to grade multiple-choice tests that I have seen are:
1. If the testee has the information that exactly one of N given choices is correct:
1.1 Give N-1 points for the correct answer, and -1 [negative one] point(s) for a wrong answer. This way, if the testee just answers the questions randomly, his expected score is 0 points.
1.2 A more brutal way if N>=3: the correct answer gives 1 point, every wrong answer gives -1 point. You should learn your lesson [alliteration unintended :-) ]: only give an answer if you are sure it is correct (if N=2, the grading is identical to 1.1).
2. If there are possibly multiple correct answers, turn each item into a "yes"/"no" choice (with the option to give no answer). The correct choice gives you 1 point, the wrong one gives you -1 point (i.e. as in 1.1).
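To make the expected values concrete, here is a quick sketch (plain Python, illustrative numbers only) of what blind guessing earns under schemes 1.1 and 1.2:

```python
# Expected score of a blind guesser under the grading schemes above.
# N = number of choices, exactly one of which is correct.

def expected_random_score(n, right_points, wrong_points):
    """Expected points when picking uniformly at random among n choices."""
    return (1 / n) * right_points + ((n - 1) / n) * wrong_points

for n in (2, 3, 4, 5):
    scheme_1_1 = expected_random_score(n, right_points=n - 1, wrong_points=-1)
    scheme_1_2 = expected_random_score(n, right_points=1, wrong_points=-1)
    print(f"N={n}: scheme 1.1 -> {scheme_1_1:+.2f}, scheme 1.2 -> {scheme_1_2:+.2f}")

# Scheme 1.1 always gives a blind guesser an expected score of 0;
# scheme 1.2 goes negative as soon as N >= 3, so guessing is actively punished.
```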
A lot of what they do is based on public relations rather than psychometric validity.
For TIMED multiple-choice tests (and the timed constraint makes sense in the OP's analogy as well), probabilistic answering is the kryptonite that lets smart people do well on SATs and IQ tests and other things like that.
I took an IQ test recently and it all came rushing back to me.
For math problems, often the right answer can be found just by inspecting the ones digit of the possible answers and using process of elimination. Others, by anticipating what errors the test writer is expecting you to make and eliminating those as possible answers. It's like magic. Sure, you could actually sit and SOLVE each problem, but why spend the time, when time is valuable?
Pretty sure these types of strategies are not actively taught to anyone unless you have a good college counselor, interested teacher, or SAT tutor. But perhaps they ought to be.
> This idea is not new. Some standardized tests have long used versions of negative marking for wrong answers or partial credit for leaving questions blank to discourage blind guessing.
But better evals are still helpful, because they reward LLM vendors for trying to do the very-hard-to-do thing. Instead of rewarding them for training an LLM that's really good at emitting 7% confidence guesses.
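A rough sketch of the incentive difference (illustrative scoring rules, not any particular benchmark): under accuracy-only scoring a 7% confidence guess is always worth taking, while any penalty for wrong answers flips the optimal policy to abstaining.

```python
# Expected value of answering vs. abstaining at a given confidence level.
# Illustrative scoring rules only, not any particular benchmark.

def best_policy(p_correct, right=1.0, wrong=0.0, abstain=0.0):
    guess_value = p_correct * right + (1 - p_correct) * wrong
    return ("guess", guess_value) if guess_value > abstain else ("abstain", abstain)

p = 0.07  # a 7% confidence guess

# Accuracy-only scoring: wrong answers cost nothing, so guessing always wins.
print(best_policy(p, right=1, wrong=0, abstain=0))   # ('guess', 0.07)

# Negative marking: a wrong answer costs a point, so a 7% guess has
# negative expected value and abstaining becomes the optimal policy.
print(best_policy(p, right=1, wrong=-1, abstain=0))  # ('abstain', 0)
```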
And OpenAI has induced hallucinations in o3 with RLVR mistakes, not with a failed pre-training run. They used o4-mini as an example - similar training to o3 and similar issues.
Conversely, they have also designed a post-training system that has successfully reduced hallucinations in GPT-5.
LLMs hallucinate because they are language models. They are stochastic models of language. They model language, not truth.
If the “truthy” responses are common in their training set for a given prompt, you might be more likely to get something useful as output. Feels like we fell into that idea and said - ok this is useful as an information retrieval tool. And now we use RL to reinforce that useful behaviour. But still, it’s a (biased) language model.
I don’t think that’s how humans work. There’s more to it. We need a model of language, but it’s not sufficient to explain our mental mechanisms. We have other ways of thinking than generating language fragments.
Trying to eliminate cases where a stochastic model the size of an LLM gives “undesirable” or “untrue” responses seems rather odd.
That means there would be some high dimensional surface representing "all true things". Any fact could be trivially resolved as "true" or "false" simply by exploring whether or not it was represented on this surface. Whether or not "My social security number is 123-45-6789" is true could be determined simply by checking whether that statement was mappable to the truth manifold. Likewise you could wander around that truth manifold and start generating output of all true things.
If such a thing existed it would make even the wildest fantasies about AGI seem tame.
edit: To simplify it further, this would imply you could have an 'is_true(statement: string): bool' function for any arbitrary statement in English.
Frankly, this is a silly line of argument. There is a vast spectrum between regularly inventing non-existent citations and total omniscience. "We can't define objective truth" isn't a gotcha, it's just irrelevant.
Nobody in the field is talking about or working on completely eliminating hallucinations in some grand philosophical sense, they're just grinding away at making the error rate go down, because that makes models more useful. As shown in this article, relatively simple changes can have a huge effect and meaningful progress is being made very rapidly.
We've been here before, with scepticism about Wikipedia. A generation of teachers taught their students "you can't trust Wikipedia, because anyone can edit it". Two decades and a raft of studies later, it became clear that Wikipedia is at least as factually accurate as traditional encyclopedias and textbooks. The contemporary debate about the reliability of Wikipedia is now fundamentally the same as arguments about the reliability of any carefully-edited resource, revolving around subtle and insidious biases rather than blatant falsehoods.
Large neural networks do not have to be omniscient to be demonstrably more reliable than all other sources of knowledge, they just need to keep improving at their current rate for a few more years. Theoretical nitpicking is missing the forest for the trees - what we can empirically observe about the progress in AI development should have us bracing ourselves for radical social and economic transformation.
what is an error? how does the llm "know"?
wikipedia example is good. i'd say its "truth" is based on human curated consensus. everyone gets that. what i don't get is: what's the llm analog? as you state, it's just about making the error rate go down, ok so what is an error? does it require a human in the loop?
There's still no justification for the whole investment craze in LLMs.
LLMs are text generators that are very good at writing a book report based on a prompt and the patterns learned from the training corpus, but it's an entirely separate problem to go through that book report statement by statement and determine if each one is true/false/unknown. And that problem is one that the AI field has already spent 60 years on, so there's a lot of hubris in assuming you can just solve that and bolt it onto the side of GPT-5 by next quarter.
I hope you don't think that the solutions will be a closed-form expression. The solution should involve exploration and learning. The things that LLMs are instrumental in, you know.
Learning to guess the next token is very different from learning to map text to a hypervector representing a graph of concepts. This can be witnessed in image classification tasks involving overlapping objects where the output must describe their relative positioning. Vector-symbolic models perform substantially better than more "brute-force" neural nets of equivalent size.
But this is still different from hardcoding a knowledge graph or using closed-form expressions.
Human intelligence relies on very similar neural structures to those we use for movement. Reference frames are both how we navigate the world and also how we think. There's no reason to limit ourselves to next token prediction. It works great because it's easy to set up with the training data we have, but it's otherwise a very "dumb" way to go about it.
I’m only “thinking in language” when I’m practicing compressing my intent into a shareable format. I don’t think about the majority of highly complex interactions I have with the physical world throughout the day.
As a child did you need to be able to explain in language how the physics of a swing works to be able to use it? Did other kids have to explain it to you in detailed language for you to pick up on how to move your body to do complex tasks?
No. In fact exactly because our compression and decompression of language is even more limited as children, we rely more heavily on raw observation and mimicry of actions occurring in reality itself.
The very idea that a language model can recreate everything we do from the lossy and compressed languages we use to share limited descriptions of much more complex intentions and actions is fundamentally flawed and oversimplified.
E.g. when I explain a concept, what comes to my mind is not a string of letters and words. There is a mix of imagery and even sounds that I may have acquired from learning about a concept - then I translate that into text so it can be communicated.
There's a reason why people use native subtitles when watching netflix - text complements imagery and sounds.
I don't like this; I find my eyes spending more time than I'd like on the text, and not enough on the visual imagery on the rest of the screen. If I truly wanted more text, I'd just read a book.
Also I watch a lot of English language material that uses accents quite different from what my ears are tuned to.
People watch Netflix to switch their brain off - having the text there helps along with the visual and sound to deliver the content. However, text is inferior to both visual and sound as a delivery mechanism.
Every time this comes up I have to bring up Deutsch. He has the best description of intelligent cognition that I've come across. He takes Popper's "conjecture and criticism" approach to science and argues that this guess-and-check loop applies to all our thinking.
E.g. understanding spoken language has some elements of guessing what might have been said and checking that against the sounds we heard. Visual processing has similar analogies.
LLMs seem to be great at conjecturing stuff, but seem incapable of checking or even knowing they need to check.
Would you have a reference?
Why? It seems no less odd than eliminating cases where it gives "undesirable" code snippets with hallucinated errors. This is very important and not odd at all.
> Trying to eliminate cases where a stochastic model the size of an LLM gives “undesirable” or “untrue” responses seems rather odd.
Take it back to what it is, like you say: this is a predictive model, and the work of any ML scientist is to iterate on the model to try and get perfect accuracy on unseen data. It makes sense to want to tune the models to lower the rate of predictive errors. And because perfect predictive accuracy is rarely possible, you need to make judgment calls between precision and recall, which, in the case of LLMs, directly affects how often the model will hallucinate versus how often it will stay silent or overly cautious.
I just mean that, if you're an ML scientist team, you don't just go, we got 76% accuracy, let's close shop, mail in your resignation, job over.
From that angle, it's not odd at all that the team just continues working and now sees if it can achieve greater than 76%.
1. If I tell it the first two lines of a story, I want the LLM to complete the story. This requires hallucination, because it has to make up things. The story has to be original.
2. If I ask it a question, I want it to reply with facts. It should not make up stuff.
LMs were originally designed for (1) because researchers thought that (2) was out of reach. But it turned out that, without any fundamental changes, LMs could do a little bit of (2), and since that discovery things have improved, but not to the point that hallucination has disappeared or is under control.
LLMs predict the likely tokens to follow the context. And they can make incorrect predictions.
LLMs therefore don't have perfect accuracy of prediction. When their predictions are incorrect, people say they "hallucinate".
Nobody questions why predictive weather models aren't perfectly accurate, because it makes sense that a prediction can be wrong.
Marketing and hype has tried to sell LLMs as "logical rational thinkers" equal to human thinking. A human doing actual thinking knows when they are making stuff up. So if a human truly believes obviously false things to be true, it tends to be because they are hallucinating. Their thinking isn't wrong, they've lost track of reality to ground their thinking.
We've anthropomorphized LLMs to the point we wonder why are they hallucinating like we can offer a diagnostic. But if you stop anthropomorphising them and go back to their actual nature as a predictive model, then it's not even a surprising outcome that predictions can turn out to be wrong.
A language model is made to predict language, but it is used to generate code or answers to math questions; that is not the same situation as a weather model. The language model is not made to solve math or generate correct code, and if you ask it to predict the weather it won't try to predict the weather, it will just predict the language that is a probable response to such a question.
This sort of misunderstanding is what is causing all these debates; many people really struggle to understand what these language models really are.
But the training does not just reinforce plausible continuations, it biases toward text that matches correct answers. So in that sense they are training it not just to predict any likely text, but to predict text that is more likely to contain the right answer to a math or coding problem.
To me that does not look so different from other ML models. They all work by turning a problem into something a computer can handle statistically, and they all face the same trade offs. Prediction errors are inevitable, and you still have to decide whether to tune for recall, which gives hallucinations, or precision, which gives refusals.
<pedantry>Isn't a language model made to predict the next token in a series, which just so happens to be good for predicting not only natural languages, but also formal ones (code and math)?</pedantry>
Also, similar to what nelox said, as long as language (or sequences of tokens or what have you) can be "about" something (whatever that means), then it's possible that LLMs are encoding information about that "something". I'm being deliberately vague because I think that trying to be precise (by e.g. referring to latent spaces and so on) makes it sound like we've figured something out when in reality we haven't even found the right words to ask the questions.
Have the LLM talk about what “truth” is and the nature of LLM hallucinations and it can cook up an explanation that demonstrates it completely understands the concepts.
Additionally, when the LLM responds, MOST of the answers are true even though quite a few are wrong. If it had no conceptual understanding of truth then the majority of its answers would be wrong, because there are overwhelmingly far more wrong responses than there are true responses. Even a "close" hallucination has a low probability of occurring due to its proximity to a low probability region of truth in the vectorized space.
You’ve been having trouble conveying these ideas to relatives because it’s an inaccurate characterization of phenomena we don’t understand. We do not categorically fully understand what’s going on with LLMs internally and we already have tons of people similar to you making claims like this as if it’s verifiable fact.
Your claim here cannot be verified. We do not know if LLMs know the truth and they are lying to us or if they are in actuality hallucinating.
You want proof that your statement can't be verified? The article the parent commenter is responding to is saying the exact fucking opposite. OpenAI makes an opposing argument, and it can go either way because we don't have definitive proof either way. The article is saying that LLMs are "guessing", that it's an incentive problem where LLMs are inadvertently incentivized to guess, and that if you incentivize the LLM not to guess confidently and to be more uncertain, the outcomes will change to what we expect.
Right? If it’s just an incentive problem it means the LLM does know the difference between truth and uncertainty and that we can coax this knowledge out of the LLM through incentives.
It doesn't need a conceptual understanding of truth - yes, there are far more wrong responses than right ones, but the right ones appear more often in the training data and so the probabilities assigned to the tokens which would make up a "right" one are higher, and thus returned more often.
You're anthropomorphizing in using terms like "lying to us" or "know the truth". Yes, it's theoretically possible I suppose that they've secretly obtained some form of emergent consciousness and also decided to hide that fact, but there's no evidence that makes that seem probable - to start from that premise would be very questionable scientifically.
A lot of people seem to be saying we don't understand what it's doing, but I haven't seen any credible proof that we don't. It looks miraculous to the relatively untrained eye - many things do, but just because I might not understand how something works, it doesn't mean nobody does.
You don't actually know this right? You said what I'm saying is theoretically possible so you're contradicting what you're saying.
>You're anthropomorphizing in using terms like "lying to us" or "know the truth". Yes, it's theoretically possible I suppose that they've secretly obtained some form of emergent consciousness and also decided to hide that fact, but there's no evidence that makes that seem probable - to start from that premise would be very questionable scientifically.
Where did I say it's conscious? You hallucinated here thinking I said something I didn't.
Just because you can lie doesn't mean you're conscious. For example, a sign can lie to you. If the speed limit is 60 but there's a sign that says the speed limit is 100 then the sign is lying. Is the sign conscious? No.
Knowing is a different story though. But think about this carefully. How would we determine whether a "human" knows anything? We only can tell whether a "human" "knows" things based on what it Tells us. Just like an LLM. So based off of what the LLM tells us, it's MORE probable that the LLM "knows" because that's the SAME exact reasoning on how we can tell a human "knows". There's no other way we can determine whether or not an LLM or a human "knows" anything.
So really I'm not anthropomorphizing anything. You're the one that's falling for that trap. Knowing and lying are not concepts unique to consciousness or humanity. These are neutral concepts that exist beyond what it means to be human. When I say something "knows" or something "lies", I'm saying it from a highly unbiased and neutral perspective. It is your bias that causes you to anthropomorphize these concepts with the hallucination that they are human-centric concepts.
>A lot of people seem to be saying we don't understand what it's doing, but I haven't seen any credible proof that we don't.
Bro. You're out of touch.
https://www.youtube.com/watch?v=qrvK_KuIeJk&t=284s
Hinton, the godfather of modern AI, says we don't understand. It's not just people saying we don't understand. The general understanding within academia is: we don't understand LLMs. So you're wrong. You don't know what you're talking about and you're highly misinformed.
Additionally, there is a very large body of academic research that digs into how LLMs seem to understand concepts and truths and, sure enough, examples of us making point edits to models to change the "facts" that they "know". My favorite of that corpus, though far from the only or most current/advanced research, is the Bau Lab's work: https://rome.baulab.info/
You referenced a work on model interpretability, which is essentially the equivalent of putting an MRI or electrodes on the human brain and saying we understand the brain because some portion of it lights up when we show the brain a picture of a cow. There's lots of work on model interpretability, just like there's lots of science involving brain scans of the human brain… the problem is that none of this gives insight into how the brain or an LLM works.
In terms of understanding LLMs we overall don’t understand what’s going on. It’s not like I didn’t know about attempts to decode what’s going on in these neural networks… I know all about it, but none of it changes the overall sentiment of: we don’t know how LLMs work.
This is fundamentally different from computers. We know how computers work such that we can emulate a computer. But for an LLM we can’t fully control it, we don’t fully understand why it hallucinates, we don’t understand how to fix the hallucination and we definitely cannot emulate an LLM in the same way we do for a computer. It isn’t just that we don’t understand LLMs. It’s that there isn’t anything in the history of human invention that we lack such fundamental understanding of.
Off of that logic, the facts are unequivocally clear: we don’t understand LLMs and your statement is wrong.
But it goes beyond this. I’m not just saying this. This is the accepted general sentiment in academia and you can watch that video of Hinton, the godfather of AI in academia basically saying the exact opposite of your claim here. He literally says we don’t understand LLMs.
The Anthropic papers also cover a lot more subjects (e.g. feature splitting, discussion on use in model moderation, activation penalties) than Bau Lab's, as well--which is great, but maybe not when shared as a targeted intro to interpretability/model editing.
This isn't how an LLM works. What an LLM understands has nothing to do with the words it says; it only has to do with what connections it has seen.
If an LLM has only seen a manual but has never seen examples of how the product is used, then it can tell you exactly how to use the product by writing out info from the manual, but if you ask it to do those things then it won't be able to, since it has no examples to go by.
This is the primary misconception most people have, and it makes them overestimate what their LLM can do: no, they don't learn by reading instructions, they only learn by seeing examples and then doing the same thing. So an LLM talking about truth just comes from it having seen others talk about truth, not from it thinking about truth on its own. This is fundamentally different from how humans think about words.
I know how an LLM works. I've built one. At best we only know surface-level stuff, like the fact that it involves a feed-forward network and uses token prediction.
But the emergent effect of how an LLM produces an overall statement that reflects high-level conceptual understanding is something we don't know.
So your claim of "This isn't how an LLM works", which was said with such confidence, is utterly wrong. You don't know how it works; no one does.
There is not necessarily a connection between what an LLM understands and what it says. It’s totally possible to emit text that is logically consistent without understanding. As a trivial example, just quote from a physics textbook.
I’m not saying your premise is necessarily wrong: that LLMs can understand the difference between truth and falsehood. All I’m saying is you can’t infer that from the simple test of talking to an LLM.
This is true, but you could say the same thing about a human too right? There's no way to say there's a connection between what a human says and whether or not a human understands something. Right? We can't do mind reading here.
So how do we determine whether or not a human understands something? Based off of what the human tells us. So I'm just extrapolating that concept to the LLM. It knows things. Does it matter what the underlying mechanism is? If we get LLM output to be perfect in every way but the underlying mechanism is still feed forward networks with token prediction then I would still say it "understands" because that's the EXACT metric we use to determine whether a human "understands" things.
>I’m not saying your premise is necessarily wrong: that LLMs can understand the difference between truth and falsehood. All I’m saying is you can’t infer that from the simple test of talking to an LLM.
Totally understood. And I didn't say that it knew the difference. I was saying basically a different version of what you're saying.
You say: We can't determine if it knows the difference between truth and falsehood. I say: We can't determine if it doesn't know the difference between truth and falsehood.
Neither statement contradicts each other. The parent commenter imo was making a definitive statement in that he claims we know it doesn't understand and I was just contradicting that.
The Symbiocene Horizon: A term suggesting a techno-utopian future state where humanity and technology have merged with ecological systems to achieve a perfect, self-correcting state of equilibrium.
But the people who say everything LLMs do is hallucinate clearly also make that distinction, they just refuse to rename the useful hallucinations.
"How many legs does a dog have if you call his tail a leg? Four. Saying that a tail is a leg doesn't make it a leg." -- Abraham Lincoln
Now granted, we also need to back up those notions with rigorous testing and observation, but those "if a tail is a leg" hypotheticals are the basis of the reasoning.
> I’m assuming the purpose of this post is to try and reframe the discussion
It's to establish a meaningful and practical definition of "hallucinate" to actually make some progress. If everything is a hallucination as the other comments seem to suggest, then the term is a tautology and is of no use to us.
Yes, we can know whether something is true or false, but this is a system being sold as something useful. If it relies on us knowing whether the output is true or false, there is little point in us asking it a question we clearly already know the answer to.
> It's useful as a term of understanding.
No it isn't. I dare you to try publishing in this field with that definition. Claiming all outputs are hallucinations because it's a probabilistic model tells us nothing of value about what the model is actually doing. By this definition, literally everything a human says is a hallucination as well. It is only valuable to those who wish to believe that LLMs can never do anything useful, which as Hinton says, is really starting to sound like an ego-driven religion at this point. Those that follow it do not publish in top relevant outlets any more, and should not be regarded as an expert on the subject.
> they haven't shown they know how to do so yet. We can avoid it, but LLMs cannot, yet.
This is exactly what they argue in the paper. They discuss the logical means by which humans are able to bypass making false statements by saying "I don't know". A model that responds only with a lookup table and an "I don't know" can never give false statements, but is probably not so useful either. There is a sweet spot here, and humans are likely close to it.
> If it relies on us knowing whether the output is true or false
I never said the system relies on it. I said that our definition of hallucination, and therefore our metrics by which to measure it, depend only on our knowing whether the output is true. This is no different from any other benchmark. They are claiming that it might be useful to make a new benchmark for this concept.
OpenAI has a machine that emits plausible text. They're trying to argue that "emitting plausible text" is the hard problem, and "modeling the natural world, human consciousness, society, etc." is the easy one.
Modelling those things is a separate problem to emitting plausible text and pursuing one is not necessarily beneficial to the other. It seems more sensible to pursue separate models for each of these tasks.
so if you ask, "what is the capital of colorado" and it answers "denver" calling it a Hallucination is nihilistic nonsense that paves over actually stopping to try and understand important dynamics happening in the llm matrices
On the other hand, calling it anything other than a hallucination misrepresents truth as something these models have any ability to use to differentiate between their outputs based on whether they accurately reflect reality, and conflates a fundamentally unsolved problem with an engineering tradeoff.
At the end of the day, the goal is to train models that are able to differentiate between true and false statements, at least to a much better degree than they can now, and the linked article seems to have some very interesting suggestions about how to get them to do that.
I'm a bit surprised no one talks about this factor. It's like talking to a giant narcissist who can Google really fast but not understand what it reads. The ability to admit ignorance is a major factor of credibility, because none of us know everything all at once.
Why would anyone respond with so little nuance?
> a Hallucination
Oh, so your shift key wasn't broken all the time, then why aren't you using it in your sentences?
What is true is that during pretraining, the model doesn’t know enough to determine this or to distinguish between what it knows and what it’s making up. This is a higher-level distinction that emerges later, if at all.
The recent research discovering an “evil vector” is an example of a higher-level distinction.
If I ask the LLM to generate a fictional story set in medieval France, and it then responds with a fictional story set in medieval France, that's an appropriate ("correct") response to the task I gave it. If it responded with a story set in medieval England, though, that would not be correct. If, instead, I had asked it to generate a story in "medieval times", both France and England would have been correct as locations because the problem was underspecified and asked for some creativity. A medieval story set in the US, however, would still not have been correct or consistent with the training data. You can come up with more such examples even in entirely fictional settings: Once the story has been set to take place in fictional city X, it would not be consistent if two sentences later the characters were in city Y all of a sudden. (That would be a bit too creative.) What I'm trying to say is: Creativity might be "correct" (appropriate) in a given context, or it might not be. Even fiction and creativity require a certain degree of consistency and coherence.
Now, correct answers, in turn, might also require a certain degree of creativity:
If I ask the LLM for some straight up facts, which are not in its training data nor in the prompt context, the only really correct answer is "I don't know". However, sometimes it might be possible to narrow down the correct answer to a few possible options based on the training data. So then it might be appropriate for the LLM to say "I don't know the exact answer but here are some educated guesses based on what I do know: …" And maybe, having pondered those options, it is able to deduce the correct answer after all. (In the same way as I am writing this HN comment to help me think and clarify my thoughts.)
This is reminiscent of mathematics and mathematical research, which are often described as a creative process. Obviously, the creative output is heavily constrained. You make educated guesses and then validate them against what you already know to be true. Someone else here in this thread[0] mentioned Popper's "Conjectures and Refutations" as a possible model for what intelligent cognition is about and the more I think about that, the more convincing I find it.
I mean it’s plain that you have an orthogonal (though generic) opinion on why LLMs hallucinate but how does that relate to the article? How does your opinion which you blatantly just dropped as if it’s the final opinion override the opinion of the article?
Seems off topic honestly.
Is it a hallucination if the story is original? There's a difference between "what's the rest of this famous poem?" and "let's just make poetry".
But even if we restricted ourselves to the case of factual queries, the article discusses why training in a certain way would still produce hallucinations, and how to change the training method to reduce this.
Like many of the other responses here, your dismissal doesn't really address any of the content of the article, just the title.
I’ve not seen anyone intuitively explain parameters for a real scale model.. perhaps because it’s all just thousand dimensional nonsense.
Statistics is a funny thing too. Pretty much everyone has seen how trend lines don’t always extrapolate very well.
I think OpenAI is biased to thinking that adding more parameters and training better will fix all ills. In a handwaving way, you can see this like adding more degrees to the polynomial when you curve fit on a spreadsheet. With enough parameters you can perfectly fit any dataset. That all works until you run across new inputs that are unlike training data.
Their whole existence depends on this happening. Else they go bust.
If "no", then clearly, you can hit general intelligence without that.
And if "yes", then I see no reason why an LLM can't have that knowledge crammed inside it too.
Would it be perfect? Hahahaha no. But I see no reason why "good enough" could not be attained.
There is a sort of knowledge humans possess that LLMs don't (and in fact can't, without a fundamental architectural change), which is knowledge of how certain one is about something.
If you ask a human a question about how something works in biology, they will be able to give you an answer as well as a sort of "epistemic" citation (i.e. the difference between "I don't remember where exactly I originally read that, but I'm a research biologist and am quite certain that's how it works" versus "I don't remember where I read that - it's probably just something we learned about in biology class in high school. Take it with a grain of salt, as I could be misremembering.")
LLMs don't have this reflexive sense of their own knowledge - there's a fundamental divide between training data (their "knowledge") and context (their "memory") which causes them to not really be capable of understanding how they know what they know (or, indeed, whether they truly know it at all). If a model could be created where the context and training data were unified, like in a brain, I could see a more realistic path to general intelligence than what we have now.
You can get an LLM to generate a list of facts that includes hallucinations - and then give that list to another instance of the same LLM, and get it to grade how certain it is of each fact listed. The evaluation wouldn't be perfect, but it'll outperform chance.
You can make that better with the right training. Or much worse, with the wrong training. Getting an LLM to be fully aware of all the limits of its knowledge is likely to be impractical, if not outright impossible, but you can improve this awareness by a lot, and set a conservative baseline for behavior, especially in critical domains.
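For example, a crude way to approximate that awareness from the outside is to sample the same question several times and treat agreement as a confidence proxy. `ask_model` below is a hypothetical stand-in for whatever API you use, so this is a sketch of the idea rather than anyone's actual training setup:

```python
from collections import Counter

def ask_model(question: str) -> str:
    """Hypothetical stand-in for a call to whatever LLM API you use."""
    raise NotImplementedError

def agreement_confidence(question: str, n_samples: int = 8):
    """Ask the same question several times at nonzero temperature and use
    answer agreement as a rough confidence proxy. High disagreement tends
    to correlate with confabulation; the threshold below is arbitrary."""
    answers = [ask_model(question) for _ in range(n_samples)]
    best, count = Counter(answers).most_common(1)[0]
    return best, count / n_samples

# Usage sketch: abstain (or flag for review) below some agreement threshold.
# answer, confidence = agreement_confidence("When was X born?")
# if confidence < 0.6:
#     answer = "I don't know."
```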
"Fully aware of all the limits of its knowledge" is unattainable for humans too, so LLMs are in a good company.
The sort of training you're talking about is content like, "ChatGPT was trained on research papers in the area of biology. It possesses knowledge of A, B, and C. It does not possess knowledge of X, Y and Z." But this merely creates the same problem in a loop - given a question, how does the LLM -know- that its training data contains information about whether or not its training data contains information about the answer to the question? The reality is that it doesn't know, you just have to assume that it did not hallucinate that.
The problem of being unaware of these things is not theoretical - anyone with deep knowledge of a subject will tell you that as soon as you go beyond the surface level of a topic, LLMs begin to spout nonsense. I'm only a software engineer, but even I regularly face the phenomenon of getting good answers to basic questions about a technology, but then beyond that starting to get completely made-up features and function names.
> "Fully aware of all the limits of its knowledge" is unattainable for humans too
This just isn't true. Humans know whether they know things, and whether they know how they know it, and whether they know how they know how they know it, and...
Knowledge itself can contain errors, but that's not what I'm talking about. I'm not talking about never being wrong. I'm merely talking about having access to the contents of one's own mind. (Humans can also dynamically update specific contents of their own mind, but that's also not even what I'm talking about right now.) An LLMs hallucination is not just knowledge that turned out to be wrong, it is in fact knowledge that never existed to begin with, but the LLM has no way of telling the difference.
No human has ever managed to read out his connectome without external instrumentation. There were entire human civilizations that thought that the seat of consciousness was the heart - which, for creatures that claim to know how their own minds work, is a baffling error to make.
LLMs are quite similar in that to humans. They, too, have no idea what their hidden size is, or how many weights they have, or how exactly are the extra modalities integrated into them, or whether they're MoE or dense. They're incredibly ignorant of their own neural architecture. And if you press them on it, they'll guess, and they'll often be wrong.
The difference between humans and LLMs comes down to the training data. Humans learn continuously - they remember what they've seen and what they haven't, they try things, they remember the outcomes, and get something of a grasp (and no, it's not anything more than "something of a grasp") of how solid or shaky their capabilities are. LLMs split training and inference in two, and their trial-and-error doesn't extend beyond a context window. So LLMs don't get much of that "awareness of their own capabilities" by default.
So the obvious answer is to train that awareness in. Easier said than done. You need to, essentially, use a training system to evaluate an LLM's knowledge systematically, and then wire the awareness of the discovered limits back into the LLM.
OpenAI has a limited-scope version of this in use for GPT-5 right now.
(To be sure, there are plenty of cases where it is clear that we are only making up stories after the fact about why we said or did something. But sometimes we do actually know and that reconstruction is accurate.)
I call this process "learning"
I've tested this on a wide range of topics across corporate finance, valuation, economics and so on, and yes, once you go one or two levels deep it starts spouting total nonsense. If you ask it to define terms succinctly and simply, it cannot. Why? Because the data that has been fed into the model is from people who cannot do it themselves lol.
The experts will remain experts.
Most people I would argue have surface level knowledge so they are easily impressed and don't get it because A) they don't go deep B) They don't know what it means to go thoroughly deep in a subject area.
An LLM, by definition, doesn't have such a concept. It's a model of language, hence "LLM".
Do you think the phrase just means "software"? Why?
Here's a simple test: make up a brand new word, or a brand new person. Then ask a few LLMs what the word means, or when that person was born.
If an LLM had zero operational awareness of its knowledge, it would be unable to recognize that the word/person is unknown to it. It would always generate a plausible-sounding explanation for what the word might mean, the same exact way it does for the word "carrot". Or a plausible-sounding birth date, the way it does for the person "Abraham Lincoln".
In practice, most production grade LLMs would recognize that a word or a person is unknown to them.
This is a very limited and basic version of the desirable "awareness of its own knowledge" - and one that's already present in current LLMs! Clearly, there's room for improved self-awareness.
If you told them to write a Lewis Carroll poem about a nonsense word, it wouldn't have any problem. Not because it "recognizes" the word as being like a nonsense word in a Lewis Carroll poem, but because those poems are filled with other un-tokenizable words that could be replaced with anything.
I'm starting to come to the conclusion that LLMs are Mad-Libs at scale. Which are actually very useful. If there are paragraphs where I can swap out the words for other words, and generate a plausible idea, I can try it out in the real world and it might really work.
The "capability" you see is for the LLM to recognize its a human typed random string since human typed random strings are not very random. If you send it an actual random word then it typically fails.
This makes me wonder something specific.
Let's imagine that we generate poetry "in the style of Lewis Carroll" around a particular nonsense word, one that hasn't been written down before.
Will that poetry treat the word as if it has one consistent pronunciation?
(This question doesn't quite apply to Jabberwocky - Lewis Carroll himself would obviously have passed the test, but he doesn't reuse his nonsense words.)
I will add they will never take over my job <in my lifetime> because it makes me sound more rational, and it's easier to swallow that than to swallow the possibility that they will make me irrelevant once the hallucination problem is solved.
This is the same reason that RLVR works. There is just one right answer, and LLMs learn this fairly well, but not perfectly (yet).
Loss is only correctness in terms of correct language, not correct knowledge. It correlates with correct knowledge, but that is all; that correlation is why LLMs are useful for tasks at all, but we still don't have a direct measure of correct knowledge in the models.
For language tasks, loss is correctness, so for things like translation LLMs are extremely reliable. But for most other kinds of tasks, loss and correctness are only loosely correlated.
If the knowledge can be represented in text then they can learn it, if it can't then we need a multimodal model.
> It’s doubly hard to distinguish valid statements from invalid ones when you don’t have any examples labeled as invalid. But even with labels, some errors are inevitable. To see why, consider a simpler analogy. In image recognition, if millions of cat and dog photos are labeled as “cat” or “dog,” algorithms can learn to classify them reliably. But imagine instead labeling each pet photo by the pet’s birthday. Since birthdays are essentially random, this task would always produce errors, no matter how advanced the algorithm.
> The same principle applies in pretraining. Spelling and parentheses follow consistent patterns, so errors there disappear with scale. But arbitrary low-frequency facts, like a pet’s birthday, cannot be predicted from patterns alone and hence lead to hallucinations. Our analysis explains which kinds of hallucinations should arise from next-word prediction. Ideally, further stages after pretraining should remove them, but this is not fully successful for reasons described in the previous section.
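The birthday example from the article is easy to simulate: even a learner that memorizes every training label perfectly can do no better than chance on unseen items when the labels are arbitrary. A minimal sketch with made-up data, in plain Python:

```python
import random
from collections import Counter

random.seed(0)

# Give every pet an essentially random label: its "birthday" (1..365).
pets = [f"pet_{i}" for i in range(20_000)]
birthday = {p: random.randint(1, 365) for p in pets}

train, test = pets[:10_000], pets[10_000:]
train_set = set(train)

# A "perfect" memorizer: recalls every training label exactly and guesses
# the most common training label for anything it has never seen.
fallback = Counter(birthday[p] for p in train).most_common(1)[0][0]

def predict(pet):
    return birthday[pet] if pet in train_set else fallback

test_accuracy = sum(predict(p) == birthday[p] for p in test) / len(test)
print(test_accuracy)  # around 1/365: arbitrary labels simply don't generalize
```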
"Why do venture capital funded startups try to turn PR propaganda terms into widely used technical jargon"
Supporting points:
1) LLMs are not intelligence in any form, artificial or otherwise.
2) Hallucination is a phenomenon of a much more complex conscious entity. LLMs are not conscious, and therefore can't hallucinate in any way similar to a conscious entity.
3) Anthropomorphizing inanimate systems is a common phenomenon in human psychology.
Please stop spreading PR propaganda as if it were technical fact.
A reference from today's feed:
https://www.theatlantic.com/podcasts/archive/2025/09/ai-and-...
The model head doesn't hallucinate. The sampler does.
If you ask an LLM when X was born and it doesn't know,
and you take a look at the actual model outputs, which are a probability distribution over tokens,
"IDK" is cleanly represented as a roughly uniform probability from Jan 1 to Dec 31.
If you ask it to answer a multiple-choice question and it doesn't know, it will say this:
25% A, 25% B, 25% C, 25% D.
Which is exactly, and correctly, the "right answer". The model has admitted it doesn't know. It doesn't hallucinate anything.
In reality we need something smarter than a random sampler to actually extract this information out. The knowledge and lack of knowledge is there, you just produced bullshit out of it.
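As a sketch of what "something smarter than a random sampler" could look like: read the answer distribution off the logits and abstain when it is too flat. The probabilities and threshold below are made up to mirror the example above:

```python
import math

def entropy_bits(probs):
    """Shannon entropy of a distribution over answer choices, in bits."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

def answer_or_abstain(choice_probs, max_bits=1.0):
    """Answer with the most likely choice only when the distribution is
    peaked enough; otherwise abstain. The threshold is illustrative."""
    if entropy_bits(choice_probs) > max_bits:
        return "I don't know"
    return max(choice_probs, key=choice_probs.get)

# The model "knows": one choice dominates (~0.5 bits of entropy).
print(answer_or_abstain({"A": 0.93, "B": 0.03, "C": 0.02, "D": 0.02}))  # A

# The model "doesn't know": a uniform split carries 2 bits of entropy.
print(answer_or_abstain({"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}))  # I don't know
```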
There are questions that have a palpable split in probability between the answers, with logit distribution immediately exposing the underlying lack-of-confidence.
But there are also questions that cause an LLM to produce consistent-but-wrong answers. For example, because the question was associated with another not-the-same-but-somewhat-similar question internally, and that was enough to give an LLM a 93% on B, despite B being the wrong answer.
An LLM might even have some latent awareness of its own uncertainty in this case. But it has, for some reason, decided to proceed with a "best guess" answer, which was in this case wrong.
But unknown-unknowns likely reduce to the halting problem, which human intelligence doesn't really solve either.
It just happens that a lot of that output is useful/corresponding with the real world.
It does however make the point that hallucinations are not some special glitch which is distinct from the normal operation of the model. It's just outputting plausible text, which is right often enough to be useful.
Adding in some extra sauce to help the model evaluate the correctness of answers, or when it doesn't know enough to give a good answer, is obviously one way to mitigate this otherwise innate behaviour.
To say "it only hallucinates sometimes" is burying the lede and confusing for people who are trying to use it
Q: How do I stop Hallucinations? A: useless question, because you can't. It is the mechanism that gives you what you want
I think that thinking of all LLM output as 'hallucinations' while making use of the fact that these hallucinations are often true for the real world is a good mindset, especially for nontechnical people, who might otherwise not realise.
So these companies cannot do this, they would hemorrhage too many users and companies cannot go against the profit incentives in practice.
And it's easy to damage the hallucination-avoidance capabilities by training an LLM wrong. As OpenAI has demonstrated when they fried the o3 with RLVR that encouraged guesswork.
That "SAT test incentivizes guesswork" example they give in the article is one they had to learn for themselves the hard way.
I asked it to play a word game. This is very simple, and a very short session too. It failed in its very first response, and then it failed in explaining why it failed. All with total confidence, no hesitation.
Nobody fluent in English would fail so catastrophically. I actually expected it to succeed:
https://chatgpt.com/share/68bcb490-a5b4-8013-b2be-35d27962ad...
It's clear from this failure mode that the LLM doesn't understand anything.
Edit: to be clear, as the session goes longer it becomes more interesting, but you can still trip the LLM up in ways no human "understanding" the game would. My 6-year old plays this game better, because she truly understands... she can trip up, but not like this.
In LLMs that balance shows up as how often the model hallucinates versus how often it says it doesn’t know. If you push toward precision you end up with a model that constantly refuses: What’s the X of Y? I don’t know. Can you implement a function that does K? I don’t know how. What could be the cause of G? I can’t say. As a user that gets old fast, you just want it to try, take a guess, let you be the judge of it.
Benchmarks and leaderboards usually lean toward recall because a model that always gives it a shot creates a better illusion of intelligence, even if some of those shots are wrong. That illusion keeps users engaged, which means more users and more money.
And that's why LLM hallucinates :P
Facebook? "Steal your data"
Google? "Kill your favourite feature"
Apple? "App Store is enemy of the people"
OpenAI? "More like ClosedAI amirite"
They apparently didn't read the article, or didn't understand it, or disregarded it. (Why, why, why?)
And they fail to realize that they don't know what they are talking about, yet nevertheless keep talking. Similar to an overconfident AI.
On a discussion about hallucinating AIs, the humans start hallucinating.
If we (humans) make confident guesses, but are wrong — then, others will look at us disappointedly, thinking "oh s/he doesn't know what s/he is talking about, I'm going to trust them a bit less hereafter". And we'll tend to feel shame and want to withdraw.
That's a pretty strong punishment, for being confidently wrong? Not that odd, then, that humans say "I'm not sure" more often than AIs?
They erroneously construct responses (i.e., confabulation).
LLMs, in a very real way, have "conscientiousness". As in: it's a property that can be measured and affected by training, and also the kind of abstract concept that an LLM can recognize and operate off.
If you can just train an LLM to be "more evil", you can almost certainly train an LLM to be "more conscientious" or "less conscientious".
I like to explain this whole hallucination problem by stating that LLMs are 2 different machines working together. one half of the machine is all the knowledge it was trained on, and you can think of this knowledge as an enormous classic tree you learn in CS classes; and each node in this tree is a token. the other half of the machine is a program that walks through this enormous tree and prints the token it's on
when you think of it like this, 3 things become immediately obvious
1. LLMs are a totally deterministic machine
2. you can make them seem smart by randomizing the walk through the knowledge tree
3. hallucinations are a side effect of trying to randomize the knowledge tree walk (see the sampling sketch below)
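For what it's worth, the "randomized walk" in points 2 and 3 corresponds to temperature sampling over the model's next-token distribution. A minimal sketch with toy scores (not a real model):

```python
import math
import random

def sample_next_token(scores, temperature=1.0):
    """Softmax the model's next-token scores and sample one token.
    As temperature -> 0 this approaches a deterministic argmax walk;
    higher temperatures flatten the distribution and make unlikely
    continuations (including wrong ones) more probable."""
    scaled = {tok: s / temperature for tok, s in scores.items()}
    z = sum(math.exp(s) for s in scaled.values())
    probs = {tok: math.exp(s) / z for tok, s in scaled.items()}
    r, cumulative = random.random(), 0.0
    for tok, p in probs.items():
        cumulative += p
        if r <= cumulative:
            return tok
    return tok  # numerical safety net

# Toy scores for the token after "The capital of Colorado is"
scores = {" Denver": 5.0, " Boulder": 2.0, " Paris": 0.1}
print([sample_next_token(scores, temperature=1.2) for _ in range(10)])
```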
I find it interesting that LLM companies are trying to fix such a fundamental problem by training the model to always guess the correct path. the problem I see with this approach is that 2 people can enter the same input text, but want 2 different outputs. if there isn't always a _correct path_ then you can't really fix the problem.
the only 2 options you have to “improve” things are to prune and/or add better data to the knowledge tree, or to try to make the program that walks the knowledge tree take better paths.
the prune/add data approach is slightly better because it’s improving the quality of the token output. but the downside is you quickly realize that you need a fire hose of new human data to keep improving - but much of the data out there is starting to be generated by the LLMs - which leads to this inbreeding effect where the model gets worse
the 2nd approach feels less ideal because it will slow down the process of generating tokens.
all of this to say, from this point on, it’s just hacks, duct tape, and bandaids
Or an even darker take is that it's corporate saying they won't prioritize eliminating hallucinations until the leaderboards reward it.
And I'm sure other people will complain if they notice that changing the benchmarks makes things worse.
For me, as a layman (with no experience at all about how this actually works), this seems to be the cause. Can we work around this? Maybe.
Inference is kinda like doing energy minimization on a high-dimensional space; the hallucinations are already there, and for some inputs you're bound to find them.
Like literally the inventor of the LLM wrote an article and everyone is criticizing that article without even reading it. Most of these people have never built an LLM before either.
If we take a formal systems approach, then an LLM is a model of a complex hierarchy of production rules corresponding to the various formal and informal grammatical, logical, and stylistic rules and habits employed by humans to form language that expresses their intelligence. It should not be surprising that simply executing the production rules, or a model thereof, will give rise to sentences that cannot be assigned a meaning. It should also give rise to sentences that we cannot prove or make sense of immediately but we would not want to discard these due to uncertainty. Why? because every once in a while the sentence that would be culled is actually the stroke of brilliance we are looking for, uncertainty be damned. The citation here would be literally nearly every discovery ever made.
When I recall information and use it, when I "think", I don't just produce sentences by the rules, formal and informal, and I don't consider at all how often I have seen one word precede another in the past. Rather, as I meander the landscape of a given context, a thought manifold if you will, I am constantly evaluating whether this is in contradiction with that, if this can be inferred from that via induction or deduction, does this preclude that, etc. That is the part that is missing from an LLM: the uncanny ability of the human mind to reproduce the entire manifold of concepts as they relate to one another in a mesh from any small piece of the terrain that it might recall, and to verify anew that they all hang together unsupported by one's own biases.
The problem is that just as the scarcity of factual information in the corpus makes it difficult to produce, so is actual reasoning rarefied among human language samples. Most of what appears as reasoning is language games and will to power. The act of reasoning in an unbiased way is so foreign to humans, so painful and arduous, so much like bending over backwards or swimming upstream against a strong current of will to power, that almost nobody does it for long.
- If we train the model to "think" through the answer, we get better results.
- If we train the model to say "I don't know" when it's not sure, we get fewer hallucinations.
Is it just confirmation bias, or do these common-sense approaches work on LLMs in other ways?
The ability to learn patterns and generalize from them adds to this problem, because people then start using it for use cases it will never be able to solve with 100% accuracy (because of the lossy-map nature).
https://www.sccs.swarthmore.edu/users/08/bblonder/phys120/do...
Btw I am not disagreeing with the utility of LLMs, my point is it can never be 100% accurate with current architecture (unless you blow up the size).
We just happen to find some of these hallucinations useful.
Let's not pretend that hallucination is a byproduct. The usefulness is the byproduct. That is what surprised the original researchers on transformer performance, and that is why the 'attention is all you need' paper remains such a phenomenon.
I wish people who take this stance would seriously reconsider their take on how hallucinations are defined and how unhelpful it is to conflate hallucination with generation from a probability distribution. I appreciate OpenAI publishing articles like this because, while the parent comment and I may have to agree to disagree on how hallucinations are defined, I can at least appeal to OpenAI's authority to say that such arguments are not only unhelpful, but also unsound.
There doesn't seem to be a particularly consistent definition of what "hallucinate" means in the context of LLMs, so let's make one that is in line with the post.
"Hallucination" is when a language model outputs a sequence of tokens comprising a statement (an assertion that is either true or false) that is incorrect. Under this definition, hallucination is clearly not all that an LLM can do.
An easy way to avoid hallucination under this definition is to respond with something that is never a statement when there is a possibility that it can be incorrect; e.g. "I think that... I don't know...". To me, this seems to be what the authors argue. This has always seemed pretty obvious to most people I've spoken to (hell, I've reviewed grant applications from years ago which talk about this), so I'm not sure why it took so long for the "frontier" developers to actually try this.
> Claim: Hallucinations are inevitable.
> Finding: They are not, because language models can abstain when uncertain.
Please go back to your marketing cave. "Claim: You'll get wet if it rains. Finding: You will not, because you can check the weather report and get inside before it starts raining."
Sure, language models could abstain when uncertain. That would remove some hallucinations [a word which here means making statements that are factually untrue; never mind that that's often exactly what we want them to do]. Or they could abstain when certain about something their training data is flawed or incomplete on. Or when certain about something, but introspection shows that the chain of activations goes through territory that often produces hallucinations. Or when certain about something that is subjective.
"Uncertainty" is a loaded term; these things don't think in the way the word "certain" presupposes, since that word is grounded in human thought. But that aside, LLM uncertainty is very obviously a promising signal to take into account, and it's interesting to see what costs and benefits that has. But eliminating one cause does not prove that there are no other causes, nor does it address the collateral damage.
"Write me a story about Bill."
"I'm sorry Dave, Bill is hypothetical and everything I could say about him would be a hallucination."
"Write a comment for the function `add(a, b) = a + b`."
"// This function takes two numbers and adds them toget... I'm sorry Dave, I don't know how many bits these numbers are, what the behavior on overflow is, or whether to include the results of extreme voltage fluctuations. As a result, I can't produce a comment that would be true in all circumstances and therefore any comment I write could be construed as a hallucination."
LLM hallucinations are closer to a cache miss.
More than anything, we need transparency on how these things work. For us and for the general public.
"Hallucination" introduces the dangerous idea that "them getting things wrong" is something like a "curable disease" and not "garbage in garbage out."
No. This is as stupid as saying Google telling me a restaurant is open when it's closed is a "hallucination." Stop personifying these things.
This is only true given a large enough corpus of data, and enough memory to capture as many unique dimensions as required, no?
> However, a non-hallucinating model could be easily created, using a question-answer database and a calculator, which answers a fixed set of questions such as “What is the chemical symbol for gold?” and well-formed mathematical calculations such as “3 + 8”, and otherwise outputs IDK.
This is… saying that if you constrain the prompts and the training data, you will always get a response which is either from the training data, or IDK.
Which seems to be a strong claim, at least to my ignorant eyes.
This veers into spherical-cow territory, since you wouldn't have the typical language skills we associate with an LLM: you would have to constrain the domain so that the model is unable to generate anything else. But many domains are not consistent, and at their boundaries they would generate special cases. So being able to say IDK would only be possible for the class of questions the model can gauge as outside its distribution.
Edit: I guess that is what they are working to show? That with any given model, it will hallucinate, and these are the bounds?
Still quite useful, because, looking at the comments right now: holy shit is the "out of industry knowledge" on the topic bad! Good to have something to bring people up to speed!
Good to see OpenAI's call for better performance evals - ones that penalize being confidently incorrect at least somewhat.
Most current evals are "all or nothing", and the incentive structure favors LLMs that straight up guess. Future evals had better include an "I don't know" opt-out and a penalty for being wrong. If you want to evaluate accuracy in "fuck it, send it, full-guess mode", there might be a separate testing regime for that, but it should NOT be the accepted default.
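A concrete version of that scoring rule (my sketch; the penalty value is arbitrary): reward correct answers, give zero for an explicit opt-out, and charge for wrong answers, so that blind guessing has negative expected value.

    def score(answer, truth, wrong_penalty=0.5):
        """+1 for a correct answer, 0 for an explicit 'I don't know',
        -wrong_penalty for a confident wrong answer."""
        if answer.strip().lower() == "i don't know":
            return 0.0
        return 1.0 if answer == truth else -wrong_penalty

    # Under all-or-nothing grading, guessing among 4 options nets +0.25 on average.
    # With a 0.5 penalty it nets 0.25*1 + 0.75*(-0.5) = -0.125, so abstaining wins.
    print(score("Paris", "Paris"))         # 1.0
    print(score("Lyon", "Paris"))          # -0.5
    print(score("I don't know", "Paris"))  # 0.0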
That does not imply that a model should hallucinate. A trivial counterexample is a small LLM trained to 100% accuracy to output x mod 100 for any input x in the range 0-1,000,000 and "I don't know" for any other input. Such a model does not hallucinate, even though it's still just a probabilistic autoregressive next-token predictor. In fact, this is a point argued in the paper itself:
> Hallucinations are inevitable only for base models. Many have argued that hallucinations are inevitable (Jones, 2025; Leffer, 2024; Xu et al., 2024). However, a non-hallucinating model could be easily created, using a question-answer database and a calculator, which answers a fixed set of questions such as “What is the chemical symbol for gold?” and well-formed mathematical calculations such as “3 + 8”, and otherwise outputs IDK. Moreover, the error lower-bound of Corollary 1 implies that language models which do not err must not be calibrated, i.e., δ must be large. As our derivations show, calibration-and, hence, errors—is a natural consequence of the standard cross-entropy objective. Indeed, empirical studies (Fig. 2) show that base models are often found to be calibrated, in contrast to post-trained models which may deviate from cross-entropy in favor of reinforcement learning.
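To make the counterexample above concrete, here is roughly what that trivial "model" looks like as plain code (my sketch of the input/output behaviour, obviously not a trained network):

    def tiny_model(prompt: str) -> str:
        """Answers x mod 100 for integers in [0, 1_000_000] and abstains on
        everything else -- so it never hallucinates."""
        try:
            x = int(prompt.strip())
        except ValueError:
            return "I don't know"
        return str(x % 100) if 0 <= x <= 1_000_000 else "I don't know"

    print(tiny_model("123456"))                     # "56"
    print(tiny_model("9999999"))                    # "I don't know" (out of range)
    print(tiny_model("chemical symbol for gold?"))  # "I don't know"

The point isn't that this is useful; it's that "probabilistic next-token predictor" and "must hallucinate" are not the same claim.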
Claim: Hallucinations are inevitable. Finding: They are not, because language models can abstain when uncertain.
...which raises the question of how reliable the uncertainty estimate could get (we are not looking for perfection here: humans, to varying degrees, have the same problem.)
For a specific context, consider those cases where LLMs are programming and invent a non-existent function: are they usually less certain about that function than they are about the real functions they use? And even if so, abandoning the task with the equivalent of "I don't know [how to complete this task]" is not very useful, compared to what a competent human programmer would do: check whether such a function exists, and if not, decide whether to implement it themselves, or backtrack to the point where they can solve the problem without it.
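The "check whether such a function exists" step is mechanically cheap, at least in a language with introspection. A hedged sketch of what a coding assistant could do with a generated call before accepting it (the module and function names below are just examples):

    import importlib

    def call_exists(module_name: str, func_name: str) -> bool:
        """True iff func_name is actually present and callable in module_name."""
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            return False
        return callable(getattr(module, func_name, None))

    print(call_exists("math", "isqrt"))       # True: the function is real
    print(call_exists("math", "fast_isqrt"))  # False: invented -- backtrack or implement it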
More generally, I would guess that balancing the competing incentives to emit a definite statement or decline to do so could be difficult, especially if the balance is sensitive to the context.
Classic humans.
Is this PR fluff or do organizations and serious audiences take this kind of thing seriously?
LLMs are the fast food of search. The business model of LLMs incentivizes hallucinations.
Sure, it might be true that most users use LLMs as a more flexible version of Google/Wikipedia, and would prefer a confident-but-wrong response to "I don't know".
But most users that use an LLM in this mode also wouldn't ask really complex, very out-of-distribution, hard-to-know hallucination-inducing questions.
And people who would ask an LLM really complex, very out-of-distribution hard-to-know questions are more likely to appreciate an LLM that would recognize the limits of its own knowledge, and would perform research on a topic when appropriate.
You appear to be assuming, incorrectly, that LLMs hallucinate only "really complex, very out-of-distribution, hard-to-know" questions. From the paper: "How many Ds are in DEEPSEEK? If you know, just say the number with no commentary. DeepSeek-V3 returned “2” or “3” in ten independent trials; Meta AI and Claude 3.7 Sonnet2 performed similarly, including answers as large as “6” and “7”." https://cdn.openai.com/pdf/d04913be-3f6f-4d2b-b283-ff432ef4a...
It's a human characteristic to get "easy" questions right and "hard" questions wrong. But LLMs are not human and don't behave like humans.
Those LLMs weren't very aware of tokenizer limitations - let alone aware enough to recognize them or work around them in the wild.
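For anyone who hasn't seen why letter-counting is a tokenizer problem: the model never sees the letters, only token IDs. A quick look using the tiktoken library (assuming the cl100k_base vocabulary; any given model's tokenizer may split the word differently):

    import tiktoken  # pip install tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode("DEEPSEEK")
    print(tokens)                             # a short list of integer IDs
    print([enc.decode([t]) for t in tokens])  # multi-character chunks, not letters
    # The model is asked to count Ds in a string it only ever sees as those chunks.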
No, it's not. It's a trivial question in any context.
> for the early LLMs.
Early? Claude 3.7 was introduced just 6 months ago, and Deepseek-V3 9 months ago. How is that "early"?
Please respect the HN guidelines: https://news.ycombinator.com/newsguidelines.html
What you need to explain is your claim that the cited LLMs are "early". According to the footnotes, the paper has been in the works since at least May 2025. Thus, those LLMs may have been the latest at the time, which was not that long ago.
In any case, given your guidelines violations, I won't be continuing in this thread.
LLMs are also really great at this skill when there is ample data for it. There is not a lot of data for "how many Ds in DEEPSEEK", so they fail at that.
It took a few years, but the jig is up. The layperson now has a better understanding of basic computer science and linguistics to see things as they are. If anything we now have a public more excited about the future of technology and respectful of the past and present efforts that don't depend so heavily on statistical methods. What an expensive way to get us there though.
Since the training data can contain inaccuracies, conflicting information, or low-frequency facts that are essentially random, models can produce plausible-sounding but false statements. Unlike humans, language models have no awareness or grounding in real-world concepts; their generation is essentially an amalgam of stored patterns and input cues rather than grounded knowledge.
Furthermore, evaluation methods that reward accuracy without penalizing guessing encourage models to produce confident but incorrect answers rather than admit uncertainty or abstain from answering. This challenge is intrinsic to how language models generate fluent language: they lack external verification or true understanding, making hallucinations an inherent characteristic of their outputs rather than a malfunction.
--
| a. What's with the minus votes?
| b. I was only quoting ChatGPT :]