If so, this is a nice training signal for my own neural net, since my view of LLMs is that they are essentially analogy-making machines, and that reasoning is a chain of analogies ending in a result that aligns somewhat with reality. Either that, or I'm as crazy as most people seem to think I am.
Honestly, I had no idea what to make of the abstract at first, so I asked duck.ai's GPT-5 mini to help me put it in my own words, and according to mini, the first paragraph aligns pretty well with the abstract.
The second paragraph is my own opinion, but according to mini, it aligns with at least a subset of cognitive theory in the context of problem solving.
I highly recommend asking an LLM to explore this interesting question you've asked. They're all extremely useful for testing assumptions, and the next time I can't sleep I'll probably do so myself.
Personally I haven't had any luck getting an LLM to solve even simple problems, but I suspect I don't know yet how to ask, and it's possible that the people who are building them are still working it out themselves.
How are you defining "problem"?
The last problem like this that I asked an LLM to solve was to find the tax and base price of items on an invoice, given the total price and tax rates. I couldn't make sense of the answer, but asking the LLM questions made me realize that I had framed the problem badly, and more so that I didn't know how to ask. (Though the process also triggered a surprising ability of my own to dredge up and actually apply basic algebra.) I'm sure it's just that I'm still learning what and how to ask.
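For reference, the algebra I eventually dredged up boils down to one rearrangement: if a line's total is tax-inclusive, then total = base * (1 + rate), so base = total / (1 + rate). A minimal Python sketch, assuming a single flat rate per line item (the function name and the sample numbers are just made up for illustration):

    # Assumes one tax-inclusive total and one flat tax rate per invoice line.
    def split_invoice_line(total: float, tax_rate: float) -> tuple[float, float]:
        """Return (base_price, tax) for a tax-inclusive total and a rate like 0.08."""
        base = total / (1 + tax_rate)              # total = base * (1 + rate)
        return round(base, 2), round(total - base, 2)

    print(split_invoice_line(107.99, 0.08))        # -> (99.99, 8.0)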
E.g.
[Text completion driven by compressed training data] exhibit[s] a puzzling inconsistency: [it] solves complex problems yet frequently fail[s] on seemingly simpler ones.
Some problems are better represented by a locus of texts in the training data, allowing more plausible talk to be generated. When the problem is not well represented, it does not help that the problem is simple.
If you train it on nothing but Scientology documents, and then ask about the Buddhist perspective on a situation, you will probably get some nonsense about body thetans, even if the situation is simple.
I definitely agree: replacing "AI" or "LLMs" with "X driven by compressed training data" makes things start to make a lot more sense, and it's a useful shortcut.
To give a concrete example, say we're generating the next token from the word "queen". Is this the monarch, the bee, the playing card, the drag entertainer? By adding more relevant tokens (honey, worker, hive, beeswax) we steer the token generation to the place in the "word cloud" where our next token is more likely to exist.
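Here's a toy way to picture that steering. This is nothing like a real Transformer: the three "senses", the token vectors, and the three dimensions are all made up for illustration, but the geometry of the idea, averaging context and landing nearer one region of the cloud, is the point:

    import math

    # Made-up "sense" directions for the word "queen" (toy, 3-dimensional).
    SENSES = {
        "monarch":      [1.0, 0.0, 0.0],
        "bee":          [0.0, 1.0, 0.0],
        "playing card": [0.0, 0.0, 1.0],
    }

    # Made-up context-token vectors; "queen" on its own sits between all senses.
    TOKENS = {
        "queen": [0.4, 0.4, 0.4],
        "honey": [0.0, 1.0, 0.1],
        "hive":  [0.1, 1.0, 0.0],
    }

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    def nearest_sense(context):
        """Average the context vectors, then pick the closest sense direction."""
        vecs = [TOKENS[t] for t in context]
        avg = [sum(col) / len(vecs) for col in zip(*vecs)]
        return max(SENSES, key=lambda s: cosine(avg, SENSES[s]))

    print(nearest_sense(["queen"]))                  # all three senses tie; max() just picks the first
    print(nearest_sense(["queen", "honey", "hive"])) # -> "bee"

Adding "honey" and "hive" pulls the average toward the bee region, which is all the "steering" amounts to here.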
I don't see LLMs as "lossy compression" of text. To me that implies retrieval, and Transformers are a prediction device, not a retrieval device. If one needs retrieval, use a database.
I like to frame it as a theater script cycling through the LLM. The "reasoning" difference is just changing the style so that each character has film noir monologues. The underlying process hasn't really changed, and the monologue text isn't fundamentally different from dialogue or stage direction... but more data still means more guidance for each improv cycle.
> say we're generating the next token from the word "queen". Is this the monarch, the bee, the playing card, the drag entertainer?
I'd like to point out that this scheme can result in things that look better to humans in the end... even when the "clarifying" choice is entirely arbitrary and irrational.
In other words, we should be alert to the difference between "explaining what you were thinking" versus "picking a firm direction so future improv makes nicer rationalizations."
In lossy compression the compression itself is the goal. In prediction, compression is the road that leads to parsimonious models.
The fact that LLMs can abstract concepts and do any amount of out-of-sample reasoning is impressive and interesting, but the null hypothesis for an LLM being "impressive" in any regard is that the data required to answer the question is present in its training set.
Sure it does. Obviously. All we ever needed was some text completion.
Thanks for your valuable insight.
This isn't what LLMs are, of course, but it is what some political groups insist they are, so they can strengthen copyright law by pointing to LLMs as "theft". It's all very pro-Disney.
So you replace a more useful term with a less useful one?
Is that due to political reasons?
Firstly, Claude's self-concept is based on humanity's collective self-concept. (Well, the statistical average of all the self-concepts on the internet.)
So it doesn't have a clear understanding of what LLMs' strengths and weaknesses are, or, by extension, of its own. (Neither do we, from what I've gathered. At least, not in a way that's well represented in web scrapes ;)
Secondly, as a programmer I have noticed a similar pattern... stuff that people say is easy turns out to be a pain in the ass, and stuff that they say is impossible turns out to be trivial. (They didn't even try; they just repeated what other people, who also hadn't tried it, told them was hard...)
Fun fact: all those human-sourced estimates were hallucinations too.