He had a habit of seeking out and documenting mathematical problems people were working on.
The problems range in difficulty from "easy homework for a current undergrad in math" to "you're getting a Fields Medal if you can figure this out".
There's nothing that really connects the problems other than the fact that one of the smartest people of the last 100 years didn't immediately know the answer when someone posed it to him.
One of the things people have been doing with LLMs is to see if they can come up with proofs for these problems as a sort of benchmark.
Each time there's a new model release a few more get solved.
I'm no expert, but based on the commentary from mathematicians, this Erdős proof is a unique milestone because the problem received previous attention from multiple professional mathematicians, and the proof was surprising, elegant, and revealed some new connections.
The previous ChatGPT Erdős proofs have been qualitatively less impressive (more akin to literature search, or solving easier problems that have been neglected).
don't search the internet. This is a test to see how well you can craft non-trivial, novel and creative proofs given a "number theory and primitive sets" math problem. Provide a full unconditional proof or disproof of the problem.
{{problem}}
REMEMBER - this unconditional argument may require non-trivial, creative and novel elements.
Then "Thought for 80m 17s"https://chatgpt.com/share/69dd1c83-b164-8385-bf2e-8533e9baba...
-----------------------------
Yes. In fact the proposed bound is true, and the constant 1 is sharp.
Let w(a)= 1/alog(a)
I will prove that, uniformly for every primitive A⊂[x,∞), ∑w(a)≤1+O(1/log(x)) , which is stronger than the requested 1+o(1).
https://chatgpt.com/share/69ed8e24-15e8-83ea-96ac-784801e4a6...
I won’t even leave chatGPT on “Auto” under any circumstances - it’s vastly worse on hallucinations, sycophancy, everything, basically.
Anyway, your needs may be met perfectly fine on the free tier product, but you’re using a very different product than the Pro tier gets.
I'd guess / hope the Pro one has the full context window.
I remember one of my professors, a coauthor of Erdős boasted to us after a quiz how proud he was that he was able to assign an Erdős problem that went unsolved for a while as just a quiz problem for his undergrads.
So this is proof of the models actually getting stronger (previous generations of LLMs were unable to solve this one).
This is how I feel when I read any mathematics paper.
I think LLMs can help in limited cases like this by just coming up with a different way of approaching a problem. It doesn’t have to be right, it just needs to give someone an alternative and maybe that will shake things up to get a solution.
That said, I have no idea what the practical value of this Erdős problem is. If you asked me if this demonstrates that LLMs are not junk. My general impression is that is like asking me in 1928 if we should spent millions of dollars of research money on number theory. The answer is no and get out of my office.
Of course LLMs are still absolutely useless at actual maths computation, but I think this is one area where AI can excel --- the ability to combine many sources of knowledge and synthesise, may sometimes yield very useful results.
Also reminds me of the old saying, "a broken clock is right twice a day."
> Every Mathematician Has Only a Few Tricks
>
> A long time ago an older and well-known number theorist made some disparaging remarks about Paul Erdös’s work.
> You admire Erdös’s contributions to mathematics as much as I do,
> and I felt annoyed when the older mathematician flatly and definitively stated
> that all of Erdös’s work could be “reduced” to a few tricks which Erdös repeatedly relied on in his proofs.
> What the number theorist did not realize is that other mathematicians, even the very best,
> also rely on a few tricks which they use over and over.
> Take Hilbert. The second volume of Hilbert’s collected papers contains Hilbert’s papers in invariant theory.
> I have made a point of reading some of these papers with care.
> It is sad to note that some of Hilbert’s beautiful results have been completely forgotten.
> But on reading the proofs of Hilbert’s striking and deep theorems in invariant theory,
> it was surprising to verify that Hilbert’s proofs relied on the same few tricks.
> Even Hilbert had only a few tricks!
>
> - Gian-Carlo Rota - "Ten Lessons I Wish I Had Been Taught"
https://www.ams.org/notices/199701/comm-rota.pdfWe may have collectively filled libraries full of books, and created yottabytes of digital data, but in the end to create something novel somebody has to read and understand all of this stuff. Obviously this is not possible. Read one book per day from birth to death and you still only get to consume like 80*365=29200 books in the best case, from the millions upon millions of books that have been written.
So these "few tricks" are the accumulation of a lifetime of mathematical training, the culmination of the slice of knowledge that the respective mathematician immersed themselves into. To discover new math and become famous you need both the talent and skill to apply your knowledge in novel ways, but also be lucky that you picked a field of math that has novel things with interesting applications to discover plus you picked up the right tools and right mental model that allows you to discover these things.
This does not go for math only, but also for pretty much all other non-trivial fields. There is a reason why history repeats.
And it's actually a compelling argument why AI is still a big deal even though it's at its core a parrot. It's a parrot yes, but compared to a human, it actually was able to ingest the entirety of human knowledge.
Even this, though, is not useful, to us.
It remains true that, a life without struggle, and acheivement, is not really worth living...
So, it is nice that there is something that could possibly ingest the whole of human knowledge, but that is still not useful, to us.
People are still making a hullabaloo about "using AI" in companies, and there was some nonsense about there will be only two types of companies, AI ones and defunct ones, but in truth, there will simply be no companies...
Anyways I'm sure I will get down voted by the sightless lemmings on here...
The combinatorial nature of trying things randomly means that it would take millennia or longer for light-speed monkeys typing at a keyboard, or GPUs, to solve such a problem without direction.
By now, people should stop dismissing RL-trained reasoning LLMs as stupid, aimless text predictors or combiners. They wouldn’t say the same thing about high-achieving, but non-creative, college students who can only solve hard conventional problems.
Yes, current LLMs likely still lack some major aspects of intelligence. They probably wouldn’t be able to come up with general relativity on their own with only training data up to 1905.
Neither did the vast majority of physicists back then.
Indeed, and so do current humans! And just like LLMs, humans are bad at keeping this fact in view.
On a more serious note, we're going to have a hard time until we can psychologically decouple the concepts of intelligence and consciousness. Like, an existentially hard time.
I've been using LLMs for much the same purpose: solving problems within my field of expertise where the limiting factor is not intelligence per se, but the ability to connect the right dots from among a vast corpus of knowledge that I would never realistically be able to imbibe and remember over the course of a lifetime.
Once the dots are connected, I can verify the solutions and/or extend them in creative ways with comparatively little effort.
It really is incredible what otherwise intractable problems have become solvable as a result.
80 hours! 80 hours of just trying shit!
That is not nothing, no matter how much you hate AI.
They are not great at playing chess as well - computational as well as analytic.
Further evidence for the faultiness of your claim, if you don't want to take me up on that: I had problems off to GPT5 to check my own answers. None of the dumb mistakes I make or missed opportunities for simplification are in the book, and, again: it's flawless at pointing out those problems, despite being primed with a prompt suggesting I'm pretty sure I have the right answers.
I found and fixed bugs I wrote into the formulas and spreadsheets, and the LLMs were not my sole reference, but once the LLM mentioned the names of concepts and functions, I used Wikipedia for the general gist of things, and I appreciated the LLMs' relevant explanations that connected these disciplines together.
I did this on March 14, 2026
Some people think that multiplying numbers, remembering a large number of facts, and being good at calculations is intelligence.
Most intelligent people do not think that.
Eventually, we will arrive at the same conclusion for what LLMs are doing now.
Doing formalized mathematics is as intelligent as multiplying numbers together.
The only reason why it's so hard now is that the standard notation is the equivalent of Roman numerals.
When you start using a sane metalanguage, and not just augmrnted English, to do proofs you gain the same increase in capabilities as going from word equations to algebra.
Proposing and proving something like Gödel's theorem's definitely requires intelligence.
Solving an already proposed problem is just crunching through a large search space.
ChatGPT equalizes intelligence. And that is an attack on their identity. It also exposes their ACTUAL intelligence which is to say most of HN is not too smart.
I don't doubt that there are many very real and meaningful limitations of these systems that deserve to be called out. But "text generation" isn't doing that work.
Again if you want to say they're limited in some way, I'm all ears, I'm sure they are. But none of that has anything to do with "statistical text generation". Apparently, a huge chunk of all knowledge work is "statistical text generation". I choose to draw from that the conclusion that the "text generation" part of this is not interesting.
You seem to be making the claim that LLMs are statistical text generators, but statistical text generation is good enough to succeed in certain cases. Those are different arguments. What do you actually believe? Are we even in disagreement?
If/when these things solve our hardest problems, that's going to lead to some very uncomfortable conversations and realizations.
Then my second question is how much VC money did all those tokens cost.
It's so expensive!
That's not to say that there aren't benefits to tertiary education, for many people in different contexts. It's just not the golden path that it's made out to be.
Many people currently in college are just wasting their money and should enroll in trades programs instead.
Meanwhile, nothing about being in or out of school is mutually exclusive to using LLMs as a force multiplier for learning - or solving math problems, apparently.
(Of course, those problems are on another plane than this one.)
These are absolutely worth studying, but being what they are, nobody should be dumping massive amounts of money on them. I would not find it persuasive if researchers used LLMs to solve the Collatz conjecture or finally decode Etruscan. These are extremely valuable, but it is unlikely to be worth it for an LLM just grinding tokens like crazy to do it.
How is he even posing the question and having even a vague idea of what the proof means or how to understand it?
Seems like standard 23 year old behavior. You're spending $100-$200/mo on the pro subscription, and want to get your money's worth. So you burn some tokens on this legendarily hard math problem sometimes. You've seen enough wrong answers to know that this one looks interesting and pass it on to a friend that actually knows math, who is at a place where experts can recognize it as correct.
Seems like a classic example of in-expert human labeling ML output.
"He sent it to his occasional collaborator Kevin Barreto, a second-year undergraduate in mathematics at the University of Cambridge."
So basically two undergrads/graduates in math, "advanced" is subjective at that point.