GPT 5.5 aces 20x20 multiplication that o3 couldn't handle (twitter.com)

1 point by marojejian 44 minutes ago | 2 comments

Bender 40 minutes ago
Forgive my ignorance here, but why does a large language model need to perform math at all, rather than detecting math and handing it off to something optimized and trusted for everything from the most basic to the most advanced math, the kind of tool that mathematicians, CPAs and other professionals who depend on math would trust? Perhaps it could even create a short-lived, ephemeral link to the parsed input, interpretation and output of the math program, showing its work as proof that could be pasted into engineering and legal documents. Is this like code golf?
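To make the handoff idea concrete, here is a minimal sketch of what "detect math and hand it off" could look like. Everything in it is hypothetical: `evaluate_exact` and `handoff` are names made up for illustration, only integer `+`, `-` and `*` are supported, and the "proof" is just a plain record of input, parsed form and output rather than an actual link.

```python
import ast
import operator

# Hypothetical whitelist: only exact integer +, - and * are allowed.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def evaluate_exact(expression: str) -> int:
    """Evaluate an integer arithmetic expression without using eval()."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        # Anything else (calls, names, floats, ...) is rejected.
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval"))


def handoff(expression: str) -> dict:
    """Return a record of the input, the parsed form and the exact output."""
    tree = ast.parse(expression, mode="eval")
    return {
        "input": expression,
        "parsed": ast.dump(tree.body),
        "output": evaluate_exact(expression),
    }


record = handoff("12345678901234567890 * 98765432109876543210")
print(record["output"])
```

Python integers are arbitrary precision, so the result is exact no matter how many digits the operands have.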
marojejian 44 minutes ago
Tremendous progress in a year.
While these foundation models aren't trying to be calculators, this kind of test previously provided a decent benchmark of their ability to compose iterative reasoning steps at scale, and showed they were not that good at it.
At this point I'm tempted to conclude they are pretty good at it, since I don't see how such long calculations could really be considered "in distribution" from training or "memorized," except in the sense the model learned the algorithm correctly.
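For a sense of how many steps have to be composed without a slip, here is a rough sketch of the schoolbook algorithm on digit strings. It assumes "20x20" means two 20-digit operands; `schoolbook_multiply` is a name made up for this illustration, and it says nothing about what the model actually does internally.

```python
def schoolbook_multiply(a: str, b: str) -> tuple[str, int]:
    """Multiply two non-negative decimal strings digit by digit.

    Returns the product as a string and the number of single-digit
    multiply-and-carry steps that were performed.
    """
    x = [int(d) for d in reversed(a)]  # least significant digit first
    y = [int(d) for d in reversed(b)]
    result = [0] * (len(x) + len(y))
    steps = 0

    for i, dx in enumerate(x):
        carry = 0
        for j, dy in enumerate(y):
            total = result[i + j] + dx * dy + carry
            result[i + j] = total % 10
            carry = total // 10
            steps += 1
        # The final carry of this row goes into the next free position.
        result[i + len(y)] += carry

    digits = "".join(str(d) for d in reversed(result)).lstrip("0") or "0"
    return digits, steps


product, steps = schoolbook_multiply(
    "12345678901234567890", "98765432109876543210"
)
print(product, steps)
```

With two 20-digit operands the inner loop runs 20 * 20 = 400 times, and a single wrong digit or dropped carry anywhere in that chain makes the final answer wrong.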
I still have doubts about how good the present architecture & training are at learning to "generalize" effectively, e.g. see ARC3.
But you can go a very long way by memorizing everything, being able to compose steps well, being able to try many times, and being able to verify as well as a human, even if you aren't so efficient in your "fluid intelligence."
The fraction of human cognition operating today that can be handled with the current approach seems pretty large.