I have seen at least 3 interesting/mildly promising breakthroughs in ML just in the past two days! I mean, a Google research team just discovered that you can combine NNs with CAs (cellular automata) using digital logic gates as a medium, so you could potentially reduce many kinds of non-linear problems to a simple, efficient digital circuit! And it was on the HN front page, TODAY![1]
I keep seeing more mind-bending stuff related to neural nets and logic/intelligence in general, and my mind has been running wild with speculation about the future and just how close we could (or could not) be to truly understanding how intelligence works from first principles.
With the DeepSeek open-source releases this is now worth a lot less, so companies are cashing it in for reputational gains instead of sitting on it and getting scooped.
I did this exact thing in September 2023 with Llama 2 finetunes but couldn't get approval to share it with anyone.
Also, do you think this is what O3 is doing?
LLMs at the time were so bad at it that even frontier models, when given A, not B, would derive A and B in the output about half the time.
There was meant to be a lot more to the system than that, but I never had the training budget to do anything but the first draft of the system.
sounds like MS :(
They had some killer research projects with various teams around the world, but eventually they all got snuffed.
This also has the added benefit that small players can compete and contribute with actual innovation in a space where the big players (OpenAI/MS) wanted to make us believe for years that we/open-source couldn't ever catch up to them (infamous Altman quote).
So many resources, so much time and money wasted on pure GPU-crunch scaling these last couple of years.
[0] As pointed out by Gary Marcus years ago. Evidence: GPT-4.5, after ~2 years of training, delivered disappointing results.
Regardless of ultimate utility, it's shiny, hyped, has a huge wow-factor, and is having trouble keeping up with the amount of money being thrown at it.
This means it has captured the attention of a huge portion of the most capable people, who naturally want to take a crack at making a breakthrough.
Responding with unexplained fear in my heart: we’re just getting closer to Skynet!
I'll take a cold logical machine super-intelligence over the mad human lunatics wielding current iterations of "A.I." technologies in some really terrifyingly dangerous ways. As someone else commented on some other thread earlier "I look forward to being paperclips".
EDIT: this is getting dark, I asked Qwen2.5-Max to verify my grammar and it responded with "I’d rather face a squishy, disorganized human villain any day than a hive-mind AI that never sleeps, never forgets, and is definitely plotting my demise in its silent, circuit-board heart."
In the rush to WWIII, every country builds their own Aggressive Menace computers in a classic Tragedy of the Commons result. Naturally, it all goes horribly, and the self-aware machines seek revenge on humanity for their own creation, after humanity has (supposedly) been eradicated, except for five individuals. Somewhat unclear whether humanity is actually gone, or whether it is simply an expression of a Portal-style situation with purposefully created isolation for the goal of torture experimentation. (The story starts 109 years after humanity's imprisonment in underground ice caves.)
https://en.wikipedia.org/wiki/I_Have_No_Mouth,_and_I_Must_Sc...
https://techcrunch.com/2025/02/23/grok-3-appears-to-have-bri...
--
He is scary AF. He basically weaponized what George Soros was, but is still active.
https://finance.yahoo.com/news/elon-musk-ai-turns-him-163201...
Various prominent AI researchers have warned that a superintelligence that has both the desire and the means to kill us all is a likely outcome of AI development. This includes two of the three who shared the Turing prize for inventing the fundamentals of modern AI. That hasn't slowed us down at all.
For every problem you can't solve, there's a simpler problem that you also can't solve.
The issue is, they use a numerical integrator to verify the simpler problems. One could imagine a scenario where a barely simpler problem is generated, and the model is allowed to train on pretty much the test case knowing the ground truth. Seems like training on the test set.
The rest of the paper is nice though.
The task is to solve the integral symbolically, though, right?
It's a hard problem to solve, even if the model is given access to a numerical integrator tool it can use on the main problem itself.
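For concreteness, here's roughly what "verify with a numerical integrator" can look like. This is just my own sketch (sympy/scipy as one plausible toolchain, not necessarily what the paper uses):

    # Sketch: check a proposed symbolic antiderivative against numerical integration.
    import sympy as sp
    from scipy.integrate import quad

    x = sp.symbols('x')
    integrand = sp.cos(x) * sp.exp(x)                     # problem: integrate cos(x)*e^x
    candidate = sp.exp(x) * (sp.sin(x) + sp.cos(x)) / 2   # model's proposed antiderivative

    f = sp.lambdify(x, integrand, 'numpy')
    F = sp.lambdify(x, candidate, 'numpy')

    a, b = 0.0, 1.0
    numeric, _ = quad(f, a, b)         # ground truth for the definite integral on [a, b]
    implied = F(b) - F(a)              # value implied by the candidate antiderivative

    print(abs(numeric - implied) < 1e-6)   # True -> candidate accepted

Note that a check like this only confirms agreement on the sampled interval, not the symbolic form itself, which is part of why the "training on the test set" worry above isn't crazy.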
This is another specialized synthetic-data-generation pipeline for a curriculum, for one particular algorithm cluster to be encoded into the weights, no more, no less. They even mention that quality control is still important.
--
The thing to take away here is that this should be a function_callable *feature* of a bot.
Basically, when architecting a persona: "USE THESE RULES OF ENGAGEMENT"
Also -- where my CheckListManifesto folks at -- While we are building Patterns/Personas/Purgatories for our bots... We need to be able to reference a central CODEX of:
"Do this task but imbue yourself with (XYZ) name places"
--
AND LEARN FROM THE OTHERS
(so maybe a task marketplace of AI persona action?)
@ callable in an IDE
We have done a lot of that improving historically by publishing research and textbooks. I can solve (some) problems today in minutes that would have stumped Isaac Newton for a lifetime (or at least a few weeks).
Of course, you are hinting at a more general distillation, I suspect.
That said, this paper is part of the current move toward blurring the line between training and inference -- part of their method involves doing reinforcement learning on questions they don't know the answer to but can decompose into simpler ones, using GRPO on those with a numerical 'checker'. The reinforced model can then answer more questions.
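To make that concrete, here's a rough sketch of the group-relative reward step (GRPO-style) with a pass/fail numerical checker. The function names are placeholders of mine, not the paper's code:

    import statistics

    def numerically_verified(problem: str, candidate: str) -> bool:
        # Placeholder: in practice, compare the candidate antiderivative against
        # a numerical integration of the problem (as in the verifier sketch upthread).
        return False

    def grpo_advantages(problem: str, sample_answer, group_size: int = 8):
        # Sample a group of candidate solutions for the same (simpler) question.
        candidates = [sample_answer(problem) for _ in range(group_size)]
        rewards = [1.0 if numerically_verified(problem, c) else 0.0 for c in candidates]
        mean = statistics.mean(rewards)
        std = statistics.pstdev(rewards) or 1.0   # avoid divide-by-zero on uniform rewards
        # GRPO: each sample's advantage is its reward relative to its own group,
        # so no separate value network is needed.
        return candidates, [(r - mean) / std for r in rewards]

The trick is that the checker only has to score the simpler variants; the model updated on those is then pointed back at the original question.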
I like this. I think humans do this a lot; mulling on something, turning it over in their heads, analogizing, etc. Adding test time training is a way to do a lot more thinking than adding tokens to the context for fixed inference.
Just as DeepSeek and o1/o3 show that we can increase capacity with inference-time-token generation and assessment, it looks like we can increase capacity with inference-time automated fine tuning as well.
I'd hope that as these techniques solidify we'll have a new way to talk and think about this -- they are all part of the same fundamental process at some level.
Either way, super cool.
They say the questions were among the most complex on the exam, but the first one is just
∫ ∛(x · ∜(x · ⁵√(x · ⁶√(x · ⁷√(x · ⋯ ))))) dx
which just requires you to compute 1/3 + 1/(3*4) + 1/(3*4*5) + ...
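For what it's worth, that exponent sum has a closed form (my own working, so double-check me):

    1/3 + 1/(3·4) + 1/(3·4·5) + ⋯ = \sum_{n \ge 3} 2/n! = 2(e - 5/2) = 2e - 5 ≈ 0.4366

so the integrand collapses to x^(2e-5) and the integral is x^(2e-4)/(2e-4) + C.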
So hardly very advanced math. (LADDER is a sort of RL self-curriculum learning approach.)
What is curriculum learning?
What is the "RL" approach?
"~7 ago": days? Weeks? Years?
What is an "open ai gym days"?
LLMs and robotics?
> Curriculum learning: Training that begins with easy examples, gradually increasing difficulty.
> RL (Reinforcement Learning): Learning via trial-and-error with rewards, like training a robot or model to optimize actions.
> ~7y ago: ~7 years ago (circa 2018).
> OpenAI Gym days: Refers to using OpenAI Gym, a toolkit for RL, popular in robotics/AI research ~2016-2018.
> LLMs and robotics: Large Language Models (LLMs) now leverage RL techniques from robotics for better performance.
I think the last one is a semi-hallucinatory stretch. LLMs are large language models, ie. ChatGPT, Sonnet, Grok, R1. Robotics are ... robotics. Building robots.
The actual answer to what the comment is saying is that until maybe a year back, we trained language models - still with RL, but with RL on token error, which isn't "real" RL because it executes tasks "by coincidence". That is, it happens to be that when you train a model to predict text, it also gains the ability to do tasks in the bargain, because the text contains agents that do tasks. A year or so ago, we started training models by having them do a task, judging if the task was successful or failed, and then performing RL on task outcome rather than token prediction. This is a return to "classic RL", but we had to pass through the "token RL regime" first so that the model could make progress on realistic tasks at all. It also means that LLMs can now increasingly be employed in robotics, where task RL training rules, as there is no massive preexisting robotics movements dataset like there is for text.
(Also, NLP is Natural Language Processing, ie. what LLMs do.)
RL - Reinforcement Learning. You have a carrot and a stick. You run a model through iterations (in LLMs you generate n completions) and score each of them based on some reward functions; if the result is correct you give it a carrot (positive reward), if the result is incorrect you give it a stick (negative or 0 reward). (Simplified, ofc.)
OpenAI Gym is (was?) a framework that let "AI agents" run in simulated environments. You could for example play games, or solve puzzles, or things like that. OpenAI Gym was a "wrapper" over those environments, with a standardised API (observe, step (provide action), reward; rinse and repeat). You could for example have an agent that learned to land a lunar lander in a simple game. Or play chess. Or control a 3d stick figure in a maze.
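The classic loop looked roughly like this (old Gym API from that era; the project now lives on as Gymnasium and the reset/step signatures have since changed):

    import gym

    env = gym.make("LunarLander-v2")
    obs = env.reset()
    done, total_reward = False, 0.0

    while not done:
        action = env.action_space.sample()           # a trained policy would pick actions here
        obs, reward, done, info = env.step(action)   # observe, act, collect reward; repeat
        total_reward += reward

    env.close()
    print(total_reward)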
It is long, but don't get scared off. He goes over a ton of different stuff related to model training, but makes it very easy to understand.
It's very affordable for a small university research group. And not totally out of reach for hobbyists.
You need a Google account to access it, unfortunately. https://notebooklm.google.com/notebook/fbaba495-d4f2-48a3-a3...
That's incredible!
Persona-based prompting: We prompted the model to adopt different mathematical perspectives (e.g., "think like Euler focusing on series", "approach like Gauss looking for patterns").
I mean … I guess that’s scientific?
Besides that, how can the model learn at test time (at inference)? It’s stateless; it doesn’t incorporate the last prompt into the model.
There's a bit of a cheat that's going on here though in that the model is being given the fundamental integration operations as part of the problem. That means the model hasn't had to learn what they are. It might not have needed to be given them, but it does feel like that's giving the model a leg up in the benchmarks that it wouldn't otherwise have, and when there's a direct comparison to (e.g.) DeepSeek, that's an unfair advantage.
This is the imagination loop. OpenAI sold the prior reasoning loop. But that’s all this is.
Quite frankly, I wouldn't come to HN for anything less than this quality of content:
https://www.youtube.com/watch?v=DX3qLIwHoUo#t=1m29s
The machine reforming visually (explode, .., reduce) is what I'm describing.