Like, does reasoning find a gradient to optimize a solution? Or is it just expanding state until it finds whatever the LLM's world knowledge says is the highest-probability answer?
For example, I can imagine an LLM reasoner running out of state trying to perfectly solve for 50 intricate unit tests, because it ping-pongs between solving one case and then another, playing whack-a-mole and never converging.
Maybe there's an "oh duh" answer to this, but this is where I struggle with the limits of agentic work vs. traditional ML.
In most cases, there's no explicit descent - and if any descent-like process happens at all, it's not exactly exposed or expressed as hard logic.
If you want it to happen consistently, you add scaffolding and get something like AlphaEvolve at home.
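To make that concrete, here's a minimal sketch of what "AlphaEvolve at home" scaffolding might look like: an outer loop that imposes a descent-like signal (test pass rate) on a model that has none internally, by sampling revisions and only keeping ones that score higher. The `propose_revision` function is a hypothetical stand-in for an LLM call, not any real API; everything else is just a toy harness under that assumption.

```python
import random
from typing import Callable, List


def propose_revision(candidate: str, feedback: str) -> str:
    """Hypothetical LLM call: ask the model to revise `candidate`
    given which tests failed. Stubbed out here as a placeholder."""
    return candidate + f"  # revised after: {feedback}"


def score(candidate: str, tests: List[Callable[[str], bool]]) -> float:
    """Fraction of unit tests the candidate passes -- this is the
    'gradient' the model itself never explicitly sees."""
    return sum(t(candidate) for t in tests) / len(tests)


def evolve(seed: str, tests: List[Callable[[str], bool]],
           generations: int = 20, samples_per_gen: int = 4) -> str:
    """Greedy outer loop: sample revisions, keep only improvements,
    so progress is monotone even if individual revisions whack-a-mole."""
    best, best_score = seed, score(seed, tests)
    for _ in range(generations):
        failed = [i for i, t in enumerate(tests) if not t(best)]
        feedback = f"failing tests: {failed}"
        for _ in range(samples_per_gen):
            child = propose_revision(best, feedback)
            s = score(child, tests)
            if s > best_score:
                best, best_score = child, s
        if best_score == 1.0:
            break
    return best
```

The point of the sketch is that the convergence pressure lives entirely in the harness (score, compare, keep-the-best), not in the model's reasoning trace.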