2 points by zaevlad 7 hours ago | 2 comments
  • zaevlad 7 hours ago
    Sakana AI has presented their work “Learning to Orchestrate Agents in Natural Language with the Conductor,” which has been accepted to ICLR 2026. The idea is simple but powerful: instead of forcing a single model to handle an entire task on its own, the researchers trained a separate 7B model to act as a manager for other AIs.

    This Conductor doesn’t write code or solve tasks directly. It looks at a problem and decides which agents to deploy, what subtask to give each one, and what context to provide. Essentially, it’s not just a router between models — it’s a meta-prompt engineer that assembles a working AI team tailored to a specific task.
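A minimal sketch of that idea — the Conductor emits a team plan instead of an answer. The paper's actual interface isn't described here, so `Assignment`, `conduct`, and the heuristic below are all invented for illustration:

```python
from dataclasses import dataclass

# Hypothetical sketch only: Assignment/conduct and their fields are my
# invention, not the paper's API. The real Conductor is a learned 7B model.

@dataclass
class Assignment:
    agent: str    # which model/agent to call
    subtask: str  # natural-language instruction for that agent
    context: str  # what prior output or context to show it

def conduct(task: str) -> list[Assignment]:
    """Stand-in for the Conductor: map a task to agent assignments."""
    if "?" in task and len(task) < 40:  # toy 'easy question' heuristic
        return [Assignment("generalist", task, "")]
    return [
        Assignment("coder", f"Implement: {task}", "task spec"),
        Assignment("reviewer", "Check the implementation", "coder output"),
    ]

print(len(conduct("What is 7 * 6?")))  # → 1 (a single model call)
```

The point is that the output is itself a structured plan in natural language, which is what makes it a meta-prompt engineer rather than a plain router.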

    What’s most interesting is that this behavior emerged not from hardcoded rules, but through reinforcement learning. For simple questions, the Conductor might rely on a single model call. For complex tasks, it builds a chain on its own: a planner, an executor, a verifier, and a correcting agent. It closely resembles how a strong team breaks down complex work into distinct roles.
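That chain-of-roles pattern can be sketched as a simple pipeline where each agent's output becomes the next agent's input. This is a toy illustration, not the paper's algorithm; the stub functions stand in for real LLM calls:

```python
# Toy illustration of an emergent role chain: each stage's output is
# piped into the next stage. Real agents would be separate LLM calls.
from typing import Callable

def planner(x: str) -> str:  return f"plan({x})"
def executor(x: str) -> str: return f"result({x})"
def verifier(x: str) -> str: return f"checked({x})"

def run_chain(task: str, stages: list[Callable[[str], str]]) -> str:
    out = task
    for stage in stages:
        out = stage(out)  # forward each agent's output to the next role
    return out

print(run_chain("task", [planner, executor, verifier]))
# → checked(result(plan(task)))
```

What the RL training would decide, in this framing, is which stages to include and in what order — a one-stage chain for easy questions, a longer one for hard tasks.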

    The results look impressive. The 7B Conductor outperformed every individual model in its pool, including GPT-5, Gemini, Claude, and the open-source models available at the time of the research. The paper reports new state-of-the-art results on LiveCodeBench (83.9%) and GPQA-Diamond (87.5%). At the same time, the system proved cheaper than heavyweight multi-agent approaches like Mixture-of-Agents.

    One standout feature is called Recursive Test-Time Scaling. The Conductor can select itself as one of the working agents, re-evaluate the output produced by its team, figure out where things went wrong, and assemble a new corrective workflow. In other words, scaling at inference happens not just by “thinking longer,” but by dynamically reconfiguring a new team in response to an error.
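The recursive part can be caricatured as a loop: solve, verify, and on failure build a new corrective pass rather than just sampling longer. Everything below (`solve`, `is_good`, the string-based "bug") is a made-up stand-in to show the control flow, not the paper's method:

```python
# Hedged sketch of Recursive Test-Time Scaling: the conductor inspects
# its team's output and, on failure, runs a fresh corrective workflow.
# All names and logic here are illustrative stand-ins.

def solve(task: str) -> str:
    """Stand-in for one conductor-assembled team run: fix one 'bug'."""
    return task.replace("buggy", "fixed", 1)

def is_good(answer: str) -> bool:
    """Stand-in verifier."""
    return "buggy" not in answer

def recursive_solve(task: str, max_rounds: int = 3) -> str:
    answer = solve(task)
    for _ in range(max_rounds):
        if is_good(answer):
            break
        # conductor re-enters as one of its own agents: diagnose the
        # failure and assemble a new corrective workflow
        answer = solve(answer)
    return answer

print(recursive_solve("buggy buggy code"))  # → fixed fixed code
```

The `max_rounds` cap is my assumption — some bound on recursion depth is needed to keep inference cost finite, though the paper may handle this differently.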

    The key takeaway here isn’t just that there’s another multi-agent framework. What matters more is this: models are beginning to learn not only how to answer, but how to manage other models. Whereas AI systems used to be built around a single “smartest” agent, the focus is now shifting toward orchestration, roles, verification, and collective reasoning.

    And it seems that Sakana is building its new multi-agent system, Sakana Fugu, precisely on this foundation.

  • immanuwell 7 hours ago
    a tiny 7B model learning to boss around much bigger LLMs by figuring out who talks to whom, and actually beating them, is genuinely wild