7 points by jcfrei 3 hours ago | 2 comments

kloud 2 hours ago
Fusion of frontier models beating Fable, or cheaper models matching Fable performance at half the cost. Great announcement timing.
What the article leaves out is the reasoning/effort levels, so it can't be ruled out that the results differ simply because of different reasoning budgets.
I would also be interested in seeing coding performance on SWE benchmarks.
andai 37 minutes ago
Came here to post the same article!
The headline result here: (Opus 4.8 + Opus 4.8) > Fable 5
It looks like "fusing" a model with itself gives almost as much gain as fusing two different models.
I saw promising numbers for model fusion before https://news.ycombinator.com/item?id=44630724
(In this case, a different approach: they randomized the LLM provider for every agentic turn. They found this helped a lot.)
But it's funny (and not too surprising) that just "alloying" a model with itself has a very similar effect. It's basically just more test time compute right? More reasoning time. With the benefit that the reasoning is parallel. Same cost, less time!
I'd love to see more numbers on this, especially with the cheaper models. (For some models, caching is so good now that reprompting and forking are basically free.) Are the gains for tiny LLMs comparatively bigger or smaller? etc.
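A minimal sketch of the two ideas, with assumptions stated up front: the article's actual fusion method isn't described in this thread, so `self_fuse` below assumes the simplest version (sample the same model several times in parallel, then majority-vote), and `run_alloyed` assumes "randomizing the provider" means picking a model uniformly at random each turn. `Model`, `run_alloyed` and `self_fuse` are hypothetical names; a real `Model` would wrap an API call.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

# Hypothetical stand-in for an LLM call: prompt in, answer string out.
Model = Callable[[str], str]


def run_alloyed(models: Sequence[Model], prompt: str, turns: int,
                rng: random.Random) -> List[str]:
    """Pick a random model for every agentic turn.

    Each turn sees the full transcript so far, regardless of which
    model produced the earlier turns.
    """
    transcript = [prompt]
    for _ in range(turns):
        model = rng.choice(models)
        transcript.append(model("\n".join(transcript)))
    return transcript[1:]  # only the model outputs


def self_fuse(model: Model, prompt: str, samples: int) -> str:
    """Query one model `samples` times in parallel and majority-vote.

    Assumption: answers are short strings that can be compared for
    equality. Free-form outputs would need a judge or merge step instead.
    """
    with ThreadPoolExecutor(max_workers=samples) as pool:
        answers = list(pool.map(model, [prompt] * samples))
    return Counter(answers).most_common(1)[0][0]
```

The parallel calls are where the "less time" part comes from: wall-clock time is roughly one call, while total tokens scale with `samples`.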
- Alifatisk 9 minutes ago
  > It's basically just more test time compute right?
  I think this is the key takeaway here.