Today's Frontier AI companies will never exceed the AI capability frontier again (andrewtrask.substack.com)

24 points by williamtrask 3 hours ago | 3 comments

blargey 24 minutes ago
A manic riff on https://xcancel.com/OpenRouter/status/2065856853989270011 , which advertises https://openrouter.ai/fusion/1 , which is a (slow) multi-model multi-prompt workflow that's specific to the "DRACO" benchmark for "deep research", and doesn't say much about coding and long-horizon agentic work, nor does it imply you can somehow parlay this into duct-taping 50 budget-tier models together for even more gains. Not sure what "solo" even means in the context of the comparison chart: one-shot? A variant workflow, since it doesn't make sense to run on one input?
Mixing outputs of different models one way or another is old news; if it were anywhere near as promising as the author dreams, it would have exploded many months ago.
jwpapi 2 hours ago
This is very interesting and the biggest glimpse of hope I’ve seen here in the last couple of months. I haven’t really paid attention to Fusion even though I got the email. I didn’t even assume it would be comparable.
JumpCrisscross an hour ago
I’m so confused. The top fusion is Fable 5 and GPT 5.5. That is not an “ensemble of weaker AI models.”
- maniacwhat an hour ago
  What it's saying is that if you look at any single model, it can be beaten by an ensemble of weaker models. E.g., Fable 5 is beaten by an ensemble of previous-gen models.
  - JumpCrisscross an hour ago
    I guess so. 4.8 + 4.8 > Fable 5 is interesting, though not particularly game changing. (The others all fuse frontier models. Which is an argument for using those frontier models more. Not less.)
- tim-star an hour ago
  i guess the point is that any fusion is better than any single model and a fusion of the top two models is obviously the best? for cost though i guess you could just duct tape together 10 open source models and then thats comparable?
  - JumpCrisscross an hour ago
    > though i guess you could just duct tape together 10 open source models and then thats comparable?
    This is what I was hoping to see data for.