LLM-isms aside, I don't think we want this to be the case? An LLM, for all its complexity, is something that can be reasoned about. It's picking the next token until it hits an EOS. The semantics imposed on those tokens (reasoning, tool call, etc.) are up to the user('s harness) to decide and act on. The more that's pushed behind the facade, the harder it is to achieve sufficient understanding of the model's behavior such that one can compose it into larger abstractions. Perhaps the performance (and the adherence to an interface/contract) compensates? But swapping from Opus or 5.5 to this or Fugu seems like a much bigger change than swapping between different 'base' models.
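To make the "next token until EOS, semantics live in the harness" framing concrete, here is a rough Python sketch. Everything in it is hypothetical and for illustration only: the `EOS` marker, the `<think>`/`<tool>` delimiters, and the `next_token` callable are stand-ins, not any vendor's actual API or token vocabulary.

```python
from typing import Callable

EOS = "<eos>"  # hypothetical end-of-sequence marker

# Hypothetical delimiters; the harness, not the model, decides what they mean.
OPEN = {"<think>": "reasoning", "<tool>": "tool_call"}
CLOSE = {"</think>", "</tool>"}


def generate(next_token: Callable[[list[str]], str],
             prompt: list[str],
             max_tokens: int = 64) -> list[str]:
    """Keep picking the next token until EOS or the length cap is hit."""
    out: list[str] = []
    for _ in range(max_tokens):
        tok = next_token(prompt + out)
        if tok == EOS:
            break
        out.append(tok)
    return out


def interpret(tokens: list[str]) -> list[tuple[str, str]]:
    """Harness-side step: split the flat token stream into labeled spans."""
    spans: list[tuple[str, str]] = []
    kind, buf = "text", []

    def flush() -> None:
        # Emit the buffered tokens (if any) under the current label.
        if buf:
            spans.append((kind, " ".join(buf)))
            buf.clear()

    for tok in tokens:
        if tok in OPEN:
            flush()
            kind = OPEN[tok]
        elif tok in CLOSE:
            flush()
            kind = "text"
        else:
            buf.append(tok)
    flush()
    return spans
```

With a scripted stand-in for the model, `interpret(generate(model, prompt))` yields spans like `("reasoning", ...)`, `("tool_call", ...)`, `("text", ...)` that the harness can log, veto, or act on. That visibility is what gets lost once the labeling moves behind the facade.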
That's a deal-breaker for me. I need as much observability and control over my development workflow as possible; that's part of my secret sauce.
To me, LLMs are a better intelligence than humans in 3 respects: 1. LLMs can somehow do perspective-taking completely, whereas humans cannot even take the perspective of their own self 10 minutes after making a decision. 2. LLMs can somehow be asked to raise or lower the abstraction level arbitrarily (which can be seen as a special form of perspective-taking). 3. LLMs "think" instantly.
All these innate capabilities should be combined with system-level optimization to close the last 10% and get beyond human intelligence.
Yes, but in my experience abstracting (at least upward) is something all models really struggle with.
I would argue that the best models are quite far from human intelligence, let alone within 10% of it.
They certainly seem to when A/B testing different models, and Fable routes to Opus 4.8 when guardrails fail.
Also, OpenRouter recently released a fusion router: https://openrouter.ai/blog/announcements/fusion-beats-fronti...
I think an optimal solution would be more seamless integration between the harness and router roles, as each is only half the picture.