While you're at it, feel free to send me $200 as well, I'll generate a crypto address ending with "AI".
$20/month: Claude Code
$10/month: Minimax
$16/month: Xiaomi Mimo
$10/month: Opencode Go
Opus at low/medium effort generates plans. Then several coordinator/worker pairs are possible: DeepSeek v4 Pro + Minimax M3, Mimo v2.5 Pro + Mimo v2.5, Mimo + Minimax, Sonnet 4.6 + Haiku. I've been running hundreds of long multi-agent sessions, topped up extra credits here and there, but haven't reached $200/month spend yet. Relying entirely on Claude/Codex feels like a waste of cash now. (don't send anything, sharing only because of the base58 fun fact I didn't know)
Omitting those characters makes it good for generating passwords if they need to be typed in by hand.
Double-clicking a base58 string always selects the whole string and it doesn't wrap accidentally, thanks to missing / and +, so it's also convenient to copy and paste.
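For anyone who wants to try the password idea, here is a minimal sketch. It assumes the Bitcoin-style base58 alphabet (digits and letters without 0, O, I and l, and no + or /); the function name and default length are made up for illustration.

```python
import secrets

# Bitcoin-style base58 alphabet: no 0, O, I, l, and no + or /.
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_password(length: int = 16) -> str:
    """Return a random string drawn only from the base58 alphabet.

    Hypothetical helper: uses the stdlib `secrets` module so the choice
    of characters is suitable for passwords.
    """
    return "".join(secrets.choice(BASE58_ALPHABET) for _ in range(length))


if __name__ == "__main__":
    print(base58_password())
```

Since the alphabet has no capital I, a real base58 string can never end in "AI".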
These prices are just going to get raced to $0.
It's similar to how AirPods normalised all of us having $300+ headphones. All of us would have scoffed at the idea a decade ago.
Would you want to use a text editor that updates the screen very slowly? Kind of the same thing for using agentic systems as coding assistants: don’t want a ‘sluggish’ experience.
But your aunt Josie didn't have one. Now Apple is selling 80 million units / year and the ~$300 price tag has become normal. Before that, most people had headphones that were 10 times cheaper.
I just averaged it out.
Guess what, the big players are hoarding all the RAM and GPUs so that other people can't afford decent hardware. It's working out beautifully for them!
It's $200/month. You have to take into account energy costs and all the rest of a system, but if you break even within 1-2 years ($2400-$4800) it'd be a pretty good deal. And $4000 buys you a pretty decent system.
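A rough way to check the break-even arithmetic, as a sketch only. The $200/month and $4000 figures come from the comment; the running-cost parameter is an assumption you would have to fill in for your own electricity price and usage.

```python
def break_even_months(hardware_cost: float,
                      subscription_per_month: float,
                      running_cost_per_month: float = 0.0) -> float:
    """Months until owning hardware is cheaper than subscribing.

    running_cost_per_month (energy etc.) is an assumed input, not a
    figure from the thread.
    """
    saving = subscription_per_month - running_cost_per_month
    if saving <= 0:
        raise ValueError("running costs eat the whole subscription price")
    return hardware_cost / saving


if __name__ == "__main__":
    # $4000 machine vs. $200/month, ignoring running costs.
    print(break_even_months(4000, 200))  # 20.0
```

Any nonzero running cost pushes the break-even point further out, which is why the 1-2 year window is only a pretty good deal if the machine stays useful that long.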
But it's a hefty upfront investment for people who just want to experiment. The good thing about $200/month subscriptions is that you can cancel them any time and cut your losses. Not so with a $4000 computer that loses half of its resale value as soon as you plug it in.
I think the current sweet spot for people who don't already own a high-end gaming PC is to rent a server with a beefy GPU from Hetzner et al. and run local models there.
I've been shipping production on archive.tw with Fugu Ultra in /advisor on oh-my-pi.
Advisor doesn’t slow the loop if the driver stays fast. Worth it if your harness can split advisor from worker.
David Ha, CEO and co-founder, was one of the youngest managing directors at Goldman Sachs before doing ML at Google. His ML publications were considered top-notch almost a decade ago. I had high hopes for him when he raised money and founded Sakana.
I do agree with some comments here that perhaps this particular product is not well thought out. I also agree with the criticism that David calls Sakana a frontier AI lab while making money just selling AI B2B applications to Japanese businesses. I also agree with the assessment that Sakana has abrasive and antagonistic, sometimes openly hostile, recruiting tactics. I also agree that his then-impressive publications may have lost their luster in the age of LLMs.
However, the man is clearly driven; and he and his team may have more to offer in future. I admire the man for not taking the conventional AI-research career path.
More broadly, Sakana is pursuing a refreshingly distinct research path, with their focus on evolutionary methods, biological intelligence (e.g. continuous thought machines) and open publication.
Probably taking hate from both sides: OpenAI / Claude fans who see it undercutting their moat, and Chinese open-model fans who want it to be cheaper.
But it's a genuine accomplishment to hit those benchmarks and offer a reasonable plan?
Bizarre reaction TBH.
All put together, paying ~$60 to get a hit-or-miss report seems a bit excessive, but obviously as the models they use under the hood get better it becomes more and more worth it, assuming they also improve their grounding/search capabilities.
I'm a big fan of Sakana though, and have followed David Ha / @hardmaru since the world models papers (with the racing car game and the Doom clone), which were incredible at the time.
Also, from the technical report, looks like they're training on the output of Claude Code, etc. I'm guessing this doesn't violate TOS because they're technically not a directly competing model. This brings me to what I see as the main risk with this service, which is that it seems like an easy thing for a frontier lab to make obsolete, either by models beginning to converge in terms of strengths or by improving their own harnesses to include more of this meta-reasoning.
This gets you that in a nice neat package, without the underlying tinkering mechanics.
If (big iff) the usage mechanics work out, then this is actually a really good anti-big-model strategy.
They'll be incentivized for your success, not token-maximizing for their investors.
The team is super smart too. What's not to like?
Wishing them the best on launch.
But their paid plans I'm not sure yet - planning to subscribe and can let you know.
Almost no chance it will be as generous as OpenAI though. They just don't have the money :-)
Do multiple vendors run this "single API", or how is this not replacing one single-vendor dependency with another?
It's interesting that they're offering this in the form of fixed-cost subscription plans too. My impression was that the first-party providers can do this because they have API inference margins to the tune of 80ish percent. Anyone else orchestrating on top of these models has to pass these costs through or eat them themselves.
After a few months of spending money on the best frontier models, now I am spending time using DeepSeek v4 flash as my workhorse, and flipping to more capable (but still very inexpensive) open models on an as-needed basis. We all make our own tool selection decisions, but for me, I feel happier and enjoy working more following the very fast response and ultra low cost path.
At least, for the initial data gathering phase. You'd probably want a sequence of progressively larger models to filter it.
Have you guys tested it on anything other than research?
Personally I prefer understanding the dimensions and the interplay and controlling it myself, though I can see why OpenRouter and others are now offering this as a solved problem.
Just be careful when you start outsourcing too much of your intelligence needs to a blackbox.
EDIT: Found something here https://dev.classmethod.jp/en/articles/sakana-fugu-ga-first-...
There's also the concept of "smart routing" requests based on some heuristics / embeddings. You'd get "simple" tasks handled by smaller (cheaper) models and use a bigger model to curate / sort / merge the results.
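To make the heuristic flavor of "smart routing" concrete, here is a toy sketch. The marker words, the word-count threshold and the "small"/"large" tier names are all invented for illustration; a real router would more likely use embeddings or a trained classifier.

```python
import re

# Hypothetical keywords that suggest a task needs the bigger model.
HARD_MARKERS = {"prove", "refactor", "debug", "architecture"}


def route(prompt: str, max_simple_words: int = 40) -> str:
    """Return "small" or "large" for a prompt using crude heuristics."""
    words = re.findall(r"[a-z']+", prompt.lower())
    if len(words) > max_simple_words:
        return "large"  # long prompts go to the bigger model
    if HARD_MARKERS.intersection(words):
        return "large"  # whole-word match, so "improve" is not "prove"
    return "small"


if __name__ == "__main__":
    print(route("What is the capital of France?"))
    print(route("Please debug this race condition."))
```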
There's a lot of things to try here. I wouldn't personally pay for this service, but I don't think it's "a joke"...
https://news.ycombinator.com/item?id=44630724
They randomly alternated between frontier LLMs and got a massive boost to performance on cybersecurity tasks.
Is there any official source that could confirm whether Fable (or Mythos) is parallelized test-time compute (like GPT 5.5 Pro) or a sparse Mixture-of-Experts (MoE) transformer combined with a multi-agent, inference-time compute scaling architecture (Gemini 3.1 Deep Think)?
If cost becomes an even bigger problem, being able to choose "best performance possible" or "strong but cost effective" will be useful.
What's nice is that OpenRouter included a pareto graph showing the cost as well as the performance. (But not time, unfortunately -- model fusion adds a large factor to round trip time.) Benchmarks are a lot less helpful without that.
OpenRouter: Surpassing frontier performance with fusion (blog post with benchmarks)
https://news.ycombinator.com/item?id=48525392
OpenRouter Fusion API
https://news.ycombinator.com/item?id=48537641
See also: Sibling comment with an open source implementation
https://news.ycombinator.com/item?id=48624782#48629598
I did my own last weekend in a few lines of Python, though I haven't tested it much yet. (Looking for some very hard, very cheap benchmarks, if such a thing exists!)
This asks a special orchestrator they built, which sits in front of a bunch of models, which model would suit the request best.
Regular Fugu seems to be just "pick the best model and route the request there"
Fugu Ultra can generate like a little mini workflow/plan instead to achieve a result
1. Ask GPT to derive the math. 2. Ask Opus to check for implementation/security issues. 3. Ask Gemini to synthesize or resolve disagreement. 4. Return final answer.
I could be wrong, but that's what it seems to be at a glance, so I think it's more dynamic than OpenRouter Fusion.
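A sketch of what such a mini workflow could look like in code, purely as a guess at the shape and not Sakana's actual implementation. Models are plain callables here, and `run_plan` plus the step format are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[str], str]  # stand-in for a real model API call


def run_plan(task: str,
             plan: List[Tuple[str, str]],
             models: Dict[str, Model]) -> str:
    """Run (model_name, instruction) steps in order.

    Every step sees the task plus all earlier outputs, so a final
    "synthesize" step can compare the previous answers.
    """
    transcript = [f"Task: {task}"]
    output = ""
    for model_name, instruction in plan:
        prompt = "\n\n".join(transcript + [f"Instruction: {instruction}"])
        output = models[model_name](prompt)
        transcript.append(f"[{model_name}] {output}")
    return output


if __name__ == "__main__":
    # Stub models; replace with real API calls.
    stubs = {name: (lambda p, n=name: f"{n} answer")
             for name in ("gpt", "opus", "gemini")}
    plan = [
        ("gpt", "Derive the math."),
        ("opus", "Check for implementation/security issues."),
        ("gemini", "Synthesize or resolve disagreement."),
    ]
    print(run_plan("Design a rate limiter", plan, stubs))
```

In this sketch the plan is hard-coded; the interesting part of the orchestrator would be generating that plan per request.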
https://www.databricks.com/blog/introducing-omnigent-meta-ha...
> So basically... openrouter
:skull:
i now really wonder how many people of the public understood my thesis defense lol
The reasoning chains could have been used, and the resulting combined model could easily and effectively have been distilled.
We open sourced it all
and will be releasing a similar orchestrator next week on TrustedRouter
Looks like Fusion calls a bunch of models, then uses an LLM to synthesize the results and passes them to another model for final output.
Fugu looks like it's doing something different? Using an LLM earlier on in the flow as an orchestrator to decide which other LLMs to call. More coordinator than simply synthesizing results, and more "agentic".
It's interesting because it's all exposed behind a single OpenAI compatible endpoint (Responses API?) and so then presumably someone could use this for one of their single agents. Now you have agent-of-agents, nested in some sense. The token usage increases accordingly!
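For contrast, the fan-out-then-synthesize pattern attributed to Fusion above can be sketched like this. It is an illustration under assumptions, not OpenRouter's code: models are plain callables, `fuse` is a made-up name, and the extra final-output pass is left out.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

Model = Callable[[str], str]  # stand-in for a real model API call


def fuse(prompt: str, candidates: Sequence[Model], synthesizer: Model) -> str:
    """Ask every candidate the same prompt in parallel, then merge."""
    if not candidates:
        raise ValueError("need at least one candidate model")
    with ThreadPoolExecutor(max_workers=len(candidates)) as pool:
        # pool.map keeps the drafts in the same order as the candidates.
        drafts = list(pool.map(lambda model: model(prompt), candidates))
    numbered = "\n\n".join(
        f"Draft {i}:\n{draft}" for i, draft in enumerate(drafts, 1)
    )
    return synthesizer(
        f"Question: {prompt}\n\n{numbered}\n\nWrite one final answer."
    )
```

Every request costs one call per candidate plus the synthesis call, which is where the extra tokens and round-trip time come from.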
Basically, if you combine a bunch of near-frontier models (like GPT 5.5, etc) you can get performance that sometimes surpasses top line models like Claude's Fable.
Sakana seems to have a separate approach using a domain specific model to perform the model routing step.
The same model that has been post-trained to operate for hours as a Linux admin will be incapable of writing a heartfelt email, but with something like Fugu, you'd get both the Linux admin for driving the browser harness and the smaller writing specialist model for drafting the email itself.
https://japannews.yomiuri.co.jp/politics/defense-security/20...
https://openai.com/index/our-agreement-with-the-department-o...
Like every company based in China they are under the control of the Chinese state, which is an armed entity known to use violence.