Transcript and HTML here: https://gist.github.com/simonw/ecaad98efe0f747e27bc0e0ebc669...
I mean the prompt was succinct and clear, as always - and it still decided to hallucinate multiple features (animation + controls) beyond the prompt.
I'd also like to point out that, to date, no drawing has actually been good from a pure quality perspective (as in comparable to what a decent designer would throw together).
They're always only "good" from the perspective of it being a one-shot, low-effort prompt. Very little content for training purposes.
Of course, a while back there was a Gemini release that I believe specifically called out its ability to produce SVGs for illustration and diagramming purposes. So it's no longer necessarily the case that the labs aren't training on generating SVGs; in fact, there's a good chance that even if they're not doing so explicitly, the RLVR process might be generating tasks like that as there is more and more focus on frontend and design in the LLM space. So while they might not be specifically training for a pelican riding a bicycle, they may actually be training on SVG diagram quality.
https://simonwillison.net/2024/Oct/25/pelicans-on-a-bicycle/
Private companies will never open up a technological breakthrough to their competitors. It just doesn't make sense. If you want an entire field to advance, you have to open it up.
I do wonder where we go from here.
Model seems quite capable, but this use-case is just yikes. As if interviewing isn't already a hellscape.
Price/quality is absolutely bonkers though. I loaded $40 a few weeks/months ago and I haven’t even gone through half of it.
I wish they did more smaller models. Kimi Linear doesn't really count, it was more of a proof of concept thing.
I tried it once; although it looks amazing on benchmarks, my experience was just okay-ish.
On the other hand, Qwen 3.6 is really good. It’s still not close to Opus, but it’s easily on par with Sonnet.
Close to what, and how are you measuring?
> nobody in the USA would be spending 7 figures on infrastructure for it
Au contraire, if AI had a moat it would pay for itself. They're funneling capital into infrastructure because they know it can't.
Unfortunately, generation of the English audio track is a work in progress and takes a few hours, but the subtitles can already be translated from Italian to English.
TLDR: It works well for the use case I tested it against. Will do more testing in the future.
Also discovered that using OpenCode instead of the Kimi CLI really hurts the model's performance (2.5).
Kimi 2.5 (which this is based on) is served at $0.44 input / $2 output by a ton of different providers on OpenRouter; 2.6 will certainly be similar.
That's roughly 11x cheaper than Opus for similar smarts.
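Quick sanity check on that ratio. A minimal sketch, assuming Opus list pricing of $5 input / $25 output per 1M tokens (my assumption, not from the comment above):

```python
# Price-ratio sketch. The Opus figures ($5 in / $25 out per 1M tokens) are an
# assumption based on its published list price; the Kimi figures come from the
# parent comment's OpenRouter numbers.
kimi_in, kimi_out = 0.44, 2.00   # $/1M tokens (parent comment)
opus_in, opus_out = 5.00, 25.00  # $/1M tokens (assumed list price)

print(f"input:  {opus_in / kimi_in:.1f}x cheaper")   # ~11.4x
print(f"output: {opus_out / kimi_out:.1f}x cheaper") # ~12.5x
```

So "about 11x" holds on input tokens, and it's slightly better than that on output.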
The ~$100K hardware is suitable for multi-user, small-team usage. That's what you'd use for actual work in reasonable timeframes. For personal use, sure, Macs could work.
The test data is purposely difficult to access to reduce the chance of leaking it into the training dataset.
Is this the same model?
Unsloth quants: https://huggingface.co/unsloth/Kimi-K2.6-GGUF
(work in progress, no GGUF files yet, header message saying as much)
But the files are only roughly 640GB in size (~10GB * 64 files, slightly less in fact). Shouldn't they be closer to 2.2TB?
"Kimi-K2.6 adopts the same native int4 quantization method as Kimi-K2-Thinking."
So am I misunderstanding "Tensor type F32 · I32 · BF16" or is it just tagged wrong?
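The int4 line would also explain the size gap. A back-of-the-envelope sketch, assuming the ~1.04T total parameter count from the published K2 specs (an assumption, not stated in this thread):

```python
# Rough checkpoint-size estimate. total_params is an assumption taken from
# the published Kimi K2 specs, not from the Hugging Face repo above.
total_params = 1.04e12

bf16_tb = total_params * 2.0 / 1e12  # BF16 = 2 bytes per parameter
int4_tb = total_params * 0.5 / 1e12  # int4 = 4 bits = 0.5 bytes per parameter

print(f"BF16: ~{bf16_tb:.1f} TB")   # ~2.1 TB, i.e. the 'closer to 2.2TB' figure
print(f"int4: ~{int4_tb:.2f} TB")   # ~0.52 TB; quant scales and layers kept in
                                    # higher precision push it toward ~640GB
```

If that's right, the F32 · I32 · BF16 tags presumably reflect packed int4 weights stored as integer tensors plus higher-precision scales and a few unquantized layers, not a mis-tagged full-precision checkpoint.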
This should be so easy to prove if it were true. Yet there is none of it, just vibes.
Still, your other two points are completely valid. The opaqueness of usage quotas is a scam; within a single month, for a single model, it can differ by more than 2x. And this has indeed been proven.
Might be a configuration or prompt issue. I guess I'll wait and see, but I can't get use out of this now.
edit: Note that you can run it yourself with sufficient resources, or access it from other providers too: https://openrouter.ai/moonshotai/kimi-k2.6/providers
Edit: found it.
> We may use your Content to operate, maintain, improve, and develop the Services, to comply with legal obligations, to enforce our policies, and to ensure security. You may opt out of allowing your Content to be used for model improvement and research purposes by contacting us at membership@moonshot.ai. We will honor your choice in accordance with applicable law.
Section 3 of https://www.kimi.com/user/agreement/modelUse?version=v2
So in other words, only if you can point to a local law which requires them to honor the opt-out?
Not sure about coding usage; Google being weird about these things, I could see that quota being separate.
EDIT: Wrong comment: they compared it with 4.6, my comment was for the Qwen-3.6 Max release blog post...