- llama.cpp
- OpenCode
- Qwen3-Coder-30B-A3B-Instruct in GGUF format (Q4_K_M quantization)
working on a M1 MacBook Pro (e.g. using brew).
It was bit finicky to get all of the pieces together so hopefully this can be used with these newer models.
https://gist.github.com/alexpotato/5b76989c24593962898294038...
Up until relatively recently, while people had already long been making these claims, it came with the asterisks of „oh, but you can’t practically use more than a few K tokens of context“.
Edit: The unsloth quants seem to have been fixed, so they are probably the go-to again: https://unsloth.ai/docs/models/qwen3.5/gguf-benchmarks
Quite misleading, really.
Strong vision and reasoning performance, and the 35-a3b model run s pretty ok on a 16gb GPU with some CPU layers.
If you want to spend twice as much for more speed, get a 3090/4090/5090.
If you want long context, get two of them.
If you have enough spare cash to buy a car, get an RTX Ada with 96G VRAM.
Check out the HP Omen 45L Max: https://www.hp.com/us-en/shop/pdp/omen-max-45l-gaming-dt-gt2...
I'm curious which one you're using.
I imagine any 24 GB card can run the lower quants at a reasonable rate, though, and those are still very good models.
Big fan of Qwen 3.5. It actually delivers on some of the hype that the previous wave of open models never lived up to.
Obviously there's more to a model than that but it's a data point.
[1]: https://github.com/fairydreaming/lineage-bench
[2]: https://github.com/fairydreaming/lineage-bench-results/tree/...
Somewhere between Haiku 4.5 and Sonnet 4.5
That's like saying "somewhere between Eliza and Haiku 4.5". Haiku is not even a so-called 'reasoning model'.¹
¹ To preempt the easily-offended, this is what the latest Opus 4.6 in today's Claude Code update says: "Claude Haiku 4.5 is not a reasoning model — it's optimized for speed and cost efficiency. It's the fastest model in the Claude family, good for quick, straightforward tasks, but it doesn't have extended thinking/reasoning capabilities."
[0]: https://www-cdn.anthropic.com/7aad69bf12627d42234e01ee7c3630...
> Claude Haiku 4.5, a new hybrid reasoning large language model from Anthropic in our small, fast model class.
> As with each model released by Anthropic beginning with Claude Sonnet 3.7, Claude Haiku 4.5 is a hybrid reasoning model. This means that by default the model will answer a query rapidly, but users have the option to toggle on “extended thinking mode”, where the model will spend more time considering its response before it answers. Note that our previous model in the Haiku small-model class, Claude Haiku 3.5, did not have an extended thinking mode.
I would absolutely believe mar-ticles that Qwen has achieved Haiku 4.5 'extended thinking' levels of coding prowess.
Maybe "Qwen3.5 122B offers Haiku 4.5 performance on local computers" would be a more realistic and defensible claim.
What's your problem with Chinese LLMs?
An Analysis of Chinese LLM Censorship and Bias with Qwen 2 Instruct https://huggingface.co/blog/leonardlin/chinese-llm-censorshi...