Ran llama-bench on my M3 Pro with `--n-depth 0,8192,16384 --n-prompt 2048 --n-gen 256 --batch-size 2048 -ub 2048`:
| model | size | params | backend | threads | n_ubatch | test | t/s |
| ------------------------------- | ---------: | ---------: | ---------- | ------: | -------: | --------------: | -------------------: |
| qwen35moe 35B.A3B Q4_K - Medium | 19.74 GiB | 34.66 B | MTL,BLAS | 6 | 2048 | pp2048 | 512.97 ± 0.33 |
| qwen35moe 35B.A3B Q4_K - Medium | 19.74 GiB | 34.66 B | MTL,BLAS | 6 | 2048 | tg256 | 25.92 ± 0.23 |
| qwen35moe 35B.A3B Q4_K - Medium | 19.74 GiB | 34.66 B | MTL,BLAS | 6 | 2048 | pp2048 @ d8192 | 397.20 ± 2.32 |
| qwen35moe 35B.A3B Q4_K - Medium | 19.74 GiB | 34.66 B | MTL,BLAS | 6 | 2048 | tg256 @ d8192 | 22.56 ± 0.36 |
| qwen35moe 35B.A3B Q4_K - Medium | 19.74 GiB | 34.66 B | MTL,BLAS | 6 | 2048 | pp2048 @ d16384 | 313.67 ± 0.63 |
| qwen35moe 35B.A3B Q4_K - Medium | 19.74 GiB | 34.66 B | MTL,BLAS | 6 | 2048 | tg256 @ d16384 | 20.45 ± 0.04 |
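For a sense of scale, the degradation from depth 0 to 16384 works out to roughly 39% on prompt processing and 21% on generation. A quick sanity check using the mean t/s values from the table above:

```python
# Relative slowdown at 16k context depth, from the mean t/s values above.
pp0, pp16k = 512.97, 313.67   # pp2048 at depth 0 vs. depth 16384
tg0, tg16k = 25.92, 20.45     # tg256  at depth 0 vs. depth 16384

pp_drop = (pp0 - pp16k) / pp0 * 100
tg_drop = (tg0 - tg16k) / tg0 * 100
print(f"pp slowdown: {pp_drop:.1f}%")  # ~38.9%
print(f"tg slowdown: {tg_drop:.1f}%")  # ~21.1%
```

Prompt processing takes the bigger hit, which is expected since attention cost over the existing KV cache grows with depth.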
I sure do want that silicon now haha.