2 points by abhikul0 4 hours ago | 1 comment
  • abhikul0 4 hours ago
    Ran llama-bench on my M3 Pro with `--n-depth 0,8192,16384 --n-prompt 2048 --n-gen 256 --batch-size 2048 -ub 2048`:

      | model                           |       size |     params | backend    | threads | n_ubatch |            test |                  t/s |
      | ------------------------------- | ---------: | ---------: | ---------- | ------: | -------: | --------------: | -------------------: |
      | qwen35moe 35B.A3B Q4_K - Medium |  19.74 GiB |    34.66 B | MTL,BLAS   |       6 |     2048 |          pp2048 |        512.97 ± 0.33 |
      | qwen35moe 35B.A3B Q4_K - Medium |  19.74 GiB |    34.66 B | MTL,BLAS   |       6 |     2048 |           tg256 |         25.92 ± 0.23 |
      | qwen35moe 35B.A3B Q4_K - Medium |  19.74 GiB |    34.66 B | MTL,BLAS   |       6 |     2048 |  pp2048 @ d8192 |        397.20 ± 2.32 |
      | qwen35moe 35B.A3B Q4_K - Medium |  19.74 GiB |    34.66 B | MTL,BLAS   |       6 |     2048 |   tg256 @ d8192 |         22.56 ± 0.36 |
      | qwen35moe 35B.A3B Q4_K - Medium |  19.74 GiB |    34.66 B | MTL,BLAS   |       6 |     2048 | pp2048 @ d16384 |        313.67 ± 0.63 |
      | qwen35moe 35B.A3B Q4_K - Medium |  19.74 GiB |    34.66 B | MTL,BLAS   |       6 |     2048 |  tg256 @ d16384 |         20.45 ± 0.04 |
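
To put the slowdown with context depth into numbers, here is a minimal sketch that computes the percentage drop in throughput relative to the depth-0 runs. The figures are copied from the table above; the names `RESULTS`, `percent_drop`, and `summarize` are made up for illustration and are not part of llama-bench.

```python
# Throughput in tokens/s, copied from the llama-bench table above.
# Keys of the inner dicts are the context depth (--n-depth) values.
RESULTS = {
    "pp2048": {0: 512.97, 8192: 397.20, 16384: 313.67},
    "tg256": {0: 25.92, 8192: 22.56, 16384: 20.45},
}


def percent_drop(baseline, value):
    """Return how much lower `value` is than `baseline`, in percent."""
    return (1.0 - value / baseline) * 100.0


def summarize(results):
    """Map each test to {depth: percent drop versus the depth-0 run}."""
    summary = {}
    for test, by_depth in results.items():
        baseline = by_depth[0]
        summary[test] = {
            depth: percent_drop(baseline, tps)
            for depth, tps in by_depth.items()
            if depth != 0
        }
    return summary


if __name__ == "__main__":
    for test, drops in summarize(RESULTS).items():
        for depth, drop in drops.items():
            print(f"{test} @ d{depth}: {drop:.1f}% slower than depth 0")
```

By this arithmetic, prompt processing loses roughly 23% at 8K depth and 39% at 16K, while token generation loses roughly 13% and 21%.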
    
    I sure do want that silicon now haha.