llama-server \
--model /mnt/ubuntu/models/llama-cpp-qwen/Qwen3.6-35B-A3B-UD-Q4_K_XL.gguf \
--ctx-size 150000 \
--n-gpu-layers 99 \
--cache-type-k q8_0 \
--cache-type-v q8_0 \
--parallel 3 \
--kv-unified \
--ctx-checkpoints 32 \
--checkpoint-every-n-tokens 8192 \
--checkpoint-min-tokens 64 \
--flash-attn on \
--batch-size 4096 \
--ubatch-size 1024 \
--reasoning on \
--temp 0.6 \
--top-p 0.95 \
--top-k 20
I was wondering if turboquant is worth the effort right now, but I'm not seeing it speed-wise yet. --checkpoint-min-tokens is a local patch of mine, so that small background tasks don't wreck my checkpoint cache.
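For anyone curious what a 150k-token q8_0 KV cache costs in VRAM with these flags, here's a back-of-envelope estimate. The layer/head/dim numbers below are placeholders, not the actual config of this model, so plug in the real values from the GGUF metadata; the q8_0 byte count (34 bytes per 32-element block) is the standard GGUF layout:

```python
# Rough KV-cache memory estimate for a q8_0 K and V cache.
# NOTE: n_layers / n_kv_heads / head_dim are HYPOTHETICAL placeholders,
# not the real dimensions of the model in the command above.
n_layers = 48
n_kv_heads = 4
head_dim = 128
ctx = 150_000  # --ctx-size (shared across slots with --kv-unified)

# q8_0 stores blocks of 32 int8 values plus one fp16 scale: 34 bytes / 32 elems.
bytes_per_elem_q8_0 = 34 / 32

# Factor of 2 covers both the K cache and the V cache.
per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem_q8_0
total_gib = per_token * ctx / 2**30

print(f"{per_token:.0f} bytes/token, {total_gib:.1f} GiB at full context")
# → 52224 bytes/token, 7.3 GiB at full context (for these placeholder dims)
```

With --kv-unified the three --parallel slots share that one 150k pool instead of each getting their own copy, which is why the total stays a single-context figure.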