Qwen 3.5's DeltaNet architecture broke our llama.cpp inference, pushing latency from 1.5s to 21s. We migrated to Apple's MLX framework and brought it back down to 7s while maintaining a 100% clinical pass rate.
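For anyone curious what the MLX side looks like, here's a minimal sketch of loading a quantized model and timing a generation with the `mlx_lm` package (`pip install mlx-lm`). The model ID, prompt, and token budget below are placeholders for illustration, not our actual clinical pipeline.

```python
import time

from mlx_lm import load, generate

# Placeholder model ID; substitute whichever MLX conversion you are serving.
MODEL_ID = "mlx-community/YourModel-4bit"

model, tokenizer = load(MODEL_ID)

# Build a chat-formatted prompt (example prompt only).
messages = [{"role": "user", "content": "Summarize the patient's discharge notes."}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)

# Time a single generation end to end.
start = time.perf_counter()
response = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(f"latency: {time.perf_counter() - start:.1f}s")
print(response)
```

In practice we wrap this in a small server and measure latency per request, but the core load/generate loop is this simple.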
Happy to answer questions about MLX vs llama.cpp, DeltaNet optimization, or running SLMs on Apple Silicon for production healthcare workloads.