2 points by asqpl 5 hours ago | 1 comment
  • asqpl 5 hours ago
    Engineer here – I run a therapeutic AI system for elderly care in assisted living facilities. The entire stack runs locally on Mac Minis (no cloud, no GPUs) for privacy and reliability.

    Qwen 3.5's DeltaNet architecture broke our llama.cpp inference path: per-response latency regressed from 1.5s to 21s. Migrating to Apple's MLX framework brought it back down to 7s while maintaining 100% pass rates on our clinical evals.
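    For anyone curious how we compare backends: here's a minimal sketch of the kind of timing harness we use. Everything in it is illustrative – `fake_backend` is a stand-in for a real generate call (e.g. mlx-lm's `generate` or a llama.cpp binding), not our actual serving code.

    ```python
    # Hypothetical benchmark harness for comparing inference backends.
    # The backend is any callable taking a prompt; we report median
    # wall-clock latency over several runs to smooth out jitter.
    import statistics
    import time


    def time_backend(run_inference, prompt, n_runs=5):
        """Call the backend n_runs times, return median latency in seconds."""
        latencies = []
        for _ in range(n_runs):
            start = time.perf_counter()
            run_inference(prompt)
            latencies.append(time.perf_counter() - start)
        return statistics.median(latencies)


    def fake_backend(prompt):
        # Stand-in for a real model call; sleeps to simulate work.
        time.sleep(0.01)
        return "response"


    if __name__ == "__main__":
        median_s = time_backend(fake_backend, "How are you feeling today?")
        print(f"median latency: {median_s:.3f}s")
    ```

    Using the median rather than the mean keeps a single slow warm-up run (model load, cache population) from skewing the comparison.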

    Happy to answer questions about MLX vs llama.cpp, DeltaNet optimization, or running SLMs on Apple Silicon for production healthcare workloads.