The architecture page explains how ternary quantization, dynamic sparsity, and mmap layer streaming work together to push models far beyond normal RAM limits.
Happy to answer questions about the implementation or benchmarks.