5 pointsby fatihturker6 hours ago2 comments

fatihturker6 hours ago
Author here.
The architecture page explains how ternary quantization, dynamic sparsity, and mmap layer streaming work together to push models far beyond normal RAM limits.
Happy to answer questions about the implementation or benchmarks.
MrLey6 hours ago
This is cool project