5 pointsby fatihturker6 hours ago2 comments
  • fatihturker6 hours ago
    Author here.

    The architecture page explains how ternary quantization, dynamic sparsity, and mmap layer streaming work together to push models far beyond normal RAM limits.

    Happy to answer questions about the implementation or benchmarks.

  • MrLey6 hours ago
    This is cool project