3 points by Kranium2002 3 hours ago | 2 comments
  • Kranium2002 3 hours ago
    CacheShrink is a KV cache compression library I built. It uses Riemannian optimization on Stiefel manifolds to decompose key-value matrices into latent representations that can be reconstructed efficiently. It works with HuggingFace and supports multiple attention styles, including MHA (Multi-Head Attention) and GQA (Grouped Query Attention), with XKV-style compression for GQA. The approach achieves significant memory reduction with minimal increase in perplexity.
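
To make the decompose-and-reconstruct idea concrete, here is a minimal sketch. It is not CacheShrink's API or its optimizer: it swaps in a plain truncated SVD for the Riemannian optimization, and the function names `compress_kv` and `reconstruct_kv`, the shapes, and the rank are all hypothetical. The only thing it shares with the description above is that the projection basis has orthonormal columns, which makes it a point on a Stiefel manifold.

```python
import numpy as np


def compress_kv(kv: np.ndarray, rank: int):
    """Project a (tokens, head_dim) matrix onto a rank-`rank` orthonormal basis.

    Hypothetical helper: uses truncated SVD, not Riemannian optimization.
    """
    # Rows of vt are the right singular vectors, ordered by singular value.
    _, _, vt = np.linalg.svd(kv, full_matrices=False)
    basis = vt[:rank].T      # (head_dim, rank), orthonormal columns
    latent = kv @ basis      # (tokens, rank) latent representation
    return latent, basis


def reconstruct_kv(latent: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Map latents back to the original head dimension."""
    return latent @ basis.T


rng = np.random.default_rng(0)
# Synthetic "keys" with exact rank 8, so a rank-8 basis loses nothing.
keys = rng.standard_normal((128, 8)) @ rng.standard_normal((8, 64))

latent, basis = compress_kv(keys, rank=8)
approx = reconstruct_kv(latent, basis)

# Relative reconstruction error and stored-element counts.
rel_err = np.linalg.norm(keys - approx) / np.linalg.norm(keys)
stored = latent.size + basis.size
print(f"relative error: {rel_err:.2e}, stored {stored} of {keys.size} floats")
```

In this toy case the latent and the basis together hold 1536 values against 8192 in the original matrix. Real key and value matrices are only approximately low rank, so the reconstruction error would not vanish the way it does on this synthetic input.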
  • bell-cot 2 hours ago
    Not an HN staffer...but maybe read this and update your title?

    https://news.ycombinator.com/showhn.html