3 points by Kranium2002 3 hours ago | 2 comments

Kranium2002 3 hours ago
CacheShrink is a KV cache compression library I built. It uses Riemannian optimization on Stiefel manifolds to decompose key-value matrices into low-dimensional latent representations and reconstruct them efficiently. It works with HuggingFace and supports multiple attention styles, including MHA (Multi-Head Attention) and GQA (Grouped Query Attention), using XKV-style compression for GQA. The approach achieves significant memory reduction with minimal increase in perplexity.
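For anyone who wants the rough idea in code, here is a minimal NumPy sketch of low-rank KV compression with an orthonormal basis. It is not CacheShrink's code or API: the function names (`fit_basis`, `compress`, `reconstruct`) and the shapes are made up for illustration, and it swaps in a plain truncated SVD instead of Riemannian optimization. The SVD basis has orthonormal columns, so it is a point on the Stiefel manifold and could serve as a starting point for such an optimizer.

```python
import numpy as np

def fit_basis(K, rank):
    """Return a (head_dim, rank) basis with orthonormal columns.

    Hypothetical helper: uses truncated SVD, which gives the best
    rank-`rank` approximation of K in Frobenius norm.
    """
    _, _, vt = np.linalg.svd(K, full_matrices=False)
    return vt[:rank].T

def compress(K, U):
    # Project (tokens, head_dim) keys into a (tokens, rank) latent cache.
    return K @ U

def reconstruct(Z, U):
    # Map latents back to the original head dimension.
    return Z @ U.T

# Synthetic "key" matrix: approximately rank 8, plus a little noise.
rng = np.random.default_rng(0)
K = rng.standard_normal((256, 8)) @ rng.standard_normal((8, 64))
K += 0.01 * rng.standard_normal(K.shape)

U = fit_basis(K, rank=16)
Z = compress(K, U)          # stored instead of K: 16 floats per token, not 64
K_hat = reconstruct(Z, U)

rel_err = np.linalg.norm(K - K_hat) / np.linalg.norm(K)
print(Z.shape, rel_err)
```

The memory saving comes from caching `Z` (plus one shared `U`) instead of `K`. How small the rank can go depends on how close to low-rank the real key and value matrices are.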
bell-cot 2 hours ago
Not an HN staffer...but maybe read this and update your title?
https://news.ycombinator.com/showhn.html