CacheShrink is a KV cache compression library I built using Riemannian optimization on Stiefel manifolds to decompose key-value matrices into latent representations and reconstruct them efficiently. It works with HuggingFace and supports multiple attention styles including MHA (Multi-Head Attention) and GQA (Grouped Query Attention), using XKV-style compression for GQA. The approach achieves significant memory reduction with minimal loss in perplexity.