2 points by justsomeguy1996 4 hours ago | 1 comment
  • justsomeguy1996 4 hours ago
    I built a Python implementation of Google's TurboQuant paper (ICLR 2026) for vector search. The key thing that makes this different from PQ (product quantization) and other quantization methods: it's fully data-oblivious. The codebook is derived from math rather than trained on your data, so you can add vectors online without ever rebuilding the index. Each vector encodes independently in ~4ms at d=1536.
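    To make "data-oblivious" concrete, here's a rough sketch of the general idea (my own illustration, not the repo's API): a fixed, seeded random rotation spreads a vector's energy evenly across coordinates, and each coordinate is then scalar-quantized independently. Nothing is learned from the data, so encoding is online by construction.

    ```python
    import numpy as np

    d, bits = 1536, 4
    levels = 2 ** bits
    rng = np.random.default_rng(0)  # fixed seed: the "codebook" never depends on your data
    # Data-independent random orthogonal rotation via QR decomposition.
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))

    def encode(v, scale):
        r = Q @ v  # rotate so coordinates look i.i.d. Gaussian
        codes = np.clip(np.round(r / scale + levels / 2), 0, levels - 1)
        return codes.astype(np.uint8)

    def decode(codes, scale):
        r = (codes.astype(np.float64) - levels / 2) * scale
        return Q.T @ r  # invert the rotation

    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)
    scale = 4.0 / (levels * np.sqrt(d))  # rough step size covering ~2 std devs; an assumption
    v_hat = decode(encode(v, scale), scale)
    err = np.linalg.norm(v - v_hat)
    ```

    The actual paper's construction is more refined than this, but the shape is the same: fixed transform, independent per-coordinate codes, no training pass.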

    The repo reproduces the benchmarks from Section 4.4 of the paper — recall@1@k on GloVe (d=200) and OpenAI embeddings (d=1536, d=3072). At 4-bit on d=1536, you get 0.967 recall@1@1 with 8x compression. At 2-bit, 0.862 recall@1@1 with ~16x compression.
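    The compression ratios follow directly from replacing 32-bit floats with b-bit codes, which a quick check confirms:

    ```python
    # Codes replace float32 coordinates, so the ratio is just 32 / bits.
    d = 1536
    float_bytes = d * 4  # 6144 bytes per raw float32 vector
    for bits, claimed_ratio in [(4, 8), (2, 16)]:
        code_bytes = d * bits / 8
        assert float_bytes / code_bytes == claimed_ratio
    ```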

    Paper: https://arxiv.org/abs/2504.19874