18 points by pythongiant 3 hours ago | 6 comments
  • stpedgwdgfhgdd an hour ago
    I just don't get why people choose Python rather than, e.g., Go for high-performance problems.
    • larme 10 minutes ago
      Go is not high-performance enough. As others have said, you implement the high-performance part in C++ and use Python to glue it together.
    • Yoric 37 minutes ago
      Go is pretty good at performance, but pretty bad at expressing domain-specific logic. Python is the opposite, but once you have isolated the parts that need to be optimized, it's quite easy to rewrite them in a native language (the Rust-Python bindings in particular are really good, although this project uses C++).
    • sigmoid10 an hour ago
      Python is a very convenient skeleton for gluing together high-performance modules written in C or CUDA. Writing the boilerplate in those languages to adapt the modules to your project is much less convenient.
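To make the glue pattern concrete, here is a minimal sketch that uses only Python's standard-library `ctypes` to call a compiled C function (`cos` from the C math library). It assumes a Unix-like system where `libm` can be located; the helper names `load_libm` and `native_cos_sum` are hypothetical. It shows the binding mechanism only: every `ctypes` call still pays Python-level overhead, so real speedups come when a single native call does a lot of work (a whole kernel or batch), which is the role the C++ parts of a project like this would play.

```python
import ctypes
import ctypes.util
import math


def load_libm() -> ctypes.CDLL:
    """Load the C math library (assumes a Unix-like system such as Linux or macOS)."""
    path = ctypes.util.find_library("m")
    # If the library name can't be resolved, fall back to the symbols
    # already linked into the running interpreter process.
    return ctypes.CDLL(path) if path else ctypes.CDLL(None)


libm = load_libm()

# Declare the C signature so ctypes converts arguments correctly:
#   double cos(double);
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double


def native_cos_sum(values: list[float]) -> float:
    """Python drives the loop; each cos() evaluation runs in compiled C."""
    return sum(libm.cos(v) for v in values)


if __name__ == "__main__":
    xs = [0.0, math.pi / 2, math.pi]
    print(native_cos_sum(xs))
    print(libm.cos(1.0), math.cos(1.0))
```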
  • hexnuts 2 hours ago
    Bad site design: if I can't scroll to see the next slide, that's just broken.
  • x0ruman an hour ago
    The functionality is impressive, but the website needs some work.
  • sakex an hour ago
    Is this based on paged attention with hashing of the pages?
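For context on the question, here is a minimal sketch of two ways token blocks can be keyed for KV reuse. Prefix-caching schemes that hash pages, such as vLLM's automatic prefix caching, key each block by its own tokens plus everything before it, so a block only matches when the whole prefix matches. Keying a block by its own tokens alone lets identical content match anywhere, which is what interior-chunk reuse needs. The thread does not say which scheme KVBoost uses; the block size and helper names below are hypothetical, and a real cache would map these keys to stored KV tensors rather than just count hits.

```python
import hashlib
from typing import Optional

BLOCK = 4  # tokens per block (hypothetical; real systems often use larger blocks)


def blocks(tokens: list[int], size: int = BLOCK) -> list[tuple[int, ...]]:
    """Split a token sequence into full fixed-size blocks; a trailing partial block is dropped."""
    return [tuple(tokens[i:i + size]) for i in range(0, len(tokens) - size + 1, size)]


def _digest(parent: Optional[str], block: tuple[int, ...]) -> str:
    """Hash a block, optionally chained to the hash of the preceding block."""
    h = hashlib.sha256()
    if parent is not None:
        h.update(parent.encode())
    h.update(repr(block).encode())
    return h.hexdigest()


def prefix_chained_keys(tokens: list[int]) -> list[str]:
    """Prefix-caching style: each key depends on every token before and inside the block."""
    keys: list[str] = []
    parent: Optional[str] = None
    for block in blocks(tokens):
        parent = _digest(parent, block)
        keys.append(parent)
    return keys


def content_keys(tokens: list[int]) -> list[str]:
    """Content-only style: each key depends only on the block's own tokens."""
    return [_digest(None, block) for block in blocks(tokens)]


def hit_count(cache: set[str], keys: list[str]) -> int:
    """Number of blocks whose key is already cached."""
    return sum(k in cache for k in keys)


if __name__ == "__main__":
    shared = [10, 11, 12, 13, 20, 21, 22, 23]  # e.g. a reused document chunk
    a = [1, 2, 3, 4] + shared
    b = [5, 6, 7, 8] + shared  # same chunk, different preceding block
    print(hit_count(set(prefix_chained_keys(a)), prefix_chained_keys(b)))
    print(hit_count(set(content_keys(a)), content_keys(b)))
```

With chained keys the shared chunk in `b` gets no hits because its preceding block differs; with content-only keys both shared blocks hit. Note that a block's KV values also depend on the tokens before it, so KV reused after a different prefix is only an approximation, which is presumably the gap the recompute strategies in the project description are meant to close.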
  • pythongiant 3 hours ago
    KVBoost is a chunk-level KV cache reuse library for HuggingFace models (pip install kvboost). It supports two recompute strategies (selective boundary and CacheBlend), int8/int4 KV quantization for a 2–4x RAM reduction, disk-backed cold storage, and 11 architectures including Llama, Qwen, Gemma, Mistral, and Phi. On Qwen2.5-3B we measured a 47.9x TTFT (time-to-first-token) speedup on an 8-turn conversation and 21x on code-context reuse; KVBoost was 100–743x faster than MLX and 3–41x faster than vLLM-MLX, including on interior-chunk reuse, where vLLM gets zero cache hits. Outputs are token-for-token identical to the baseline under greedy decoding. It works best on 3B+ models with 500+ tokens of shared context. GitHub: https://github.com/pythongiant/KVBoost
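The 2–4x figure is consistent with simple byte arithmetic against a 16-bit baseline: fp16/bf16 KV entries take 2 bytes, int8 takes 1, and packed int4 takes half a byte, before counting the stored scales. Below is a minimal sketch assuming symmetric per-tensor int8 quantization; KVBoost's actual scheme (per-channel, per-group, or otherwise) is not described here, and the function names and model shape used below are hypothetical.

```python
from array import array


def quantize_int8(values: list[float]) -> tuple[array, float]:
    """Symmetric per-tensor int8 quantization: q = round(x / scale), scale = max|x| / 127."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # avoid division by zero for all-zero input
    # Clamp to [-127, 127] to guard against float rounding at the extremes.
    q = array("b", (max(-127, min(127, round(v / scale))) for v in values))
    return q, scale


def dequantize_int8(q: array, scale: float) -> list[float]:
    """Recover approximate floats; per-element error is at most scale / 2."""
    return [x * scale for x in q]


def kv_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
             seq_len: int, bits: int) -> int:
    """Bytes for the K and V caches across all layers (quantization scales ignored)."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bits // 8


if __name__ == "__main__":
    kv = [0.5, -1.25, 3.0, 0.0, -2.75]
    q, s = quantize_int8(kv)
    print(list(q), s)
    print(dequantize_int8(q, s))
    for bits in (16, 8, 4):
        print(bits, kv_bytes(32, 8, 128, 4096, bits) / 2**20, "MiB")
```

For a hypothetical 32-layer model with 8 KV heads of dimension 128 and a 4096-token context, this gives 512 MiB at 16 bits, 256 MiB at int8, and 128 MiB at int4, i.e. the 2x and 4x ends of the quoted range; per-group scales would eat slightly into those savings.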