3 points by teleforce 5 hours ago | 1 comment
  • teleforce 5 hours ago
    >This repository provides a patch for SGLang and vLLM that enables IndexCache inference acceleration for models using DeepSeek Sparse Attention (DSA), including DeepSeek-V3.2 and GLM-5.

    Paper here [1].

    [1] IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse:

    https://arxiv.org/abs/2603.12201