3 points by ani17 2 hours ago | 2 comments
  • ani17 2 hours ago
    Author here. I wanted to understand what vLLM and llama.cpp are actually doing under the hood, but the codebases are massive. So I wrote a stripped-down version from scratch to see the core ideas without the production complexity.

    Code: https://github.com/Anirudh171202/WhiteLotus

  • lazyMonkey69 2 hours ago
    I think the paged attention part is a bit oversimplified. Nice read otherwise!
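    For readers who haven't seen the term: paged attention (introduced by vLLM) stores each sequence's KV cache in fixed-size physical blocks and maps logical token positions to them through a per-sequence block table, much like virtual memory paging, so sequences don't need contiguous cache memory. A minimal illustrative sketch of that bookkeeping (the block size, class names, and allocator are assumptions for illustration, not vLLM's actual code):

    ```python
    BLOCK_SIZE = 4  # tokens per physical block (illustrative; vLLM uses larger blocks)

    class BlockAllocator:
        """Pool of free physical block ids in the paged KV cache."""
        def __init__(self, num_blocks):
            self.free = list(range(num_blocks))

        def alloc(self):
            return self.free.pop()

    class Sequence:
        """Tracks one sequence's mapping from token positions to cache slots."""
        def __init__(self, allocator):
            self.allocator = allocator
            self.block_table = []   # logical block index -> physical block id
            self.num_tokens = 0

        def append_token(self):
            # Grab a new physical block only when the last one is full,
            # so memory grows in block-sized increments, not per sequence max.
            if self.num_tokens % BLOCK_SIZE == 0:
                self.block_table.append(self.allocator.alloc())
            self.num_tokens += 1

        def physical_slot(self, pos):
            # Where token `pos`'s KV entry lives: (physical block, offset).
            block = self.block_table[pos // BLOCK_SIZE]
            return block, pos % BLOCK_SIZE

    alloc = BlockAllocator(num_blocks=8)
    seq = Sequence(alloc)
    for _ in range(6):
        seq.append_token()
    print(seq.block_table)       # 6 tokens occupy two physical blocks
    print(seq.physical_slot(5))  # token 5 sits at offset 1 of the second block
    ```

    The point the production systems add on top of this is that blocks can be shared between sequences (e.g. for a common prompt prefix) and freed independently, which is where most of the real complexity lives.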