125 points by simonpure 7 months ago | 11 comments
  • jimmySixDOF 7 months ago
    A little sparse on the documentation side; I can't tell at a glance whether there is 1:1 hyperparameter tunability or whether this is an opinionated, single-path, locked soft-FPGA eval-hacking kind of thing.

    EDIT: OK, it's legit. Here is an example of it put to use by the makers of the Dolphin open-source series of fine-tunes:

    > Here I implement in nano-vllm, efficient sample-K logit extraction, as described in "Sparse Logit Sampling: Accelerating Knowledge Distillation in LLMs" by Anshumann et. al. Sampling occurs on the GPU, the non-sampled logits do not get copied out of GPU space. I tried to implement this in @vllm_project, but it was a bit too heavy for me to figure out.

    https://github.com/GeeeekExplorer/nano-vllm/pull/34
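
    For a flavor of what that PR does, here is a rough sketch of sample-K logit extraction in PyTorch. It is not the PR's actual code; the function name, shapes, and K=64 default are illustrative. It only demonstrates the point quoted above: the K indices are drawn on the GPU, and only the sampled (index, logit) pairs ever get copied to host memory.

```python
import torch

def sample_k_logits(logits: torch.Tensor, k: int = 64):
    """Toy sketch of sample-K logit extraction (not the PR's actual code).

    logits: [num_tokens, vocab_size] tensor living on the GPU.
    Returns per-token (token_ids, logit_values) of shape [num_tokens, k],
    so only O(k) values per position leave GPU memory instead of the full
    vocab_size row.
    """
    probs = torch.softmax(logits.float(), dim=-1)
    # Draw K vocab indices per position from the model's own distribution;
    # the sampling itself runs on the GPU.
    idx = torch.multinomial(probs, num_samples=k, replacement=True)
    vals = torch.gather(logits, dim=-1, index=idx)
    # Only the sampled subset is copied out of GPU space.
    return idx.cpu(), vals.cpu()

# Usage: fake logits for 4 positions over a 32k vocab.
if torch.cuda.is_available():
    logits = torch.randn(4, 32_000, device="cuda")
    ids, vals = sample_k_logits(logits, k=64)
    print(ids.shape, vals.shape)  # torch.Size([4, 64]) torch.Size([4, 64])
```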

  • omneity 7 months ago
    This is an incredible achievement for a solo developer. The dev is from the DeepSeek team, by the way.
  • tt726259 7 months ago
    After seeing the Docker image for vLLM jump by 5 GB (to 10 GB!) over the past five months, I grew suspicious of vLLM's development practices [1]. It's not easy, for sure, to deal with all those flaky Python modules [2].

    But having the CUDA packages appear four times in different layers is questionable! [3] (A quick per-layer size check is sketched after the links below.)

    Yet again, as a college mate of mine used to say, "Don't change it. It works."

    --

    [1]: https://hub.docker.com/r/vllm/vllm-openai/tags

    [2]: https://github.com/vllm-project/vllm/issues/13306

    [3]: These kinds of workarounds tend to accumulate and never get revisited:

    - https://github.com/vllm-project/vllm/commit/b07d741661570ef1...

    - https://github.com/vllm-project/vllm/commit/68d37809b9b52f4d... (this one in particular probably accounts for +3Gb)
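
    The per-layer size check mentioned above can be done with docker history; the following is a minimal sketch, assuming the docker CLI is installed and the image tag from [1] has already been pulled. The helper name is made up for illustration.

```python
import subprocess

# Hypothetical helper for eyeballing per-layer sizes of the published image.
# Assumes the docker CLI is installed and the tag has already been pulled;
# the tag below is just the one referenced in [1].
IMAGE = "vllm/vllm-openai:latest"

def layer_sizes(image: str) -> list[tuple[str, str]]:
    out = subprocess.run(
        ["docker", "history", "--no-trunc",
         "--format", "{{.Size}}\t{{.CreatedBy}}", image],
        capture_output=True, text=True, check=True,
    ).stdout
    return [tuple(line.split("\t", 1)) for line in out.splitlines() if line]

if __name__ == "__main__":
    for size, created_by in layer_sizes(IMAGE):
        # Duplicated CUDA payloads show up here as repeated multi-GB
        # install layers.
        print(f"{size:>10}  {created_by[:100]}")
```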

  • unwind 7 months ago
    Meta: the Title Casing in the title is pretty obnoxious; "Vllm" is exactly the inverse, casing-wise, of how the project spells its name.
    • msephton 7 months ago
      FWIW, the OP has a small window of time to correct the casing after posting.
  • mountainriver 7 months ago
    Love this project; we need more simplifications like this in the current ML environment.
  • zackify 7 months ago
    Will this end up getting an OpenAI-compatible web server, or is that out of scope?
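
    Nothing in the thread says an OpenAI-compatible endpoint is planned, but a thin out-of-tree shim is easy to sketch. The import path, the LLM/SamplingParams constructor arguments, and the "text" output field below are assumptions borrowed from vLLM's offline API, not documented behavior of nano-vllm.

```python
# Out-of-tree OpenAI-style /v1/completions shim, sketched under assumptions:
# nano-vllm is assumed to expose LLM / SamplingParams mirroring vLLM's
# offline API, and each generate() result is assumed to carry a "text"
# field. None of this is a documented guarantee of the project.
from fastapi import FastAPI
from pydantic import BaseModel
from nanovllm import LLM, SamplingParams  # assumed import path

app = FastAPI()
llm = LLM("/path/to/model")  # placeholder model path

class CompletionRequest(BaseModel):
    prompt: str
    max_tokens: int = 128
    temperature: float = 1.0

@app.post("/v1/completions")
def completions(req: CompletionRequest):
    params = SamplingParams(temperature=req.temperature,
                            max_tokens=req.max_tokens)
    outputs = llm.generate([req.prompt], params)
    return {
        "object": "text_completion",
        "choices": [{"index": 0, "text": outputs[0]["text"]}],
    }

# Run with: uvicorn server:app --port 8000
```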
  • fractorial 7 months ago
    Did anyone else click in excitedly after misreading ‘Vllm’ as ‘LLVM’?
  • baalimago 7 months ago
    So... it's a language model...? As in, not "large"? I'm a bit unsure of the magnitudes here, but surely "nano" and "large" cancel out.
  • futurecliff 7 months ago
    How did you do it? Which portion of the vLLM refactoring allowed you to get such gains?