Hacker News
Show HN: Tiny-vLLM – high performance LLM inference engine in C++ and CUDA (github.com)
74 points by yu3zhou4 5 hours ago | 7 comments
yu3zhou4
4 hours ago
Author here. In my opinion the README is the most interesting part: I wrote it to help others build a useful mental model, so they can recreate the project themselves without even needing to read my code.
juancn
3 hours ago
Looks interesting; it reminds me of the first llama.cpp, but better documented.
nazgulsenpai
4 hours ago
I love the documentation formatted in lessons. I can't wait to read through it.
dwa3592
2 hours ago
Very nice job on the README.
>>Physically, LLM is a file which contains a lot of float numbers.
A.k.a. the atoms of the LLM.
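The quoted README line describes the weights as plain floating-point data on disk. A minimal C++ sketch of that idea, assuming a headerless file of float32 values stored in the host's byte order; `loadRawFloats` is a hypothetical name, not Tiny-vLLM's code, and real checkpoint formats such as safetensors or GGUF also store a header describing each tensor's name, shape and dtype:

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical loader: reads a headerless file of raw float32 values
// (assumed to be in the host's byte order) into a host-side buffer.
std::vector<float> loadRawFloats(const std::string& path) {
    // Open in binary mode, positioned at the end so tellg() gives the size.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) throw std::runtime_error("cannot open " + path);

    const std::streamsize bytes = in.tellg();
    // A truncated or foreign file will not be a whole number of floats.
    if (bytes % static_cast<std::streamsize>(sizeof(float)) != 0)
        throw std::runtime_error("file size is not a multiple of sizeof(float)");

    std::vector<float> weights(static_cast<std::size_t>(bytes) / sizeof(float));
    in.seekg(0);  // rewind and copy the bytes straight into the vector
    in.read(reinterpret_cast<char*>(weights.data()), bytes);
    if (!in) throw std::runtime_error("short read from " + path);
    return weights;
}
```

In an inference engine this host buffer would typically be copied to GPU memory next; the size check simply rejects obviously malformed files before that happens.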
cyanydeez
2 hours ago
the universe is just atomic if statements
cookiengineer
2 hours ago
Wanted to add that the author has an amazing blog with lots of interesting papers:
https://jedrzej.maczan.pl/
einpoklum
2 hours ago
It seems the author believes checking the return values of CUDA API calls is not "tiny" enough :-(