11 points by lastdong 8 hours ago | 4 comments
  • kgeist 6 hours ago
    I think the headline is misleading. It's some random fork of llama.cpp; I can't find evidence that TurboQuant was actually added to llama.cpp proper.

    The only legit PR I can find is this [0] and it's still open.

    There are currently a lot of rejected vibe-coded PRs: [1] (violations of the AI policy).

    The OP's PR says it was generated with Claude Code, so it has a very low chance of getting merged upstream.

    [0] https://github.com/ggml-org/llama.cpp/pull/21089

    [1] https://github.com/ggml-org/llama.cpp/pulls?q=Turboquant+is%...

    • lastdong 2 hours ago
      Indeed, thanks for pointing this out and for the links. In my excitement I misread it as an MR from the fork to the main project. I don’t think I’m able to fix the title, though.

      I find it quite exciting to read some results from an effort to understand whether TurboQuant's main ideas can be applied to model weights. There are other similar projects, so we’ll see, but some of this fork's results look promising.

  • pogue 4 hours ago
    I see mentions showing it reduced the size of the models, but not how much memory was saved. I guess that depends on how it's used? But I would be very curious to see some benchmarking for that.
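    As a rough back-of-the-envelope answer: for weights alone, memory scales with bits per weight, so a hedged sketch (the 7B parameter count and the ~0.5 bits/weight of scale overhead are illustrative assumptions, not figures from the PR; real formats like GGUF add per-block metadata this doesn't model):

    ```python
    # Rough estimate of weight memory at different bit-widths.
    # Illustrative only: parameter count and overhead are assumed,
    # and runtime memory (KV cache, activations) is not included.

    def weight_bytes(n_params: float, bits_per_weight: float) -> float:
        """Approximate bytes needed to store n_params weights."""
        return n_params * bits_per_weight / 8

    n = 7e9  # hypothetical 7B-parameter model
    fp16 = weight_bytes(n, 16)   # 16 bits/weight
    q4 = weight_bytes(n, 4.5)    # ~4 bits/weight plus assumed ~0.5 bits of scale overhead

    print(f"fp16: {fp16 / 1e9:.1f} GB, ~4-bit: {q4 / 1e9:.1f} GB")
    # fp16: 14.0 GB, ~4-bit: 3.9 GB
    ```

    So for the weights themselves, savings track the compression ratio directly; total memory saved at runtime still depends on context length and cache settings, which is why benchmarks would help.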
  • jsilence 7 hours ago
    Great news! Expecting this to get implemented in all the major inference runners pretty fast. See also: https://news.ycombinator.com/item?id=47637422