Hacker News
Show HN: Running Gemma-4 26B at 124 tokens/SEC on a CPU, no GPU (apeg.dev)
10 points by arun-prasath 6 hours ago | 1 comment
pmb_developer 3 hours ago
The output head byte budget is surprising. Did you try any tradeoff where the head is compressed more aggressively but experts stay mostly untouched?