Hacker News
Show HN: I reduced LLM inference GPU calls by 94% using semantic routing
(icomnewtechnologies.com)
2 points by kanacki 8 hours ago | 1 comment
slach 4 hours ago
Better to publish it on GitHub.