Hacker News
High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction
(jchandra.com)
3 points by jchandra 3 hours ago | 2 comments
vivahir215
2 hours ago
Interesting approach. Curious about the latency tradeoff: OLS + SVD are much heavier than Top-K. Have you benchmarked end-to-end inference latency?
jchandra
2 hours ago
[dead]
jchandra
2 hours ago
[dead]