Hacker News
Timeline of Diffusion Language Models (github.com)
1 point by tilt 12 days ago | 1 comment
storystarling 11 days ago
I'm curious what the actual inference unit economics look like compared to standard autoregressive models. Parallel decoding helps with latency, but does the total compute cost per token make it viable for production workloads yet?
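One rough way to frame the question above, as a back-of-envelope sketch rather than a measurement, is to count token-level forward evaluations. The sketch assumes an autoregressive decoder with a KV cache evaluates one new position per step, while a diffusion model without any caching re-evaluates every position at each denoising step. The function names and the example numbers (256 tokens, 32 steps) are hypothetical, and the model ignores prompt prefill, attention cost growth with context length, batching, and hardware utilization.

```python
def ar_token_evals(n_tokens: int) -> int:
    """Assumed cost model: one position evaluated per generated token (KV cache)."""
    return n_tokens


def diffusion_token_evals(n_tokens: int, n_steps: int) -> int:
    """Assumed cost model: every position re-evaluated at each denoising step."""
    return n_tokens * n_steps


def compare(n_tokens: int, n_steps: int) -> dict:
    """Compare sequential depth (a latency proxy) and total evaluations (a compute proxy)."""
    ar = ar_token_evals(n_tokens)
    diff = diffusion_token_evals(n_tokens, n_steps)
    return {
        "sequential_steps_ar": n_tokens,        # one forward pass per token
        "sequential_steps_diffusion": n_steps,  # one forward pass per denoising step
        "compute_ratio": diff / ar,             # diffusion evals per AR eval
    }


# Illustrative inputs only, not measured values.
result = compare(n_tokens=256, n_steps=32)
print(result)
```

Under these assumptions the diffusion model needs fewer sequential passes whenever the step count is below the output length, but its total compute per generated token is higher by a factor equal to the step count. Real systems may narrow that gap with caching or fewer steps, which this sketch does not model.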