Hacker News
DeepSeek's mHC: Stabilizing Training Divergence from 3,000x to 1.6x
2 points by Research_Brief 15 days ago