Hacker News
DeepSeek's mHC: Stabilizing Training Divergence from 3,000x to 1.6x
2 points by Research_Brief 8 hours ago