DeepSeek kicks off 26 with paper signalling push to train bigger models for less(www.scmp.com)

18 pointsby kseca month ago2 comments

edflsafoiewqa month ago
Link: https://arxiv.org/abs/2512.24880
NitpickLawyera month ago
Jesus. Why do people that clearly don't understand this field insist on writing on the subject?
> DeepSeek kicks off 2026 with paper signalling push to train bigger models for less
> DeepSeek has published a technical paper co-authored by founder Liang Wenfeng proposing a rethink of its core deep learning architecture
Both the title and the first paragraph are completely and unambiguously wrong.
While the method improves stability (preventing training collapse), it technically increases the computational cost per step rather than reducing it. The benefit is reliability, not raw cost reduction. (page 4 > "mHC supports training at scale and introduces only a 6.7% additional time overhead")
Secondly the proposed mHC is an extention of HC, and while cool, it's nowhere near a "rethink of its core architecture". If proven beyond the small models they tried (27B models), this method fixes some instability issues, but the "core" architecture stays the same.
- ranyumea month ago
  Personally I'm interested in the prospect of enabling ways to change the learning process of a model based on topological structures.