> DeepSeek kicks off 2026 with paper signalling push to train bigger models for less
> DeepSeek has published a technical paper co-authored by founder Liang Wenfeng proposing a rethink of its core deep learning architecture
Both the title and the first paragraph are completely and unambiguously wrong.
First, while the method improves stability (preventing training collapse), it actually increases the computational cost per step rather than reducing it. The benefit is reliability, not raw cost reduction (page 4: "mHC supports training at scale and introduces only a 6.7% additional time overhead").
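To make that trade-off concrete, here is a rough back-of-envelope sketch. Only the 6.7% per-step overhead comes from the paper; the run size and the cost of a collapse below are made-up assumptions purely for illustration.

```python
# Hypothetical illustration of the stability-vs-speed trade-off.
# Only the 6.7% per-step overhead is from the paper; the baseline run cost
# and the 20% rollback penalty are invented numbers.
baseline_gpu_hours = 100_000                    # assumed cost of one full run
mhc_gpu_hours = baseline_gpu_hours * 1.067      # 6.7% slower per step

# Assume one loss spike / collapse forces rolling back and redoing 20% of the run.
collapsed_run_gpu_hours = baseline_gpu_hours * 1.20

print(f"mHC run:                 {mhc_gpu_hours:,.0f} GPU-hours")
print(f"baseline + one rollback: {collapsed_run_gpu_hours:,.0f} GPU-hours")
# mHC is strictly more expensive per step; it only "wins" if it avoids enough
# collapse/rollback work, which is a reliability argument, not a cost one.
```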
Secondly, the proposed mHC is an extension of HC (Hyper-Connections), and while cool, it's nowhere near a "rethink of its core architecture". If it holds up beyond the relatively small models they tried (27B), the method fixes some instability issues, but the "core" architecture stays the same.
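For readers who haven't seen the Hyper-Connections line of work, the sketch below contrasts a standard pre-norm residual block with a simplified, static HC-style wrapper. The stream count, weight shapes, and initialisation here are my own illustrative assumptions, and none of mHC's specific constraints are reproduced; the point is only that HC generalises the residual stream while leaving the attention/FFN layers themselves untouched.

```python
# Rough sketch only: a plain residual block vs. a simplified static
# hyper-connection (HC) style block. Shapes and initialisation are
# illustrative assumptions, not the paper's exact formulation.
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Standard pre-norm residual wrapper: h <- h + f(norm(h))."""

    def __init__(self, dim: int, f: nn.Module):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.f = f

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h + self.f(self.norm(h))


class HyperConnectionBlock(nn.Module):
    """Simplified static hyper-connection wrapper.

    Keeps n parallel copies ("streams") of the hidden state. The wrapped
    layer reads a learned mix of the streams and writes back into each
    stream with learned weights; the streams themselves are also mixed.
    With n = 1 and identity weights this reduces to a residual connection.
    """

    def __init__(self, dim: int, f: nn.Module, n_streams: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.f = f
        # Width connections: how much each stream contributes to the layer input.
        self.read = nn.Parameter(torch.full((n_streams,), 1.0 / n_streams))
        # Depth connections: how strongly the layer output is written to each stream.
        self.write = nn.Parameter(torch.ones(n_streams))
        # Stream-to-stream mixing matrix, initialised to identity (plain residual).
        self.mix = nn.Parameter(torch.eye(n_streams))

    def forward(self, streams: torch.Tensor) -> torch.Tensor:
        # streams: (n_streams, batch, seq, dim)
        layer_in = torch.einsum("n,nbsd->bsd", self.read, streams)
        y = self.f(self.norm(layer_in))
        mixed = torch.einsum("mn,nbsd->mbsd", self.mix, streams)
        return mixed + self.write.view(-1, 1, 1, 1) * y


if __name__ == "__main__":
    dim = 64
    ff = nn.Sequential(nn.Linear(dim, 256), nn.GELU(), nn.Linear(256, dim))
    block = HyperConnectionBlock(dim, ff, n_streams=4)
    h = torch.randn(4, 2, 16, dim)  # 4 streams expanded from one hidden state
    print(block(h).shape)  # torch.Size([4, 2, 16, 64])
```

With one stream and identity mixing this collapses back to an ordinary residual connection, which is the sense in which the "core" architecture stays the same; mHC, as described in the paper, adds constraints on top of this kind of connection scheme to keep large-scale training stable.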