13 points by ksec 8 hours ago | 2 comments
  • NitpickLawyer 7 hours ago
    Jesus. Why do people who clearly don't understand this field insist on writing about it?

    > DeepSeek kicks off 2026 with paper signalling push to train bigger models for less

    > DeepSeek has published a technical paper co-authored by founder Liang Wenfeng proposing a rethink of its core deep learning architecture

    Both the title and the first paragraph are completely and unambiguously wrong.

    While the method improves stability (preventing training collapse), it actually increases the computational cost per step slightly rather than reducing it. The benefit is reliability, not raw cost reduction (page 4: "mHC supports training at scale and introduces only a 6.7% additional time overhead").

    Secondly, the proposed mHC is an extension of HC (rough sketch below), and while cool, it's nowhere near a "rethink of its core architecture". If it holds up beyond the relatively small models they tried (up to 27B), it fixes some instability issues, but the "core" architecture stays the same.
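
    For context, here's my rough reading of the HC idea, assuming HC refers to hyper-connections (the learnable multi-stream residual scheme): instead of a single residual stream, the model keeps n parallel streams and learns how each layer reads from them and writes back into them. This is a toy PyTorch approximation with made-up names, not DeepSeek's mHC code, and it glosses over whatever mHC actually changes on top of HC.

        # Toy sketch of hyper-connections (my reading of the HC idea, not mHC itself).
        # Instead of one residual stream x, keep n parallel streams; each layer
        # reads a learned mix of them and writes its output back into all of them.
        import torch
        import torch.nn as nn

        class HyperConnection(nn.Module):
            def __init__(self, n_streams: int = 4):
                super().__init__()
                # depth weights: how much each stream contributes to the layer input
                self.alpha = nn.Parameter(torch.full((n_streams,), 1.0 / n_streams))
                # width weights: how the layer output is written back to each stream
                self.beta = nn.Parameter(torch.ones(n_streams))
                # learned mixing among the streams themselves (identity = plain copies)
                self.mix = nn.Parameter(torch.eye(n_streams))

            def forward(self, streams, layer):
                # streams: (n_streams, batch, seq, dim)
                layer_in = torch.einsum('n,nbsd->bsd', self.alpha, streams)
                layer_out = layer(layer_in)            # ordinary block (attention / MLP)
                mixed = torch.einsum('mn,nbsd->mbsd', self.mix, streams)
                return mixed + self.beta.view(-1, 1, 1, 1) * layer_out

        # usage: expand x into n identical streams, wrap every block, sum at the end
        dim, block = 64, nn.Linear(64, 64)
        x = torch.randn(2, 16, dim)                    # (batch, seq, dim)
        streams = x.unsqueeze(0).repeat(4, 1, 1, 1)
        streams = HyperConnection(4)(streams, block)
        out = streams.sum(dim=0)

    Whatever constraints mHC adds to stabilise those mixing weights, the transformer blocks themselves are untouched, which is why calling it a rethink of the core architecture is a stretch.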

  • ranyume 6 hours ago
    Personally, I'm interested in the prospect of changing a model's learning process based on topological structure.