1 point by djhemath 2 hours ago | 1 comment
  • djhemath 2 hours ago
    This paper by the Kimi team describes a way to add more depth to a model without losing information/context. The efficiency gain is only just over 1%, but at large enough training and serving budgets the total savings could reach millions. Or at the very least, it would let us build models with more layers for the same cost as today.
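    To make the "1% could reach millions" point concrete, here is a rough back-of-the-envelope sketch. The budget figures and the `savings` helper are hypothetical, picked only for illustration; they are not from the paper. The gain is rounded down to a flat 1%.

```python
def savings(total_cost: float, efficiency_gain: float) -> float:
    """Absolute cost saved if the same work needs `efficiency_gain` less compute."""
    return total_cost * efficiency_gain


# Hypothetical compute budgets in dollars (assumptions, not figures from the paper).
for budget in (10_000_000, 100_000_000, 1_000_000_000):
    print(f"${budget:,} budget -> ${savings(budget, 0.01):,.0f} saved at a 1% gain")
```

    Under these assumptions, the savings only get into the millions once the total spend is in the hundreds of millions, which is why the gain matters mostly at the largest scale.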