14 points by jonbaer 5 hours ago | 2 comments
  • PaulRobinson 3 hours ago
    They made this claim in a peer-reviewed paper submitted to Nature, but it’s not clear how the reviewers could actually have evaluated whether it’s true.

    If it’s true, and the consensus is that we’re hitting the limits of how far these models can be improved, the hypothesis that the entire market is in a bubble over-indexed on GPU costs [0] starts to look more credible.

    At the very least, OpenAI and Anthropic look ridiculously inefficient. Mind you, given the numbers on the Oracle deal don’t add up, this is all starting to sound insane already.

    [0] https://www.wheresyoured.at/the-haters-gui/

  • onion2k 4 hours ago
    Maybe, if you don't include the >$10m investment in H800 hardware. Still a lot cheaper than competitors though.
    • 48terry 2 hours ago
      Yes, if we include a cost they didn't include, the cost would be different.
    • jml7c5 2 hours ago
      No, their calculation is based on a rental price of $2 per GPU-hour.
      • yorwba 2 hours ago
        Right, but they didn't use rented GPUs, so it's a purely notional figure. It's an appropriate value for comparison to other single training runs (e.g. it tells you that turning DeepSeek-V3 into DeepSeek-R1 cost much less than training DeepSeek-V3 from scratch) but not for the entire budget of a company training LLMs.

        DeepSeek spent a large amount upfront to build a cluster they can run lots of small experiments on over the course of several years. If you only focus on the successful runs, their costs look much lower than the end-to-end spend (rough sketch of the per-run arithmetic below).
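
        Back-of-the-envelope, the notional-rental calculation looks like this. The 2.788M H800 GPU-hour figure is the one in the DeepSeek-V3 technical report; the R1 hour count here is just back-derived from the ~$294K headline at the assumed $2/GPU-hour rate, so treat it as illustrative:

          # Per-run "rental" cost: GPU-hours times an assumed hourly rate.
          # Nothing here includes the upfront cluster purchase or failed experiments.

          RENTAL_RATE_USD_PER_GPU_HOUR = 2.00  # assumed H800 rental price behind the headline figures

          def notional_training_cost(gpu_hours: float, rate: float = RENTAL_RATE_USD_PER_GPU_HOUR) -> float:
              """Cost of a single training run priced as if the GPUs were rented by the hour."""
              return gpu_hours * rate

          # DeepSeek-V3 pre-training: ~2.788M H800 GPU-hours per the V3 technical report
          v3_cost = notional_training_cost(2_788_000)   # ~ $5.58M

          # R1 stage on top of V3: hour count back-derived from the ~$294K headline, illustrative only
          r1_cost = notional_training_cost(147_000)     # = $294K

          print(f"V3 run:  ${v3_cost:,.0f}")
          print(f"R1 step: ${r1_cost:,.0f}")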