4 points by GreenGames 5 hours ago | 1 comment
  • emanuele-em 5 hours ago
    Really cool to see someone actually prove that the NVIDIA vs Apple efficiency gap is mostly a software problem. A 2020 GPU matching M5 Max tok/J at 1.8x the throughput just by fusing all 24 layers into one persistent kernel is a strong result. The DVFS sweep losing only 5% between 420W and 220W is surprising. Have you looked at what this would take on Hopper with TMA?
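For anyone wanting to sanity-check the DVFS numbers, here is a rough back-of-the-envelope sketch of what that sweep implies for tok/J. It assumes the 5% loss refers to throughput and that 420 W and 220 W are the actual average draw rather than just the configured caps; neither is confirmed in the comment above. The baseline throughput is a made-up placeholder, since only the ratio matters.

```python
def tokens_per_joule(tokens_per_second: float, watts: float) -> float:
    """Energy efficiency: (tok/s) / (J/s) = tok/J."""
    return tokens_per_second / watts


# Hypothetical baseline throughput (placeholder); it cancels out in the ratio.
base_tps = 100.0

# Assumption: 420 W and 220 W are the real average power draw at each setting.
eff_420 = tokens_per_joule(base_tps, 420.0)

# Assumption: the "5%" lost between the two settings is throughput.
eff_220 = tokens_per_joule(base_tps * 0.95, 220.0)

gain = eff_220 / eff_420
print(f"tok/J gain from running at 220 W instead of 420 W: {gain:.2f}x")
```

Under those assumptions the lower setting comes out roughly 1.8x more efficient per token, because power drops by almost half while throughput barely moves.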