1 pointby arjmandi2 hours ago1 comment
  • arjmandi2 hours ago
    LLM inference performance is governed by three competing bottlenecks: compute time, memory bandwidth, and communication latency. In this post, we've covered what allows full hardware utilization and key constraints.