1 point by mellosouls 7 hours ago | 1 comment
  • bitkin_dev 7 hours ago
    Great breakdown, thanks for writing this up.

    One thing I’m still unclear on: in real production workloads, which of memory bandwidth, KV cache management, or scheduler overhead became the bottleneck first?

    Curious how much of this showed up only under sustained load rather than in benchmarks.