1 point by baruch 5 hours ago | 1 comment
  • baruch 5 hours ago
    It is possible to get more tokens out of the same hardware by using fast storage for the KV cache. This is especially useful for agentic workloads.