The 20% is a safety margin on the memory fit check only. It sits on top of the raw weights-only figure (params × bytes-per-precision) to account for KV cache and activation tensors, not framework differences specifically.
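A minimal sketch of that check, with hypothetical names (`fits_in_memory` and its parameters are illustrative, not the actual implementation):

```python
def fits_in_memory(param_count: float, bytes_per_param: float,
                   available_bytes: float, margin: float = 0.20) -> bool:
    """Weights-only footprint plus a safety margin for KV cache / activations."""
    weights = param_count * bytes_per_param      # raw weights-only figure
    required = weights * (1 + margin)            # 20% headroom on top
    return required <= available_bytes

# e.g. a 7B model in fp16 (2 bytes/param): weights alone are 14 GB,
# but with the margin the check demands 16.8 GB, so a 16 GB budget is rejected
# even though the bare weights would technically fit.
fits_in_memory(7e9, 2, 24e9)  # True
fits_in_memory(7e9, 2, 16e9)  # False
```

That second call is exactly the conservative case described below: the weights would load, but the check declines rather than risk an OOM once the KV cache grows.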
Your point is valid, but I think it applies to a different layer. PyTorch vs ONNX overhead is real, but it's implicitly captured in the throughput path: Tier 2 scales from real-world benchmarks that already reflect whatever framework ran them. The 20% is intentionally conservative. It'll occasionally say a model won't fit when it technically could, but it won't tell you something fits and then OOM you.