I built a benchmark to measure Readiness—the actual time until a container can serve an HTTP request, rather than just pull time. The results were surprising. While lazy-pulling (eStargz/FUSE) made pulls 65x faster, it made the application's first successful response 20x slower compared to a local registry full-pull.
Why? Because lazy-pulling doesn't remove the cost of downloading bytes; it just shifts it to the runtime. Your registry becomes a runtime dependency, and every uncached import torch becomes a network round-trip. In my latest post, I dive deep into:
- The OCI file format limits (DEFLATE chains) that make this hard.
- Why containerd’s snapshotter is the bottleneck.
- The operational risks of FUSE on your GPU nodes.