37 points by paladin314159 | 12 hours ago | 4 comments
  • clawsyndicate 5 minutes ago
    Sparse files are efficient, but they break NFS quota accounting. We run ~10k pods and found that usage reporting drifts and that rehydration latency causes weird timeouts. Strict ext4 project quotas ended up being more reliable for us.
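
    A minimal sketch of why that accounting gets confusing (filename and size invented here): a sparse file's apparent size and its actually allocated blocks diverge, and different tools disagree about which number to charge against a quota.

```c
/* Sketch: apparent size vs. allocated size of a sparse file.
 * Tools that read st_size see 512 MiB; tools that read st_blocks
 * see (almost) nothing until real data is written. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

int main(void) {
    int fd = open("sparse.dat", O_CREAT | O_RDWR | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* Extend to 512 MiB without allocating any blocks. */
    if (ftruncate(fd, 512L * 1024 * 1024) != 0) { perror("ftruncate"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }

    printf("apparent size: %lld bytes\n", (long long)st.st_size);
    printf("allocated:     %lld bytes\n", (long long)st.st_blocks * 512);

    close(fd);
    return 0;
}
```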
  • uroni 2 hours ago
    I’ve used this technique in the past, and the problem is that the way some file systems perform the file‑offset‑to‑disk‑location mapping is not scalable. It might always be fine with 512 MB files, but I worked with large files and millions of extents, and it ran into issues, including out‑of‑memory errors on Linux with XFS.

    The XFS issue has since been fixed (though you often have no control over which Linux version your program runs on), but in general I’d say it’s better to do such mapping in user space. In this case, there is a RocksDB present anyway, so this would come at no performance cost.
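
    For what "do the mapping in user space" could look like, here is a rough sketch: a flat in-memory index from logical chunks to slots in a densely packed backing file, instead of leaning on the filesystem's extent tree. The chunk size, the array, and the names are invented for illustration; per the comment, a real implementation would presumably keep this index in the RocksDB that is already present.

```c
/* Sketch: user-space offset -> location mapping for a 512 MiB
 * logical address space, backed by a compact data file with no holes. */
#include <stdint.h>
#include <stdio.h>

#define CHUNK_SIZE 4096
#define MAX_CHUNKS (512u * 1024 * 1024 / CHUNK_SIZE)  /* 512 MiB logical space */
#define UNMAPPED   UINT32_MAX

static uint32_t chunk_map[MAX_CHUNKS];  /* logical chunk -> slot in data file */
static uint32_t next_slot = 0;          /* data file grows densely */

static void map_init(void) {
    for (uint32_t i = 0; i < MAX_CHUNKS; i++) chunk_map[i] = UNMAPPED;
}

/* Translate a logical offset to an offset in the compact data file,
 * allocating a new slot on first write. Returns -1 for reads of
 * never-written chunks, which the caller treats as all zeros. */
static int64_t translate(uint64_t logical_off, int allocate) {
    uint32_t chunk = (uint32_t)(logical_off / CHUNK_SIZE);
    if (chunk >= MAX_CHUNKS) return -1;
    if (chunk_map[chunk] == UNMAPPED) {
        if (!allocate) return -1;
        chunk_map[chunk] = next_slot++;
    }
    return (int64_t)chunk_map[chunk] * CHUNK_SIZE + (logical_off % CHUNK_SIZE);
}

int main(void) {
    map_init();
    /* A write at logical offset 300 MiB lands in the first physical slot. */
    printf("physical offset: %lld\n", (long long)translate(300ull << 20, 1));
    /* A read at a never-written offset has no mapping: treat as zeros. */
    printf("unmapped read:   %lld\n", (long long)translate(100ull << 20, 0));
    return 0;
}
```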

  • hahahahhaah an hour ago
    I am guessing the choice here is whether you want the kernel to handle this, and whether that is more performant than just managing a bunch of regular, initially empty files with a home-grown file allocation table.

    Or even just a bunch of little files representing segments of larger files.
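
    A toy sketch of that segment-file idea (segment size and naming scheme made up here): route each logical offset to a small per-segment file, so segments that were never written simply don't exist on disk, which gives sparse-file-like space usage with ordinary files.

```c
/* Sketch: split one large logical file into fixed-size segment files
 * and map each offset to (segment file, offset within segment). */
#include <stdint.h>
#include <stdio.h>

#define SEGMENT_SIZE (64u * 1024 * 1024)  /* 64 MiB per segment file */

static void locate(uint64_t logical_off, char *path, size_t pathlen,
                   uint64_t *seg_off) {
    uint64_t segment = logical_off / SEGMENT_SIZE;
    *seg_off = logical_off % SEGMENT_SIZE;
    /* Hypothetical naming scheme for the per-segment files. */
    snprintf(path, pathlen, "blob.%06llu.seg", (unsigned long long)segment);
}

int main(void) {
    char path[64];
    uint64_t off;
    locate(300ull << 20, path, sizeof(path), &off);  /* logical offset 300 MiB */
    printf("%s @ %llu\n", path, (unsigned long long)off);
    return 0;
}
```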

  • avmich 7 hours ago
    We can talk about an even more general idea for saving file space: compression. Ever heard of it being used across whole filesystems?
    • praseodym 2 hours ago
      Microsoft MS-DOS and Windows supported this in the 90s with DriveSpace, and modern file systems like btrfs and zfs also support transparent compression.
    • fh973 3 hours ago
      Most file formats worth compressing are already compressed internally, and with filesystem compression you lose efficient non-sequential IO.
    • eeeficus 5 hours ago
      You introduce overhead on both read and write without it being a better solution to the OP's problem.