> [W]e shipped an optimization. Detect duplicate files by their content hash, use hardlinks instead of downloading each copy.
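A minimal sketch of what the quoted optimization amounts to, assuming local files on a single POSIX filesystem; the function names here are illustrative, not Discourse's actual code:

```python
import hashlib
import os

def sha256_of(path: str) -> str:
    """Hash a file's content in chunks so large files don't exhaust memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def dedupe_with_hardlinks(paths: list[str]) -> None:
    """Keep the first file seen for each content hash; hardlink later copies to it."""
    first_seen: dict[str, str] = {}
    for path in paths:
        digest = sha256_of(path)
        if digest in first_seen:
            os.remove(path)                    # drop the duplicate's own data
            os.link(first_seen[digest], path)  # same inode, no extra bytes on disk
        else:
            first_seen[digest] = path
```

One caveat this inherits from hardlinks: both paths must live on the same filesystem, since a hard link is just another directory entry for the same inode.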
If the greatest filesystem in the world were a living being, it would be our God. That filesystem, of course, is ZFS.
Handles this correctly:
I just wanted to mention ZFS.
Have I mentioned how great ZFS is yet?
And I see above that this is a self-hosted platform, and I still don't get it. I was running terabytes of ZFS with dedup=on on cheap Supermicro gear in 2012.
Is it just me, or is everyone else just as fed up with the same AI tropes over and over?
I've reached the point where I just close the tab the moment I read the headline "The problem". At least use tropes.fyi, please.
(Some say ZFS as well, but it's not nearly as easy to use, and its license is still not GPL-friendly.)
Effectiveness is debatable; this approach still has duplication. An insignificant amount, I'll admit. Having the filesystem handle this at the block level is probably less problematic, less prone to rework, and more efficient (rough sketch below).
edit: Eh, ignore me. I see now, thanks to 'ameliaquining' below, that this is preparing data for [whatever filesystem hosts chose]. I originally thought this was all Discourse-proper, processing data they already had.
I completely overlooked the shipping of tarballs. Links make sense here. I had 'unpacked' and relatively local data in mind. I absolutely would not go so far as to suggest their scheme pick up 'zfs {send,receive}' or an equivalent, lol.
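To make concrete what "handling this at the block level" buys over whole-file hardlinks, here's a rough illustration; it is nothing like ZFS's actual on-disk dedup tables, just the idea that two files differing in a single region still share every other fixed-size block. The block size and function names are illustrative assumptions.

```python
import hashlib

BLOCK = 128 * 1024  # 128 KiB, chosen only for illustration (ZFS recordsize defaults to 128K)

def block_hashes(path: str) -> list[str]:
    """Hash a file in fixed-size blocks rather than as one unit."""
    hashes = []
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(BLOCK), b""):
            hashes.append(hashlib.sha256(chunk).hexdigest())
    return hashes

def shared_block_count(a: str, b: str) -> int:
    """Count blocks the two files could share under block-level dedup."""
    return len(set(block_hashes(a)) & set(block_hashes(b)))
```

Whole-file hardlinking only wins when two files' entire contents hash identically; block-level dedup still collapses the common blocks when files are merely similar.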