Apple isn't hashing complete files; they're doing block-level deduplication on encrypted data. They likely split files into chunks (probably 4 MB or 16 MB blocks, similar to Dropbox) and hash each block independently. When I changed 1 byte in the middle of the file, only the block containing that byte needed to be uploaded. The other 95+ blocks were already on Apple's servers and were deduplicated.
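To make the "only one block re-uploaded" observation concrete, here's a minimal sketch of fixed-size chunking with per-chunk SHA-256 hashes. The 4 MB chunk size, the use of SHA-256, and the helper names (`chunk_hashes`, `changed_chunks`) are my assumptions for illustration, not Apple's actual scheme, which may well use different sizes or content-defined chunking.

```python
import hashlib
import os

# Assumed chunk size (the guess above is 4 MB or 16 MB; Apple's real value is unknown).
CHUNK_SIZE = 4 * 1024 * 1024


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size chunks and return the SHA-256 hex digest of each."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def changed_chunks(old: bytes, new: bytes, chunk_size: int = CHUNK_SIZE) -> list[int]:
    """Indices of chunks whose hash differs, i.e. the chunks a client would re-upload."""
    old_h = chunk_hashes(old, chunk_size)
    new_h = chunk_hashes(new, chunk_size)
    return [
        i for i in range(max(len(old_h), len(new_h)))
        if i >= len(old_h) or i >= len(new_h) or old_h[i] != new_h[i]
    ]


if __name__ == "__main__":
    size = 64 * 1024                      # small chunks so the demo runs instantly
    original = os.urandom(size * 100)     # a 100-chunk "file"
    modified = bytearray(original)
    modified[len(modified) // 2] ^= 0xFF  # flip one byte in the middle
    print(changed_chunks(original, bytes(modified), size))  # prints [50]
```

With fixed-size chunks, a one-byte edit only changes the hash of the chunk it lands in, which matches what I saw. An insertion that shifts later bytes would change every following chunk, which is why some services use content-defined chunking instead.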
This means Apple's servers maintain an index of which specific encrypted blocks each user has, even though they can't decrypt the content. Even with end-to-end encryption, the server knows the "fingerprint" of every 4-16 MB chunk of your data. Research has shown that block-level deduplication enables "deduplication attacks": you can determine whether a file is already stored without breaking the encryption. Upload a known file and see whether it deduplicates; if it does, someone on the service (possibly the user you're targeting) already has that file. When deduplication works across accounts, this can work even with E2EE, because block patterns are observable server-side.
Well-known files (popular software, movies, documents) have predictable block signatures. Even when encrypted, these patterns could be identified if the encryption is deterministic. "Does user X have file Y?" becomes answerable through deduplication probing without decrypting anything.
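Here's a toy model of that probing attack, assuming a server that deduplicates chunks across all accounts using deterministic fingerprints. `ToyDedupServer` and `probe` are hypothetical names; the model skips encryption entirely, which matches the real situation only if encrypted chunks are a deterministic function of their content.

```python
import hashlib


def fingerprints(data: bytes, chunk_size: int) -> list[str]:
    """Deterministic per-chunk fingerprints (SHA-256 of each fixed-size chunk)."""
    return [hashlib.sha256(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]


class ToyDedupServer:
    """Hypothetical storage server that deduplicates chunks across ALL accounts."""

    def __init__(self, chunk_size: int = 8):
        self.chunk_size = chunk_size
        self.index: set[str] = set()  # one global fingerprint index shared by every user

    def upload(self, data: bytes) -> int:
        """Store data and return how many chunks actually had to be transferred.

        A real client leaks this number through bandwidth use or upload time.
        """
        transferred = 0
        for fp in fingerprints(data, self.chunk_size):
            if fp not in self.index:
                self.index.add(fp)
                transferred += 1
        return transferred


def probe(server: ToyDedupServer, candidate: bytes) -> bool:
    """Attacker's check: if no chunk had to be sent, someone already stored the file."""
    return server.upload(candidate) == 0


if __name__ == "__main__":
    server = ToyDedupServer(chunk_size=8)
    server.upload(b"victim's secret tax return 2023")               # the victim's upload
    print(probe(server, b"victim's secret tax return 2023"))         # True
    print(probe(server, b"some file nobody has uploaded yet"))       # False
```

Note that each probe stores the candidate, so the same guess can only be tested once; published attacks get around this by trying many variants of a file (e.g. different values of a guessable field).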
I'm not claiming Apple is actively exploiting this or that the encryption is broken. The crypto is probably solid. But users aren't told that block-level metadata is retained, or that this metadata can leak information about content despite E2EE. "Permanent deletion" doesn't appear to remove these block fingerprints.
I still plan to complete the 30-day retention test to see whether Apple ever purges deleted blocks, but the block-level deduplication finding suggests they keep this metadata indefinitely for system efficiency. For truly private storage, encryption alone isn't enough: you need encryption that prevents deduplication metadata from forming in the first place.
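One way to see the difference: compare a content-only fingerprint (what you get with convergent encryption, where the key is derived from the chunk itself) with a fingerprint keyed by a per-user secret. This is a sketch under my own assumptions; no actual encryption happens here, `sha256(chunk)` is a stand-in for hashing a deterministic ciphertext, and the function names are hypothetical.

```python
import hashlib
import hmac
import os


def convergent_tag(chunk: bytes) -> str:
    """Content-only tag. Under convergent encryption the ciphertext (and thus its
    hash) depends only on the plaintext, so this stands in for that fingerprint:
    identical chunks match across every account."""
    return hashlib.sha256(chunk).hexdigest()


def keyed_tag(user_key: bytes, chunk: bytes) -> str:
    """Per-user keyed tag (HMAC-SHA256): identical chunks match only within
    the same account, so cross-user probing no longer works."""
    return hmac.new(user_key, chunk, hashlib.sha256).hexdigest()


if __name__ == "__main__":
    chunk = b"a chunk of a popular movie"
    alice_key, bob_key = os.urandom(32), os.urandom(32)
    print(convergent_tag(chunk) == convergent_tag(chunk))             # True
    print(keyed_tag(alice_key, chunk) == keyed_tag(bob_key, chunk))   # False, barring a negligible collision
```

The trade-off is storage: keyed tags give up cross-user savings, and the server can still see repeated chunks within one account. Encrypting every chunk with a random key or nonce hides even that, at the cost of no deduplication at all.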
If it’s still there after a month, I’d be surprised, and I’d check the terms of service to see what they actually commit to.
Remember that Apple’s typical customer is non-technical. Keeping files in case of a catastrophic deletion is safer for their customers.
They want to give a good experience to the person who calls them up and says, “I deleted all my family photos 31 days ago!”
I also don’t think you can make that assumption. I’ve worked at many companies that had recovery tools we didn’t advertise to customers, especially since they weren’t guaranteed to work and involved manual recovery effort. We didn’t want to give customers the idea that they could be careless, delete their data, and rely on us to do a low-level database restore.