Information has been permanently deleted, for small values of permanently (devblogs.microsoft.com)

69 points by Tomte 2 days ago | 14 comments

JohnFen 2 days ago
Since it's impossible to verify, and there's a whole ton of deception by companies, I simply don't trust that any data is ever deleted just because I've been told it is.
Instead, I try to minimize the amount of data that others have.
- nemothekid 2 days ago
  One of the headaches of system design in this area is how you deal with backups. Let's say you do regular backups to S3, Glacier, tape, stone tablets.
  When you tell your customer "we have deleted all your data", are you loading all your backups and scrubbing their data from there as well? Probably not, as it would probably be too expensive, and depending on your size, your backups may cease to be backups as users request data deletion daily.
  Ok, then you might say that before any backup is restored, you check its data against a master "deletion" list. Still, though, you are dependent on the company promising to reference the deletion list. When someone "really really" needs data from a week ago and gets a one-off backup, it has deleted customer data in it.
  Next, my idea was that you encrypt all user data with a user-specific key, and when the customer requests deletion, you just delete the key. Perfect. Up until it's time to back up the encryption keys database, and you are back at square one. I understand this solution is probably 95% of the way there, but I'd like to hear if anyone knows any designs which, when implemented, are fairly foolproof and don't require your customers to be "smart" (e.g. have them hold the encryption key).
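A minimal sketch of the per-user key idea described above (often called crypto-shredding). The `UserVault` class and its method names are hypothetical. The SHAKE-256 XOR keystream is only a standard-library stand-in for a real authenticated cipher, so treat it as an illustration and not something to deploy. The sketch also assumes the key store is kept out of long-lived backups, which is exactly the weak point the comment raises.

```python
import hashlib
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream derived from SHAKE-256; a stand-in for a real AEAD cipher.
    return hashlib.shake_256(key + nonce).digest(length)


def _xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


class UserVault:
    """Hypothetical store: ciphertext may be backed up, keys must not be."""

    def __init__(self):
        self.keys = {}     # user_id -> key; the only store that must be scrubbed
        self.records = {}  # user_id -> (nonce, ciphertext); may live in backups

    def store(self, user_id, plaintext: bytes) -> None:
        key = self.keys.setdefault(user_id, secrets.token_bytes(32))
        nonce = secrets.token_bytes(16)
        stream = _keystream(key, nonce, len(plaintext))
        self.records[user_id] = (nonce, _xor(plaintext, stream))

    def read(self, user_id) -> bytes:
        key = self.keys[user_id]  # raises KeyError once the key is shredded
        nonce, ciphertext = self.records[user_id]
        return _xor(ciphertext, _keystream(key, nonce, len(ciphertext)))

    def shred(self, user_id) -> None:
        # "Deleting" the user means deleting only the key.
        self.keys.pop(user_id, None)


vault = UserVault()
vault.store("alice", b"home address")
vault.shred("alice")
# The ciphertext is still in vault.records (and in any backup of it),
# but vault.read("alice") now raises KeyError.
```

Any backup that includes `keys` undoes the deletion, so the design only moves the problem to a much smaller store that has to be handled differently.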
  - jimkoen 2 days ago
    This is why the GDPR is so nice. A company needs a strategy in place to make sure your data is actually deleted and that strategy needs to be verified to work. Purging backups of records to be deleted upon reimport is fine, but you better make sure that process works, else the person whose data you accidentally didn't delete has a case against you in court.
    trinix912 2 days ago
    The thing is people generally aren't going to court when they suddenly notice a deleted photo still in their cloud docs. They're going to think it's a glitch or that they hadn't deleted it. Proving something like this in court is tricky - how would you prove to the judge you deleted something and that it randomly reappeared after some years?
  - bestouff 2 days ago
    It looks trivial to me. Encrypt your backups.
    nemothekid 2 days ago
    Not sure how encrypting backups solves the problem stated. If I hold the encryption keys, that means I still have access to your data.
- sidewndr46 2 days ago
  It's not really any different when a court orders a company to delete data that it wasn't supposed to have. The judge pretends he understands the plaintiff's arguments and knows what the data is and where it is stored. Later the judge tells the defendant to delete it. The defendant pretends to delete it and the court pretends to verify that something doesn't exist. Afterwards, everyone goes back to business as usual.
- outworlder 2 days ago
  If you can, edit the data first, don't delete.
  Not all systems keep all versions indefinitely.
- Veserv 2 days ago
  Yep. Until they make a legally binding claim, supported by a self-imposed liquidated damages clause for failure to abide by their own claims, their claims should be ignored.
- quantified 2 days ago
  Did you ever sign up for 23andMe?
  - JohnFen a day ago
    I did. It was against my better judgement, but I did it anyway as a favor for a family member. I regret making an exception for that, and won't make that mistake again.
    That family member has since apologized to me for asking me to do it, too.
lrvick 2 days ago
The only way I would believe proof of deletion is if my data was submitted, end to end encrypted, to a key only held in memory of a quorum of remotely attestable secure enclaves deterministically built from publicly available code that I can easily confirm has no means to export keys to the control of any individual.
This is not only possible, I designed and open sourced a lot of tooling to do it and a few companies are doing this today. Shameless plug: My company (https://distrust.co) provides consulting for orgs that want to be ahead of the pack to retrofit their existing infrastructure to support these types of assurances.
Now we just need to require verifiable deletion techniques like this in order to get a standardized privacy certification browsers can verify and alert users to along the lines of the TLS green lock.
I give it 20 years.
- 3s 2 days ago
  We built something similar using secure enclaves at Tinfoil for verifiably private AI! Unless there is proof of no data access / retention we cannot trust what happens to our data (see the recent OpenAI court-ordered retention)
  - lrvick a day ago
    I think the biggest missing bit in Tinfoil is the lack of full source bootstrapped deterministic builds. That is an absolute requirement to ensure no single member of the supply chain, such as a single Debian maintainer or a Tinfoil release engineer, can tamper with the image.
    Also there is the issue that the Debian and Ubuntu packages you rely on can change from one day to the next etc.
    I went down that road for over a year, building a whole package.json style hash locking system on top of apt, only to abandon it on realizing that no existing Linux distribution was up to the task from a trust and security perspective. Even a lot of the packages Debian claims are reproducible, like Rust, are actually just built from unverifiable binary blobs from the internet. It was a sad realization that the reproducibility of all existing distros has some huge asterisks.
    So my team and I at Distrust started StageX to be the first container native Linux distribution and the first that trusts no single human or system, now at the heart of enclaves at Mysten Labs, Turnkey, etc. Totally FOSS though donations or support contracts are always welcome.
    Took a look at your image generation setup and it could certainly be ported to stagex to have a completely verifiable, deterministic, and tamper evident supply chain.
    https://stagex.tools
    By all means reach out if you want help! Not many of us working on this sort of thing.
- pluto_modadic 2 days ago
  do you provide datacenter attestation primitives (e.g. Intel DCAP (only newer chips), ARM CCA-SSE (still being built), or AMD trust zone verifications or whatever they're called)?
  software attestable enclaves are one thing. hardware attestable ones are quite another.
  - lrvick a day ago
    We implemented Nitro attestation first, while I was at Turnkey for QuorumOS, as that is an AWS stack. However, IMO TPM2 is the way to go today for the most universal support, now that the big 3 cloud providers all offer endorsement key APIs. You could then support CPU-unique attestations where possible on top, provided you have an out-of-band source of truth for expected CPU certificates.
    One of our projects at Distrust is to handle all of the above in a universal library/spec we are working on called Bootproof, which will ship with EnclaveOS for broad hardware/software attestation support out of the box via a tiny Rust daemon and client.
clickety_clack 2 days ago
*For definitions of “your”, “information”, “permanently” and “deleted”, please refer to one of the dense, poorly worded contracts you implicitly agreed to when you thought about our site.
- aitchnyu a day ago
  All poorly worded laws have to be interpreted for maximum benefit to opposing party.
- reverendsteveii 2 days ago
  These definitions are subject to not just variance from their common meaning but also unilateral change without notice. Offer not valid in Alaska, Hawaii or Puerto Rico. Your mileage may vary. Do not taunt Happy Fun Ball.
- codeplea 2 days ago
  It's probably in the Privacy Policy they just emailed about.
- tempodox 2 days ago
  It's less “poorly worded” than finely tuned legalese that gives the company carte blanche.
codeplea 2 days ago
>I received a confirmation that said, “Your personal information and items associated with your account have now been deleted. This action is permanent and cannot be reversed.”
By the same logic, wasn't this first email self-contradicting? If your data is gone, how are they emailing you to tell you that your data is gone?
But really, aren't companies legally required to retain a lot of information anyway? Such as invoices needed for tax purposes?
- JohnMakin 2 days ago
  With data deletion requests, you sometimes do need a mechanism to keep track of who/what you deleted. This inevitably involves PII. What comes to mind is CCPA requests to delete data from private data brokers - there is an inherent problem that to avoid re-ingesting your data into their system, they need to know what that data is in the first place.
- andrewflnr 2 days ago
  > If your data is gone, how are they emailing you to tell you that your data is gone?
  It was still in-memory for the deletion request after they finished the deletion query. It probably stayed in memory after the request finished, too, until the page was re-used. The horror.
SoftTalker 2 days ago
It's nothing necessarily nefarious. His email address is still in a mailing list, probably at MailChimp or some other third party that they use for mass emailing. Doesn't mean they still have an "account" or "profile" of personal information for him. Of course, it doesn't mean they don't.
midtake 2 days ago
It could just be mailing list negligence. Mailing lists are usually decoupled from the main user db/IAM.
therobot24 2 days ago
until there's actual enforcement, there isn't the incentive to tell the truth...
It really is sad how much data has been captured and monetized of the average person. It seems like we're only continuing to turn up the heat as we continue to 'boil the frog'.
- arez 2 days ago
  I thought that's why we have GDPR and similar laws, so you can enforce it? If the company says it deleted your data but it didn't, it's definitely not complying with GDPR
  - ygjb 2 days ago
    GDPR requires data to be deleted where feasible. A common area where this falls apart is in backups made of systems implemented prior to GDPR rules, or systems which have not implemented a mechanism to allow user level deletion from backups.
    There is a somewhat accepted pattern here where backup processes are updated to retain a list of users who have requested deletion, and when a restore from backup is performed, before the restored system is brought back online, the data of users who have requested deletion is removed.
    As with many other compliance and governance controls, this is a known pattern, but is subject to review by auditors, and the overall pattern, or the specific implementation of the pattern may not survive a legal test via a complaint by a consumer or regulator.
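The restore-time pattern described above could look roughly like the sketch below. The function name `restore_from_backup`, the row shape, and the `user_id` field are assumptions made for illustration; a real restore would run against a database before it is brought back online, not an in-memory list.

```python
def restore_from_backup(backup_rows, deletion_list):
    """Return (rows safe to bring online, number of rows purged).

    backup_rows: iterable of dicts, each with a "user_id" key (assumed shape).
    deletion_list: user ids that requested deletion after the backup was taken.
    """
    deleted = set(deletion_list)
    kept = [row for row in backup_rows if row["user_id"] not in deleted]
    purged = sum(1 for row in backup_rows if row["user_id"] in deleted)
    return kept, purged


backup = [
    {"user_id": 1, "email": "a@example.com"},
    {"user_id": 2, "email": "b@example.com"},
    {"user_id": 1, "email": "a+alt@example.com"},
]
restored, purged = restore_from_backup(backup, deletion_list=[1])
# restored holds only user 2's row; purged is 2
```

The deletion list itself has to outlive every backup it applies to, and it necessarily records who asked to be deleted, which is the tracking tension JohnMakin mentions elsewhere in the thread.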
  - Nextgrid 2 days ago
    GDPR can only be enforced by regulators. The bar for a valid complaint is quite high, and a company can lie and essentially remove your grounds for said complaint. And even once you do get a valid complaint in, it'll stay in limbo for years. Noyb has some info on the subject: https://noyb.eu/en/data-protection-day-only-13-cases-eu-dpas...
nitwit005 2 days ago
Retaining contact information for legal communication seems a logical exception.
After all, how would they even email you back to tell you they deleted your data, if they deleted all records that include your email address?
andrewflnr 2 days ago
I'm leaning toward incompetence on this one. Certainly if they were deliberately keeping his data against his will, it would be stupid to email him about it. The people responsible for deleting the account info probably deleted everything they knew about or had access to, but his email was also in some other database run by marketing or something. Or their databases are just overall horrifically denormalized and inconsistent.
(Of course, sufficiently advanced incompetence is indistinguishable from malice. Hard to say if that's applicable here.)
ecshafer 2 days ago
So many databases are set up to "delete" a record by just marking the column is_deleted as true, and the record is not actually deleted. Meaning a lot of deleted data is around on disk somewhere but just ignored in most queries.
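A small sketch of the difference, using an in-memory SQLite database. The `users` table, its columns, and the helper names are hypothetical; only the `is_deleted` flag comes from the comment above.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users ("
        "id INTEGER PRIMARY KEY, email TEXT, is_deleted INTEGER DEFAULT 0)"
    )
    conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
    return conn


def soft_delete(conn, user_id):
    # The row stays on disk; it is only flagged.
    conn.execute("UPDATE users SET is_deleted = 1 WHERE id = ?", (user_id,))


def hard_delete(conn, user_id):
    # The row is actually removed from the table.
    conn.execute("DELETE FROM users WHERE id = ?", (user_id,))


def visible_users(conn):
    # What most application queries see.
    return conn.execute(
        "SELECT id, email FROM users WHERE is_deleted = 0"
    ).fetchall()


def all_rows(conn):
    # What is really stored.
    return conn.execute("SELECT id, email, is_deleted FROM users").fetchall()


conn = make_db()
soft_delete(conn, 1)
# visible_users(conn) is now empty, but all_rows(conn) still returns the email.
```

Even a hard `DELETE` only removes the row from the live table; copies in backups, replicas, and logs are a separate problem.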
kps 2 days ago
From the headline I expected a startling breakthrough in physics.
geor9e a day ago
I immediately knew it was Raymond Chen just by the way the headline was worded, and the (microsoft.com).
petercooper 2 days ago
The GDPR isn't mentioned, but as one of the more stringent privacy regulation regimes, its 'right to erasure' has all sorts of conditions attached to it where a customer might be told that all of their data has been deleted, but some legally has to be (or can be) stored.
For example, you can store a record that an erased user requested erasure so you can prove it later on if needed in a legal situation (article 17.3.e). Updating such users about legal policies that such retained data may still be subject to would seem rather inane, but I could easily believe it existing as a policy at companies adopting a very eager interpretation of the regulations.
- asadotzler 2 days ago
  Can you claim to that user it is deleted when it is not just because you're holding onto it for legal reasons? I understand the need or requirement to hold some documents, but I don't understand how companies can lie to users claiming their information was deleted when it was not. IMO, they should be required to inform users what specific items were not deleted and the reasons for that.
  - petercooper 2 days ago
    It's semantics, but one man's "lying" is another man's pragmatic, non-legalese customer-facing wording.
    For example "Your personal information has been deleted" versus the potentially much messier truth, which might involve citing the GDPR, mentioning that for accounting reasons you have to maintain their details on invoices, areas of your financial auditing process, that you're maintaining a record of their request to delete the account, and so on and so forth.
    JohnFen 2 days ago
    No, it's lying. If they say your data has been deleted without a qualifier that some of it remains undeleted (regardless of the reason), that's just a straight-up lie because their statement is factually untrue and they know it.
    They could tell the truth without going into the specific messy detail.
- xeonmc 2 days ago
  “Right to erasure” may mean completely different things depending on the kind of government you’re dealing with.
Mystery-Machine 2 days ago
Interesting that this comes from a Microsoft employee...
- ghewgill 2 days ago
  Raymond Chen happens to have worked for Microsoft for a long time, but he is an institution unto himself.