As for AMD, they just haven't invested as much as Intel has, and it's indeed out of scope for them. The tech still isn't useless though: there are some kinds of attacks that it blocks.
> Use of cryptographic integrity protection mode of Intel® Total Memory Encryption - Multi-Key (Intel® TME-MK) can provide additional protection against alias-based attacks, such as those outlined in the Battering RAM paper. This feature is available on 5th Generation Intel® Xeon® processors (formerly codenamed Emerald Rapids) and Intel® Xeon® 6 processor family with P-cores (formerly codenamed Granite Rapids).
I guess it depends on how you interpret "additional protection". But look at the website: they say none of their attacks work on TDX, only on "Scalable SGX".
However, TME-MK is indeed still vulnerable to other kinds of attacks like replay attacks. It isn't going to be as strong as the original SGX design. Unfortunately, as I explain in my other comment, the original SGX design is a kind of theoretical ideal that expects people to make software redesign efforts to benefit from it and the market just has no stomach for much extra spending on security or privacy right now.
> Furthermore, TDX adds cryptographic integrity via a 28-bit MAC in ECC bits [19, 47].
> While the logical integrity could be bypassed by aliasing between two different TDs, as demonstrated in Section 5, the cryptographic integrity remains robust against simple aliasing attacks. This is because, while an interposer enables replay of the data bits containing the ciphertext, it cannot be used to replay the ECC bits, which store the cryptographic MAC. Replaying both data and ECC bits, while theoretically possible, would require a full-fledged interposer capable of intercepting and replaying the data contents. Such an interposer poses significantly higher engineering challenges.
Even this is only sort of better, in that it isn't actually secure against a truly evil RAM chip: it just happens to be using a feature of the RAM chip that narrowly defeats this particular form of command/address override attack... but that's still pretty reasonable, as the only reason this attack could be so cheap to build is because of its limitations.
Thanks!!
The server side also has to be secure for the lock-in to be effective.
FHE does (ok, it's much slower for now).
You can ALWAYS break them; it's just a matter of cost, even assuming they're perfectly designed and have no design or implementation flaws. And they're often not perfectly designed, in which case breaking them sometimes requires no physical hardware tampering at all.
TEEs make attackers' lives harder. Unless you can find a way to make your interposer invisible and undetectable, the value is limited.
Not surprising - even having 2 DDR5 DIMMs on the same channel compromises signal integrity enough to need to drop the frequency by ~30-40%, so perhaps the best mitigation at the moment is to ensure the host is using the fastest DDR5 available.
So - Is the host DRAM/DIMM technology and frequency included in the remote attestation report for the VM?
We use them during hardware development to look at the waveforms in detail well beyond what is needed to read the bits.
The reason their interposer doesn't work with DDR5 is because they designed it with DDR4 as the target, not because DDR5 is impossible to snoop.
What do they actually look like and are there teardowns that show the analog magic?
It's a thing. It's expensive though. At some point you copy-paste scopes and trigger sync them.
Edit: https://www.teledynelecroy.com/oscilloscope/oscilloscopeseri...
I wonder if these are full sampling scopes. In the past we had Equivalent Time Sampling scopes (a wideband front end, a fast sampler with a slow-rate ADC, a variable-delay trigger), and many buses have repeatable test patterns that let you trigger that way. They were always a fairly niche device.
As I understand it, the big idea behind Confidential Computing is that huge American tech multinationals AWS, GCP and Azure can't be trusted.
It is hardly surprising, therefore, that the trustworthiness of huge American tech multinationals Intel and AMD should also be in doubt.
In current year you can't really buy new hardware without secure enclaves[0], be it a phone, a laptop, or a server. The best you can do is refuse to run software that requires them, but even that will become tough when governments roll out mandatory software that depends on them.
[0]: unless you fancy buying nerd vanity hardware like a Talos POWER workstation with all the ups and downs that come with it.
The AWS business is built on isolating compute so IMO AWS are the best choice
I've built up a stack for doing AWS Nitro dev
https://github.com/rhodey/lock.host
With Intel and AMD, the attestation flow has to prove not only that you are using the tech but also who is hosting the CPU
With Amazon Nitro, it's always Amazon hosting the CPU
That such attacks are possible was known from the start. What they're doing here is exploiting the fact that Intel (knowingly!) enabled some hardware attacks on SGX in order to allow enclaves to scale up to much larger amounts of RAM consumed.
Credit/debit cards with chips (EMV) are another proof of existence that hardware-based protection can exist.
> It is not evident that there even is a way to implement the kind of guarantees confidential computing is trying to offer using hardware-based protection only.
Not in the absolute sense, but in the sense of the $10M+ required to break it (atomic microscopes to extract keys from CPU gates, ...), and even then that breaks only a single specific device, not the whole class.
And the key would not allow you to jailbreak another Xbox.
So at most you might be able to make a PC look like an Xbox, but a PC is more expensive to start with.
So it's unclear exactly what you would have accomplished.
The story here is a little complex. Some years ago I flew out to Oregon and met the designers of SGX. It's a good design and it's to our industry's shame that we haven't used it much, as tech like this can solve a lot of different security and privacy problems.
SGX as originally designed was not attackable this way. This kind of RAM interposer attack was anticipated and the hardware was designed to block it by using memory integrity trees, in other words, memory was not only being encrypted by the CPU on the fly (cheap) but RAM was also being hashed into a kind of Merkle tree iirc which the CPU would check on access. So even if you knew the encryption key, you could not overwrite RAM or play games with it. It's often overlooked but encryption doesn't magically make storage immutable. An attacker can still overwrite encrypted data, delete parts, replay messages, redirect your write requests or otherwise mess with it. It takes other cryptographic techniques to block those kinds of activities, and "client SGX" had them (I'm not sure SEV ever did).
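To make the integrity-tree idea concrete, here's a minimal sketch in Python (purely illustrative; real client SGX used a hardware counter/MAC tree over the protected region, not software SHA-256). The point is that the root lives on-die, so any modification of external RAM that bypasses the CPU is caught on the next read:

    import hashlib

    BLOCK = 64  # protected granule size in this toy

    def _h(b: bytes) -> bytes:
        return hashlib.sha256(b).digest()

    class IntegrityTree:
        def __init__(self, memory: bytearray):
            self.memory = memory              # "external RAM": attacker-writable
            self.root = self._compute_root()  # trusted: kept on-die, never leaves the CPU

        def _compute_root(self) -> bytes:
            level = [_h(bytes(self.memory[i:i + BLOCK])) for i in range(0, len(self.memory), BLOCK)]
            while len(level) > 1:
                level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
            return level[0]

        def write(self, off: int, data: bytes):    # CPU-mediated write
            self.memory[off:off + len(data)] = data
            self.root = self._compute_root()       # real hardware updates only one path, O(log n)

        def read(self, off: int, n: int) -> bytes:  # CPU-mediated read
            if self._compute_root() != self.root:   # real hardware verifies only one path, O(log n)
                raise RuntimeError("integrity violation: RAM changed behind the CPU's back")
            return bytes(self.memory[off:off + n])

    ram = bytearray(4 * BLOCK)
    tree = IntegrityTree(ram)
    tree.write(0, b"secret counter = 7")
    ram[70] ^= 0xFF   # interposer/DMA-style tamper that bypasses the CPU
    tree.read(0, 16)  # raises: the overwritten block is detected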
This made sense because SGX design followed security best practices, namely, you should minimize the size of the trusted computing base. More code that's trusted = more potential for mistakes = more vulnerabilities. So SGX envisions apps having small trusted "enclaves", sort of like protected kernels, that untrusted code then uses. Cryptography ties the whole thing together. In a model like this an enclave doesn't need a large amount of RAM because the bulk of the app is running outside of the TCB.
Unfortunately, at this point Intel discovered a sad and depressing but fundamental truth about the software industry: our tolerance for taking on additional complexity to increase security rounds to zero, and the enclave programming model is complex. The number of people who actually understand how to use enclaves as a design primitive can probably fit into a single large conference room. The number of apps that used them in the real world, in a way that actually met some kind of useful threat model, I'm pretty sure is actually near zero [1].
This isn't the fault of SGX! From a theoretical perspective, it is sound and the way it was meant to be used is sound. But actually exploiting it properly required more lift than the software industry could give. For example, to obtain the biggest benefits (SaaS you can use without trusting it) would have required some tactical changes to web browsers, changes to databases, changes to how such apps are designed and so on. Nobody tried to coordinate such changes and Intel, being a business, could not afford to wait for a few decades to see if anyone picked up the ball on that (their own software engineering efforts were good as far as they went but not ambitious enough to pull off the vision).
Instead what happened is that potential customers said to them (and AMD): look, we want extra security, but we don't want to make any effort. We want to just run containers/VMs in the cloud and have them be magically secure. Intel looked at what they had and said OK, well, um, I guess we can maybe run bigger apps inside enclaves. Maybe even whole VMs. So they went away and did a redesign, but then they hit a fundamental physics problem: as you expand the amount of encrypted and protected RAM the Merkle tree protecting its integrity gets bigger and bigger. That means every cache miss has to recursively do a tree walk to ensure the data read from RAM is correct. And that kills performance. For small enclaves the tree is shallow and the costs aren't too bad. For big enclaves, well ... the performance rapidly becomes problematic, especially as the software inside expects to be running at full speed (as we are no longer designing with SGX in mind now but just throwing any old stuff into the protected space).
So Intel released a new version gamely called "scalable SGX" which scaled by removing the memory integrity tree. As the point of that tree was to stop bus interposer attacks, they provided an updated threat model that excluded them. The tech is still useful and blocks some attacks (e.g. imagine a corrupted developer on a cloud hypervisor team). But it was no longer as strong as it once was.
Knowing this, they set about creating yet another memory encryption tech called TME-MK which assigns each memory page its own unique encryption key. This prevented the kind of memory relocation attacks the "Battering RAM" interposer is doing. They also released a new tech that is sort of like SGX for whole virtual machines, formally giving up on the idea the software industry would ever actually try to minimize TCBs. Sad, but there we go. Clouds have trusted brands and people aren't bothered by occasional reports of global root exploits in Azure. It would take a step change event to get more serious about this stuff.
[1] You might think Signal would count. Its use of SGX does help to reduce the threat from malicious or hacked cloud operators, but it doesn't protect against the operators of the Signal service themselves as they control the client.
TME-MK therefore doesn't do much against this attack. I mean, I guess it slightly blunts one of the attacks in the paper (as Intel's CPU was especially bad with the encryption, using the same key across multiple VMs; AMD did not have this issue), but you can use Battering RAM to just get a ciphertext side channel (similar to WireTap).
Like, think about it this way: the real attack here is that, for any given block of memory (and these blocks are tiny: 16 bytes large), the encryption key + tweak doesn't change with every write... this is the same for TME and TME-MK. This means that you can find 16 bytes that are valuable, characterize the possible values, and dump a key.
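A toy model of that (the crypto here is a stand-in, not Intel's or AMD's actual construction) shows why deterministic, address-tweaked encryption with no per-write freshness acts as a dictionary: the same 16-byte plaintext at the same address always produces the same ciphertext, so an observer who can characterize the handful of values a block ever takes can read it straight off the bus:

    import hashlib, os

    KEY = os.urandom(32)

    def encrypt_block(addr: int, pt16: bytes) -> bytes:
        # Depends only on (key, address, plaintext): no counter, no per-write freshness.
        return hashlib.sha256(KEY + addr.to_bytes(8, "little") + pt16).digest()[:16]

    addr = 0x1000
    candidates = [b"BALANCE=0000100\0", b"BALANCE=0000200\0", b"BALANCE=0000300\0"]
    dictionary = {encrypt_block(addr, c): c for c in candidates}  # characterized over time

    observed_on_bus = encrypt_block(addr, b"BALANCE=0000200\0")   # snooped ciphertext
    print("attacker learns:", dictionary[observed_on_bus])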
I gather the data written to DRAM is encrypted when written, and decrypted when read. This hardware screws with the address lines on command, so this encrypted data is read or written from some other RAM location. That allows an external party to overwrite / mutate the cipher block read back.
It's been said several times here that if the secured app can detect its RAM has been changed (e.g. by Merkle trees), then the attack doesn't work. So it's not just the ability to read the secure app's encrypted data in RAM that matters; you also need the ability to change it.
But surely the attacker must have to change the data into something that makes sense to the secured app. In other words, it must write a cipher block that, when decrypted, yields some known plaintext. Surely it can only do that with a key.
If the CPU used the same key for all secure VMs to encrypt RAM, then this makes a little more sense. Just start a malicious VM, have it instruct the hardware bug to redirect its reads to another VM's secured RAM, and it's game over. But that isn't exactly it, because of the requirement to have write access.
I am surprised the CPU uses the same AES key (or a simple derivation of the one base key) for all hosted VMs. I always imagined each hosted VM would get its own key.
The granularity of the encryption is only 16 bytes, and so you can pretty directly target changing things at a pretty low level. And, as the encryption is deterministic, you can also characterize "this location in memory only ever seems to have three values, and they correspond to these three ciphertexts".
> If the CPU used the same key for all secure VMs to encrypt RAM, then this makes a little more sense. Just start a malicious VM, have it instruct the hardware bug to redirect its reads to another VM's secured RAM, and it's game over. But that isn't exactly it, because of the requirement to have write access.
It isn't quite this, as the address matters for the encryption tweak. To do the attack this way (which is only one way of doing it: the Battering RAM device reactivates all the prior attacks, not just this one devastating one), you have to shut down the VM and boot up the malicious one, and get it aligned to the same place.
But, the key bit you are missing is... just do it in reverse? You boot up the malicious VM, have it write anything you want to write, and then you read it back using the redirect (the goal isn't to alias encrypted pages, it is to alias encrypted pages to unencrypted memory). Now you know what you can write to that location in another VM to get that value.
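Here's a toy model of that reverse direction (same caveat: stand-in crypto, and it assumes the attacker's and victim's data at that physical address are encrypted the same way, as described above for the shared-key case). The interposer gives the attacker a raw, unencrypted alias of the same DRAM location, so they can record the ciphertext for a chosen plaintext and inject it later:

    import hashlib, os

    KEY = os.urandom(32)
    dram = {}  # physical address -> raw 16-byte block actually stored in DRAM

    def _pad(addr: int) -> bytes:
        # Stand-in for a deterministic, address-tweaked cipher keyed by the CPU.
        return hashlib.sha256(KEY + addr.to_bytes(8, "little")).digest()[:16]

    def cpu_write(addr, pt):   dram[addr] = bytes(a ^ b for a, b in zip(pt, _pad(addr)))
    def cpu_read(addr):        return bytes(a ^ b for a, b in zip(dram[addr], _pad(addr)))

    def raw_read(addr):        return dram[addr]   # interposer: unencrypted alias
    def raw_write(addr, ct):   dram[addr] = ct

    TARGET = 0x2000
    cpu_write(TARGET, b"ATTACKER_CHOSEN!")   # attacker's VM writes via the normal path
    known_ct = raw_read(TARGET)              # read the ciphertext back through the alias

    # Later, once the victim occupies that physical address:
    raw_write(TARGET, known_ct)              # inject the recorded ciphertext
    print(cpu_read(TARGET))                  # the victim decrypts b"ATTACKER_CHOSEN!"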
Let's say I was Google building gmail. What would I put in the 'secure enclave' ?
Obviously the most important thing is the e-mail bodies, that's the real goldmine. And of course the logins / user session management. The SSL stuff naturally, the certificate for mail.google.com is priceless. And clearly if an attacker could compromise the server's static javascript it'd be game over, security-wise.
At that point, is there anything left outside the secure enclave?
I should try and find time to write up the results of the investigations I did back then as lots of research was done into this topic.
There are different threat models. Minimizing redesign costs means just lift'n'shifting into encrypted VMs. You then run cron jobs that connect to them and verify their remote attestations from time to time. This is a very weak approach, but it can help keep out some kinds of nosy neighbours who found a hypervisor/cloud auth exploit like the recent Azure failure, it reduces the number of cloud employees who can hack your stuff, and has a lot of other good benefits. That's why Intel and AMD are focusing on this easier target now. It doesn't provide your users any privacy against the email service provider (ESP), and some cloud employees can still beat you in various ways so it's no protection against a conspiracy e.g. US Gov wants your bytes. But it's got some security value and is a good place to start.
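For concreteness, the "cron job" part can be as small as something like this (a hypothetical sketch: the endpoint, report format and pinned measurement are all made up, and a real check would verify the vendor's full report signature chain with their SDK rather than trusting a JSON field):

    #!/usr/bin/env python3
    import json, sys, urllib.request

    ATTESTATION_ENDPOINT = "https://my-cvm.internal/attestation"  # hypothetical
    EXPECTED_MEASUREMENT = "c0ffee..."  # golden value pinned at deploy time

    def main() -> int:
        with urllib.request.urlopen(ATTESTATION_ENDPOINT, timeout=10) as resp:
            report = json.load(resp)
        if report.get("measurement") != EXPECTED_MEASUREMENT:
            print("ALERT: VM measurement changed, possible tampering", file=sys.stderr)
            return 1
        print("attestation OK")
        return 0

    if __name__ == "__main__":
        sys.exit(main())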
But let's now minimize the size of the trusted computing base (TCB) by doing it with small enclaves instead of VMs. There can be networks of cooperating enclaves, that's OK. Thinking about email for a second, the design space is huge and this is an HN comment not a design doc. I will simplify aggressively to save space so we're going retro: IMAP, S/MIME and LDAP. In other words, we start from the basic infrastructure of an end-to-end encrypted email system. Passkeys can be used to set up a key pair that's backed up for the user. We assume slightly upgraded email clients from trusted suppliers that aren't the ESP, like Apple, Microsoft etc. We can then use enclaves to restore features that users expect from hosted email like server-side search and spam filtering. This is just an example of how to design enclave architectures, I'm not trying to show solutions to every threat at once.
We will compromise on auth; identity will be supplied by the ESP. If a user wishes to take the ESP's LDAP servers out of the TCB they can exchange S/MIME certificates out of band to verify they match, or upload them to a variety of providers that are then all checked, use certificate transparency logs, etc. (by "the user can" I mean their upgraded email client implements these workflows).
So what can we move out of the TCB when using SGX instead of SEV or TDX? The OS, obviously. The database, message queuing. Primary database keys are plaintext and sorted (there are ways to fix this leak), values are encrypted under an enclave key protected with AES-GCM. The value contains the hashed key and column name so we can verify the untrusted world gave us the right value to match a key. There are schemes that extend this to block replay attacks, but they're too large to fit in this margin. What else? We can move out TCP, TLS, IMAP. In other words, enclaves will be treated as semi-pure functions that have encrypted attestable memory spaces and an ability to derive their own private keys. They won't be making system calls or running full blown containers. The software running inside the enclave is designed for enclaves.
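A minimal sketch of the "bind the value to its key and column" trick, using AES-GCM from the pyca/cryptography package (illustrative, not a production scheme): the primary key and column name go into the associated data, so the untrusted database can't answer a lookup for key A with a validly encrypted value that was stored under key B. It does not stop the untrusted side replaying an old value for the same key, which is the replay problem mentioned above:

    import hashlib, os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    enclave_key = AESGCM(os.urandom(32))  # in reality derived inside the enclave

    def seal(primary_key: bytes, column: bytes, value: bytes) -> bytes:
        aad = hashlib.sha256(primary_key + b"|" + column).digest()
        nonce = os.urandom(12)
        return nonce + enclave_key.encrypt(nonce, value, aad)

    def unseal(primary_key: bytes, column: bytes, blob: bytes) -> bytes:
        aad = hashlib.sha256(primary_key + b"|" + column).digest()
        return enclave_key.decrypt(blob[:12], blob[12:], aad)  # raises InvalidTag if swapped

    row = seal(b"user:42", b"mailbox_index", b"...posting list bytes...")
    unseal(b"user:42", b"mailbox_index", row)    # ok
    # unseal(b"user:43", b"mailbox_index", row)  # raises: untrusted DB returned the wrong row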
First problem: how can we do indexing or spam filtering if the email is encrypted under a key the server doesn't have? The answer is obvious: the user uploads their private key to the enclaves! The process is: do a remote attestation with e.g. the spam filtering enclave, as part of which you learn the hash and/or signing key of the code running inside it. You also get a certificate chain showing the enclave is code signed by a trusted security firm that's audited the enclave source code for vulns, ensured it doesn't leak mail and so on. The email client verifies the RA protocol and certificate chain, becoming convinced in the process that the memory space of the spam filtering engine is secure and it will obey the social contract. Having done that it then uploads the user's private key to the enclave in an encrypted message tunnelled through TLS (two layers of encryption), the enclave then re-encrypts it and requests the untrusted world to store it to disk in the database. Now when an email is delivered the delivery and queueing infrastructure (all outside the TCB) hands it off to an (untrusted) spam server which loads the enclave into RAM, gives it the S/MIME message, loads the user's encrypted private key from the database, gives it spam scores for visible metadata like sending domain, hands all that off to the enclave and requests a classification. The enclave decrypts the user's private key, decrypts the S/MIME message, runs some classification on it and returns the result.
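As a runnable toy of just the enclave-side pieces described above (the "crypto" is a SHA-256 keystream XOR stand-in and the classifier is a one-liner; a real enclave would use the platform's sealing key and proper S/MIME handling):

    import hashlib, os

    def _xor_stream(key: bytes, data: bytes) -> bytes:
        out, ctr = b"", 0
        while len(out) < len(data):
            out += hashlib.sha256(key + ctr.to_bytes(8, "little")).digest()
            ctr += 1
        return bytes(a ^ b for a, b in zip(data, out))

    ENCLAVE_SEAL_KEY = os.urandom(32)  # in reality derived from CPU fuses per enclave identity

    def provision_user_key(user_private_key: bytes) -> bytes:
        # Arrives over the attested channel; handed back sealed so the untrusted
        # host can store it in the database without being able to read it.
        return _xor_stream(ENCLAVE_SEAL_KEY, user_private_key)

    def classify(sealed_user_key: bytes, encrypted_mail: bytes, metadata_score: float) -> str:
        user_key = _xor_stream(ENCLAVE_SEAL_KEY, sealed_user_key)  # unseal
        plaintext = _xor_stream(user_key, encrypted_mail)          # stand-in for S/MIME decrypt
        body_score = 1.0 if b"viagra" in plaintext.lower() else 0.0
        return "spam" if (body_score + metadata_score) / 2 > 0.5 else "ham"

    # The untrusted host orchestrates; only the two functions above run in the enclave.
    user_key = os.urandom(32)
    sealed = provision_user_key(user_key)
    mail = _xor_stream(user_key, b"cheap viagra, click here")
    print(classify(sealed, mail, metadata_score=0.9))  # -> spam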
For indexing it works similarly except the enclave has to encrypt the posting lists. If you want to get really fancy you have to hide access patterns as well to avoid statistical inference of likely email contents by looking at how popular certain posting lists are and such; it gets complicated fast but encrypting whole VMs doesn't actually block such attacks so you have to bite the bullet anyway.
Notice how nearly everything about this system runs outside enclaves, yet, the ESP still can't read your mail. You do need help from third parties - someone has to write your upgraded email client, someone has to audit the enclave code the ESP runs, someone has to check the ESP isn't advertising the wrong public key for your username. But this is all quite tractable.
Am I impacted by this vulnerability?
For all intents and purposes, no.
Battering RAM needs physical access; is this a realistic attack vector?
For all intents and purposes, no.
It depends on the threat model you have in mind. If you are a nation state that is hosting data in a US cloud, and you want to protect yourself from the NSA, I would say this is a realistic attack vector.
Happy to revisit this in 20 years and see if this attack is found in the wild and is representative. (I notice it has been about 20 years since cold boot / evil maid was published and we still haven't seen or heard of it being used in the wild, though the world has kind of moved on to soldered RAM for portable devices.)
* They went to great lengths to provide a logo, a fancy website and domain, etc. to publicise the issue, so they should at least give the correct impression on severity.
It requires only brief one-time physical access, which is realistic in cloud environments, considering, for instance:
* Rogue cloud employees;
* Datacenter technicians or cleaning personnel;
* Coercive local law enforcement agencies;
* Supply chain tampering during shipping or manufacturing of the memory modules.
This reads as "yes". (You may disagree, but _their_ answer is "yes".) Consider also "Room 641A" [1]: the NSA has asked big companies to install special hardware on their premises for wiretapping. This work is at least proof that a similar request could be made to intercept confidential compute environments.
Ah yes, so I bet all these companies that are or were going to use confidential cloud compute aren't going to now, or kick up a fuss with their cloud vendor. I'm sure all these cloud companies are going to send vulnerability disclosures to all confidential cloud compute customers that their data could potentially be compromised by this attack.
First, we should be careful about what I said; I never said physical access is unrealistic and I certainly didn't say this attack is not viable*. What I am saying is that this is not a concern outside a negligible slice of the population. They never will be affected, as we have seen with the case of Cold Boot and all the other infeasible, fear-mongering attacks. But sure, add it to your vulnerability scanner or whatever when you detect SGX/etc.
But why should this not be a concern for an end user whose data may pass through confidential cloud compute, or for a direct customer? It comes down to a few factors: scale, insider threats and/or collusion, or cloud providers straight up selling backdoored products.
Let's go in reverse. Selling backdoored products is an instant way to lose goodwill, reputation, and your customer base, with little to no upside even if you succeed in the long term. I don't see Amazon, Oracle, or whoever stooping this low. A company with no or low reputation will not even make a shortlist for CCC (confidential cloud compute).
Next is insider threats. Large cloud providers have physical security locked down pretty tight. Very few in an organisation know where the actual datacentres are. Cull that list by 50% for those who can gain physical access. Now you need justification for why you need access to the specific physical machine you want to target (does the system have failed hardware or bad RAM?)**. And so on and so forth. Then there is physical monitoring capturing a recording of you performing the act, and the huge deterrent of losing your cushy job and being sentenced to prison.
Next, collusion: consider a state actor/intelligence community compelling a cloud provider to do this (but it could be anyone, such as an online criminal group or a next-door neighbour). This is too much hassle and headache; they would try to get more straightforward access instead. The UK, for example, after exhausting all other ways of getting access to a target's data, could serve a TCN on a cloud provider to deploy these interposers for a target, but they would still need to get root access to the system. In reality this would be put in the too-hard basket; they would probably find easier and more reliable ways to get the data they seek (which is more specific than random page accesses).
Finally, I think the most important issue here is scale. There are a few things I think about when I think of scale: first is the populace that should generally be worried (which, as I stated earlier, is negligible). Then there are the customers of CCC, and the end users who actually use CCC. There's also the number of interposers that could be deployed surreptitiously. At the moment very few services use CCC; the biggest are Apple PCC and WhatsApp private processing for AI. Apple is not vulnerable for a few reasons. Meta does use SEV-SNP, and I'm sure they'd find this attack intriguing as a technical curiosity, but it won't change anything they do, as they likely have tight physical controls and keep those separated from the personnel who have root access to the machines. But outside of these few applications, which are unlikely to be targeted, use of CCC is nascent, so there's negligible chance the general public will even be exposed to the possibility of this attack.
I've ignored the supply chain attack scenario which will be clear as you read what follows.
A few glaring issues with this attack:
1. You need root on the system. I have a cursory understanding of the threat model here in that the OS/hypervisor is considered hostile to SGX, but if you're trying to get access to data and you control the OS/hypervisor, why not just subvert the system at that level rather than go through this trouble?
2. You need precise control of memory allocation to alias memory. Again, this goes back to my previous point: why would you go to all this trouble when you have front-door access?
(Note I eventually did read the paper, but my commentary based on the website itself was still a good indicator that this affects virtually no one.)
* The paper talks about the feasibility of the attack when they actually mean how viable it is.
** You can't simply reap the rewards of targeting a random machine; you need root access for this to work. Also, the datacentre technicians at these cloud companies usually don't know a priori which customer maps to which physical server.