This primitive we're trying to introduce is meant to make up for this shortcoming without having to introduce additional rules in the standard.
I'm not exposed to this space very often, so maybe you or someone else could give me some context. "Sabotage" is a deliberate effort to ruin/hinder something. Are compiler engineers deliberately hindering the efforts of cryptographers? If yes... is there a reason why? Some long-running feud or something?
Or, in the course of their efforts to make compilers faster and so on, are cryptographers just getting the "short end of the stick", so to speak? Perhaps they are forgotten because the number of cryptographers is dwarfed by the number of non-cryptographers? (Or is there some other explanation I'm unaware of?)
CPUs love branch prediction: they compute ahead along the branch they guess will be taken, so the work is already done when the guess is right. Cryptographic code, on the other hand, needs the same execution time no matter the input.
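A minimal C sketch of the difference, with hypothetical helper names. The first function branches on cond, so its timing can depend on the predictor. The second computes the result with a mask, so the same instructions run for either value of cond. It assumes cond is exactly 0 or 1, and nothing in C obliges the compiler to keep it branch-free, which is the whole complaint in this thread.

```c
#include <stdint.h>

/* Branchy selection: which path runs depends on cond, so timing can leak it. */
static uint32_t select_branchy(uint32_t cond, uint32_t a, uint32_t b) {
    if (cond)
        return a;
    return b;
}

/* Branch-free selection, assuming cond is exactly 0 or 1.
   mask is all-ones when cond == 1 and all-zeros when cond == 0. */
static uint32_t ct_select(uint32_t cond, uint32_t a, uint32_t b) {
    uint32_t mask = (uint32_t)0 - cond;
    return (a & mask) | (b & ~mask);
}
```

With these definitions, ct_select(1, 7, 9) yields 7 and ct_select(0, 7, 9) yields 9, the same results as select_branchy.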
When a programmer asks for some register or memory location to be zeroed, they generally just want to be able to use a zero in some later operation, so it doesn’t matter whether the previous value was actually overwritten. When a cryptographer does, they are generally trying to make it impossible to read the previous value, and they want some guarantee that it wasn’t implicitly copied somewhere else in the interim.
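For the zeroing case, a plain memset() right before a buffer goes out of scope is a dead store, and the optimizer is allowed to delete it. A common workaround, sketched below with a hypothetical secure_zero() helper, writes through a volatile-qualified pointer, which the compiler must treat as observable. It only addresses the deleted-store problem. It says nothing about copies the compiler may already have left in registers or spilled to the stack, which is the second half of the concern above. Where available, explicit_bzero() (glibc and some BSDs), memset_s() (optional C11 Annex K) or C23's memset_explicit() serve the same purpose.

```c
#include <stddef.h>

/* Hypothetical helper: zero n bytes at p in a way the optimizer should not
   drop as a dead store, because every write goes through a volatile lvalue. */
static void secure_zero(void *p, size_t n) {
    volatile unsigned char *vp = (volatile unsigned char *)p;
    while (n--)
        *vp++ = 0;
}
```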
A lot of software engineers see this as compiler engineers caring only about performance, as opposed to other aspects such as debuggability, safety, compile time and productivity. I think that's where the "sabotage" comes from: the focus on performance to the detriment of other things.
My 2 cents: the core problem is programmers expecting invariants and properties that are not defined in the language standard. The compiler only guarantees what the standard defines; expecting anything else is problematic.
Yes, languages do lack good mechanisms to mark variables or sections as needing constant-time operation ... but compiler maintainers could have taken the view that that means all code should be compiled that way. Now instead we're marking data and sections as "secret" so that they can be left unoptimized. But why not the other way around?
I understand how we get here; speed and size are trivial to measure and they each result in real-world cost savings. I don't think any maintainer could withstand this pressure. But it's still deliberate.
Worse cost-benefit tradeoff, perhaps? I'd imagine the amount of code that cares more about size/speed than constant-time operation far outnumbers the amount of code that prioritizes the opposite, and given the real-world benefits you mention and the relative newness of concerns about timing attacks, I think it makes sense that compiler writers have defaulted to performance over constant-time execution.
In addition, I think a complicating factor is that compilers can't infer intent from code. The exact same pattern may be used in both performance- and timing-sensitive code, so absent some external signal the compiler has to choose whether it prioritizes speed or timing. If you think more code will benefit from speed than timing, then that is a reasonable default to go with.
I would argue that given a certain ISA, it's probably easier to write an autocomplete extension for assembly targeting that ISA, rather than autocomplete for C, or goodness forbid, C++.
Likewise for structs, functions, jump targets, etc. One could probably set up snippets corresponding to different sorts of conditional execution—loops, if/else/while, switch, etc.
Any side effect is a side channel. There are always going to be side channels in real code running on real hardware.
Sure, you can change your code, your compiler, or even your hardware to account for this, but at its core that is security by obscurity.
https://www.intel.com/content/www/us/en/developer/articles/t...
Sure, you could run on some hypothetical OS that supports DOITM and insert syscalls around every manipulation of secret data. Yeah, right.
The whole design is ridiculous.
This could be done using an opcode prefix, which would bloat code but would work perfectly. Or it could use an RFLAGS bit or a bit in MXCSR or a new register, etc.
Almost anything would be better than an MSR that is only accessible to privileged code.
> Almost anything would be better than an MSR that is only accessible to privileged code.
ARM does that: their flag (DIT) is accessible by non-privileged code. If you know the architecture has that flag, either because your -march= is recent enough or because the operating system told you so through the hwcaps or the emulated id registers, you can use it freely without needing to switch to privileged mode through a syscall.
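A sketch of what that looks like from user space on Linux/AArch64, assuming GCC or Clang. The kernel reports FEAT_DIT through the HWCAP_DIT bit of AT_HWCAP, and the DIT register is then read and written directly with MRS/MSR, with no syscall. The helper name is hypothetical. The register is spelled with its generic encoding S3_3_C4_C2_5 so that assemblers without the "DIT" name still accept it, and the fallback HWCAP_DIT value (1 << 24) is taken from the Linux arm64 uapi headers. On any other target the sketch just reports that DIT is unavailable.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>
#ifndef HWCAP_DIT
#define HWCAP_DIT (1UL << 24) /* value from the Linux arm64 uapi <asm/hwcap.h> */
#endif
#endif

/* Hypothetical helper: sets PSTATE.DIT from user space if the kernel reports
   FEAT_DIT; returns true if DIT is now set, false if unsupported here. */
static bool enable_dit_if_supported(void) {
#if defined(__aarch64__) && defined(__linux__)
    if (!(getauxval(AT_HWCAP) & HWCAP_DIT))
        return false;                  /* kernel says the CPU lacks FEAT_DIT */
    uint64_t v;
    /* S3_3_C4_C2_5 is the generic system-register name for DIT. */
    __asm__ volatile("mrs %0, S3_3_C4_C2_5" : "=r"(v));
    v |= UINT64_C(1) << 24;            /* DIT is bit 24 of the register */
    __asm__ volatile("msr S3_3_C4_C2_5, %0" : : "r"(v));
    return true;
#else
    return false; /* not AArch64 Linux: no user-space DIT control in this sketch */
#endif
}
```

Real code would save the previous DIT value and restore it after the sensitive section rather than leaving it set.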
> for i = 1 to len(real_password) {
> if entered_password[i] != real_password[i] {
> return FAILURE
> }
> }
>
> return SUCCESS
OK, now an alert attacker who can very accurately record how long the password check takes can determine at least the length of the real password, because the time complexity of this check is O(length of the real password). They could also gradually recover the password itself, because the check takes longer as the attacker gets each successive character right.
Taking this general idea further, there are lots of places where the timing of branches can leak information about a secret, so in cryptographic code in particular it’s often beneficial to be able to ensure that two branches (the success and failure branches above) take exactly the same amount of time, so that the timing doesn’t leak information. To fix the above you would probably want to do two things. First, set a boolean to failure and keep checking, so the “return failure quickly” problem doesn’t leak information. Second, check against a fixed-width hash or something similar, so the length of the password itself isn’t a factor.
The problem is that lots of performance optimizations (pipelining, branch prediction, etc.) work directly against this goal: they aim to take branches quickly on the happy path, because normally that’s what you want for optimal performance.
So say instead of the above I do
> bool status = SUCCESS
> for i = 1 to hash_length {
> if hash_of_entered_password[i] != hash_of_real_password[i] {
> status = FAILURE
> }
> }
>
> return status
…I don’t want the optimizer to realize that once status becomes FAILURE it can never become SUCCESS again, and that since the loop does nothing else it can just return early. I want it to actually run the pointless comparison over the rest of the hash so the timing is exactly the same each time.
Now my check is constant time, but I’ve shifted the burden onto whoever writes the hash function. That has to run in constant time too, or my check will leak once again. So in general people want a way to tell the compiler that a particular piece of code must run in constant time. At the moment, in the general case, I think you have to drop into inline assembly to achieve this.
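A C sketch of the fixed-width comparison described above, with hypothetical function names. Instead of a status flag that could tempt the optimizer into an early exit, it ORs the XOR of every byte pair into an accumulator and passes that accumulator through an empty inline-asm statement (a GCC/Clang extension) so the compiler cannot reason about its value. That barrier is a widely used trick, not a guarantee from the language, which is exactly the gap being discussed. Libraries such as OpenSSL (CRYPTO_memcmp) and libsodium (sodium_memcmp) ship vetted functions of this kind.

```c
#include <stddef.h>
#include <stdint.h>

/* Optimization barrier (GCC/Clang extension): the empty asm claims to read and
   modify x, so the compiler can no longer assume anything about its value. */
static inline uint8_t value_barrier_u8(uint8_t x) {
#if defined(__GNUC__) || defined(__clang__)
    __asm__ volatile("" : "+r"(x));
#endif
    return x;
}

/* Compares two equal-length buffers (e.g. fixed-width hashes) without an early
   exit: every byte pair is XORed and ORed into diff, whether or not a mismatch
   has already been seen. Returns 1 if equal, 0 otherwise. */
static int ct_equal(const uint8_t *a, const uint8_t *b, size_t len) {
    uint8_t diff = 0;
    for (size_t i = 0; i < len; i++)
        diff = value_barrier_u8(diff | (uint8_t)(a[i] ^ b[i]));
    /* diff is 0..255: diff - 1 wraps to 0xFFFFFFFF only when diff == 0,
       so bit 8 of the result is 1 exactly when the buffers are equal. */
    return (int)(1 & (((uint32_t)diff - 1) >> 8));
}
```

ct_equal returns 1 when all len bytes match and 0 otherwise, and the final step turns diff == 0 into 1 with arithmetic rather than a compare-and-branch.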
  match = True
  for a, b in pad(entered_pass, real_pass):
      # 'and' short-circuits: once match is False, a == b is no longer evaluated
      match = match and a == b
  return match
Then it will be faster as well; I was surprised to find this.
Obviously this doesn't mitigate power-usage side-channel attacks, but that's not the point here.
It's time-bound, so let's check time.
We should be asking our CPU vendors to support enabling a constant time mode of some sort for sensitive operations.
For an example of a list of such instructions see:
https://www.intel.com/content/www/us/en/developer/articles/t...
However, cooperation from the operating system is necessary, as the constant-time execution mode may need to be enabled by setting certain CPU-control bits in protected registers (e.g. IA32_UARCH_MISC_CTL[DOITM]).
See for instance:
https://www.intel.com/content/www/us/en/developer/articles/t...
CMOV is on the list of instructions with constant-time execution, but the list is valid only with the corresponding control bit set correctly.
The way ARM does this is way better, since it doesn't need help from the operating system: user-space can directly set and clear the DIT bit. Operating system cooperation is necessary only to know whether that bit exists (because the ID registers are not directly readable by user mode).
That said, WG21 and WG14 don't seem to be able to get the memo that safety is more important than single-core speed. Or, as I suspect, a bunch of members are actually malicious.
Technically any new feature that requires backend support is an additional burden on backend devs. There's nothing special about constant-time builtins in this respect.
> since if they implemented it naively
Strictly speaking, whether an implementation is naive is independent of whether it is correct. An implementation that purports to be constant time while not actually being constant time is wrong, no matter how naive or sophisticated the implementation may be.
In some cases it might be necessary to consider the possibility of invalid memory accesses (and avoid the side-channels when doing so). (The example given in the article works around this issue, but I don't know if there are any situations where this will not help.)
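For table lookups indexed by secret data, one common way to avoid both secret-dependent addresses and out-of-range reads is to read every entry of the table and keep the wanted one with a mask. A minimal sketch, assuming a small table of uint32_t and a hypothetical ct_lookup() helper. It trades speed (always n loads) for an access pattern that depends only on n, and an out-of-range index simply yields 0 instead of touching memory outside the table. The usual caveat applies: the compiler is not obliged to keep it branch-free.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns table[secret_index] (or 0 if the index is out
   of range) while reading every entry table[0..n-1] exactly once. */
static uint32_t ct_lookup(const uint32_t *table, size_t n, size_t secret_index) {
    uint32_t result = 0;
    for (size_t i = 0; i < n; i++) {
        size_t x = i ^ secret_index;   /* zero only when i == secret_index */
        /* The top bit of (x | -x) is set exactly when x != 0. */
        size_t nonzero = (x | ((size_t)0 - x)) >> (sizeof(size_t) * CHAR_BIT - 1);
        uint32_t mask = (uint32_t)0 - (uint32_t)(nonzero ^ 1); /* all-ones on match */
        result |= table[i] & mask;
    }
    return result;
}
```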
> The CMOVcc instruction runs in time independent of its arguments in all current x86 architecture processors. This includes variants that load from memory. The load is performed before the condition is tested. Future versions of the architecture may introduce new addressing modes that do not exhibit this property.
The list includes CMOV.
However, the instructions from the list are guaranteed to have constant execution time, even on any future CPUs, only if the operating system sets a certain CPU control bit.
So on recent and future Intel/AMD CPUs, one may need to verify that the correct choice has been made between secure execution mode and fastest execution mode.
  #pragma GCC optimize ("O0")
With "-O0", the generated code normally retains a huge number of useless register loads and stores, which lead to non-deterministic timing due to contention for the caches and the main memory interface. Optimized code may run entirely inside registers, and thus execute in constant time regardless of what other CPU cores do.
The only good part is that this non-deterministic timing will not normally depend on the data values. The main danger of the non-constant execution time is when this time depends on the values of the processed data, which provides information about those values.
There are cases when disabling optimization may cause data-dependent timing, e.g. if with optimization the compiler would have chosen a conditional move and without optimization it chooses a data-dependent branch.
The only certain way of achieving data-independent timing is to use either assembly language or appropriate compiler intrinsics.