What the heck is AEAD again? (ochagavia.nl)

49 points by wofo 18 hours ago | 8 comments

tptacek 16 hours ago
Another AD example: Ben Toews, in our Vault replacement secret storage system Pet Semetary, uses the AD on SQLite ciphertexts to bind them to a particular row (and/or a particular key path).
I wrote a local file encryption tool, around the same time Filippo was doing `age`, and used the AD on Chapoly to authenticate the chunk offset into the file. (The only thing interesting my tool did was that it could pull keys from AWS KMS).
So one use for AD is to authenticate headers; another is contextual binding.
If it helps (because 'stavros asked across the thread why bother having AD at all rather than just including it in the ciphertext), authenticated data can include data that doesn't even appear in the message, but rather is derived from the context at the time the message is encrypted and decrypted. A message only meant to be decrypted on a particular host (or whatever), for instance, could include the host in its AD, but never record that in the actual bits of the message.
- some_furry 14 hours ago
  This is mostly unrelated to what you wrote, Thomas, but I wanted to add something that HN users might benefit from hearing:
  It's important to use a carefully designed AEAD mode rather than assembling it yourself out of parts. If you try to combine a block cipher mode and message authenticator together, you might screw it up in a really funny way: https://soatok.blog/2021/07/30/canonicalization-attacks-agai...
  Sanketh's talk at Real World Crypto 2024 about Next-Generation AEADs is also worth a watch for anyone that, for whatever weird reason, feels at all motivated to invent a new wheel here: https://www.youtube.com/watch?v=7GBzKytVjH4
peterldowns 16 hours ago
If you're interested in doing AEAD with the current best-practice algorithms in golang, you might get inspiration from my work-in-progress symcrypt package. I'm not a cryptographer and you shouldn't trust me when I say it works correctly — but it's basically just a small, correct, wrapper around the chacha20poly1305 code in the golang standard library. It has the slight advantage of using different types for the plaintext and the associated data (here called Owner, because I use it to store API keys owned by specific
If you squint at the example usage in the tests, it's basically the API that the blogpost describes.
https://github.com/peterldowns/symcrypt/blob/main/symcrypt_t...
As an aside, I'm always curious to understand why the encryption people say "never roll your own crypto" but then also ship confusing APIs without clear usage examples. For instance, check out the golang chacha20poly1305 docs:
https://pkg.go.dev/golang.org/x/crypto/chacha20poly1305
- tptacek 16 hours ago
  I don't understand what you're finding unclear about the Chapoly docs there. AEAD encryption is a first-class abstraction in the standard Go crypto library; in the same sense that crypto/sha256's New returns a hash.Hash, chacha20poly1305's New returns a cipher.AEAD. cipher.AEAD itself includes clear usage examples.
  Your `symcrypt` interface lands in a pretty weird place? AEADs in Go export "Seal" and "Open" --- with deliberately different names than crypto/cipher/Block's "Encrypt" and "Decrypt", because they're doing something different. The "Owner" thing in your package is kind of odd too.
  You're exposing an interface over Go's AEAD primitives, but not letting users actually provide authenticated data. I don't much care except the whole point of this post is why that matters.
  - gerdesj 15 hours ago
    When it comes to this stuff you generally have to follow advice from someone who understands what is going on.
    For me, personally, I'm going to side with tptacek - he has a track record that I have seen over at least a decade if not two.
    I don't know the other bloke but this is a bit of a worry: "I'm not a cryptographer".
    peterldowns 15 hours ago
    Agreed; I posted here with the goal of receiving advice.
  - peterldowns 15 hours ago
    What I'm finding unclear is how to use the chapoly primitives in a secure way to accomplish my goal. I want to use AEAD encryption to store API keys, per customer, using a single app secret that my app will read from my secret manager when it starts up. AEAD seems like the right way to do that. What's the right way to do that, with AEAD, in golang? The examples/docs for cipher.AEAD at https://pkg.go.dev/crypto/cipher#AEAD don't mention chapoly, but a security researcher friend of mine recommended I use it. The docs/examples for the chapoly library have two constructors, New and NewX, and only NewX has an example. In that example, no associated data is actually used.
    > Your `symcrypt` interface lands in a pretty weird place? AEADs in Go export "Seal" and "Open" --- with deliberately different names than crypto/cipher/Block's "Encrypt" and "Decrypt", because they're doing something different.
    What should I use? I'd be extremely happy to do the Right Thing. I linked symcrypt and posted here because I am hoping someone can point me to it.
    > You're exposing an interface over Go's AEAD primitives, but not letting users actually provide authenticated data.
    I really don't understand what you mean by "not letting users actually provide authenticated data". Here, in this test, I show how if you encrypt some secret for one user (the associated data is the Owner), you can only decrypt it if you provide the same associated data (the same Owner). https://github.com/peterldowns/symcrypt/blob/c220f7767fa6c1a...
    tptacek 15 hours ago
    I feel like you haven't really gotten your head around what Authenticated Data is. That's OK! Look at 'stavros upthread --- lots of clueful people have trouble with this concept.
    What you should do is just take the examples from cipher#AEAD, but where they do:
      block, err := aes.NewCipher(key)
      if err != nil {
          panic(err.Error())
      }
      aesgcm, err := cipher.NewGCM(block)
      if err != nil {
          panic(err.Error())
      }
    Instead you just do
      chapoly, err := chacha20poly1305.NewX(key)
      if err != nil {
          panic(err.Error())
      }
    The rest of the code is the same, except that where they write "Never use more than 2^32 random nonces with a given key because of the risk of a repeat", you can ignore that and use a long nonce (like in the example for chacha20poly1305.NewX).
    Your "Owner" looks like what cryptographers would call a "domain separation constant". Domain separation is good! It's another application of authenticated data, too. But not the only one.
    The Go standard library's AEAD "Seal" and "Open" is a better interface than what you've got now.
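A minimal sketch of the per-customer storage case discussed above, assuming the customer ID is passed as associated data and a random nonce is prepended to each ciphertext. It uses AES-256-GCM only so that it runs with the standard library alone; `chacha20poly1305.NewX(key)` returns the same `cipher.AEAD` interface and could be substituted inside `newAEAD`. The helper names `newAEAD`, `sealFor`, and `openFor` are hypothetical, not part of any library.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newAEAD builds the AEAD. AES-256-GCM is an assumption made to stay inside
// the standard library; any cipher.AEAD would fit the helpers below.
func newAEAD(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// sealFor encrypts secret and binds it to customerID through the associated
// data. The result is nonce || ciphertext || tag; customerID is not stored in it.
func sealFor(aead cipher.AEAD, customerID string, secret []byte) ([]byte, error) {
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Using nonce as dst makes Seal append the ciphertext after the nonce.
	return aead.Seal(nonce, nonce, secret, []byte(customerID)), nil
}

// openFor decrypts a blob produced by sealFor. Open returns an error unless
// the same customerID is supplied as associated data.
func openFor(aead cipher.AEAD, customerID string, blob []byte) ([]byte, error) {
	n := aead.NonceSize()
	if len(blob) < n {
		return nil, errors.New("blob too short")
	}
	return aead.Open(nil, blob[:n], blob[n:], []byte(customerID))
}

func main() {
	key := make([]byte, 32) // stand-in for the app secret read at startup
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	aead, err := newAEAD(key)
	if err != nil {
		panic(err)
	}
	blob, err := sealFor(aead, "customer-42", []byte("api-key-value"))
	if err != nil {
		panic(err)
	}
	_, err = openFor(aead, "customer-43", blob)
	fmt.Println("wrong owner rejected:", err != nil)
}
```

With GCM's 12-byte random nonces, the per-key message limit quoted from the cipher docs still applies; the 24-byte nonce of `NewX` is what lets you ignore it.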
    peterldowns 14 hours ago
    Thank you!
- hxtk 14 hours ago
  A lot of the hardest problems in practical cryptography come down less to the abstractions around literally encrypting and decrypting things and more around the secure management of key material to ensure that the application supports things like online key rotation and makes it easy to verify that keys are being generated, serialized, and stored securely, and addressing the "First Secret" problem. If you're wanting to learn to use cryptography by developing an abstraction over stdlib cryptographic APIs, I would encourage you to find solutions to those problems.
  Another source of inspiration (and something I use in production) is the Tink family of cryptographic libraries by Google [1]. Their Go implementation [2] is not without its warts, but it's very difficult to run into any of those security bugs that exist around cryptography. Where the Go documentation lacks, there are some examples in the developer docs that help fill some of the gaps [3] [4].
  The documentation isn't 100% complete, but I find it more discoverable than the standard library because while the standard library requires you to read both `crypto/cipher` and `crypto/aes` or `golang.org/x/crypto/chacha20poly1305` depending on what kind of cipher you want, Tink organizes it by use cases [5] and generally groups together all the things you need to do cryptographic operations under the use-case-named interfaces in the `tink` package [6], with the corresponding key generation templates located under the top-level packages of the same name [7].
  [1]: https://developers.google.com/tink
  [2]: https://github.com/tink-crypto/tink-go/
  [3]: https://developers.google.com/tink/key-management-overview#g...
  [4]: https://developers.google.com/tink/encrypt-data#go
  [5]: https://developers.google.com/tink/choose-primitive
  [6]: https://pkg.go.dev/github.com/tink-crypto/tink-go/v2/tink#AE...
  [7]: https://pkg.go.dev/github.com/tink-crypto/tink-go/v2@v2.4.0/...
  - zorgmonkey 11 hours ago
    It is worth pointing out that Tink has bindings for a bunch of languages (C++, Objective-C, Rust, Python, Go and Java) and supports a bunch of key management systems (GCP, AWS and HashiCorp).
stavros 16 hours ago
Can someone explain what use the AD is, if we have to decrypt the message to authenticate the AD? If I'm decrypting the message already just to authenticate it, why wouldn't I encrypt the AD as well?
- tptacek 16 hours ago
  Because you need it outside the context of encryption/decryption.
  https://news.ycombinator.com/item?id=43827342
  Honestly, the classic "message routing" example most things give for AEAD is not very useful. Context binding is a much better primer for intuition.
  - stavros 15 hours ago
    Hm, I understand the use cases, but I don't understand this: The only way to get the AD is to decrypt the ciphertext, right? Otherwise the data is unauthenticated, so I assume it's a big no-no to access it. If you need to decrypt the ciphertext to access the AD, why do you care if it was encrypted or not?
    Basically, I'm not sure why `encrypt(key, nonce, (data, associated data))` (ie adding the AD to your ciphertext, with the encryption framework being unaware of it) is that different from `encrypt(key, nonce, data, associated data)` (ie the AD being a first-class citizen).
    EDIT: I saw your other message, and this makes it click for me:
    > authenticated data can include data that doesn't even appear in the message, but rather is derived from the context at the time the message is encrypted and decrypted
    So the AD can be an additional envelope-level thing at encryption/decryption time, that helps a lot, thanks!
    tptacek 15 hours ago
    This is why message routing headers are kind of a fucky example (you can make it make sense but it begs for this confusion).
    Instead, just take the chunked large-file encryption use case I gave in that comment. The chunk offset isn't recorded anywhere in the ciphertext. It's derived contextually while you decrypt the file. The AD ensures that the decryption will explode if you try to cut and paste chunks of the file into different positions.
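The chunked-file case can be sketched as follows, assuming each chunk is sealed independently and its index is encoded as 8 big-endian bytes of associated data. AES-GCM stands in for Chapoly here to stay within the standard library, and `chunkAD`, `sealChunks`, and `openChunks` are hypothetical names. The index is never written to the output; the reader re-derives it from position.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/binary"
	"fmt"
)

// newGCM returns AES-GCM as a cipher.AEAD (an assumed stand-in for Chapoly).
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// chunkAD encodes a chunk index as fixed-width associated data.
func chunkAD(index uint64) []byte {
	ad := make([]byte, 8)
	binary.BigEndian.PutUint64(ad, index)
	return ad
}

// sealChunks encrypts every chunk under a fresh random nonce, authenticating
// the chunk's position via the AD. Each output is nonce || ciphertext || tag.
func sealChunks(aead cipher.AEAD, chunks [][]byte) ([][]byte, error) {
	out := make([][]byte, len(chunks))
	for i, c := range chunks {
		nonce := make([]byte, aead.NonceSize())
		if _, err := rand.Read(nonce); err != nil {
			return nil, err
		}
		out[i] = aead.Seal(nonce, nonce, c, chunkAD(uint64(i)))
	}
	return out, nil
}

// openChunks decrypts chunks in order, deriving each AD from the position at
// which the chunk was found rather than from anything stored in the file.
func openChunks(aead cipher.AEAD, sealed [][]byte) ([][]byte, error) {
	n := aead.NonceSize()
	out := make([][]byte, len(sealed))
	for i, s := range sealed {
		if len(s) < n {
			return nil, fmt.Errorf("chunk %d too short", i)
		}
		pt, err := aead.Open(nil, s[:n], s[n:], chunkAD(uint64(i)))
		if err != nil {
			return nil, fmt.Errorf("chunk %d: %w", i, err)
		}
		out[i] = pt
	}
	return out, nil
}

func main() {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	aead, err := newGCM(key)
	if err != nil {
		panic(err)
	}
	sealed, err := sealChunks(aead, [][]byte{[]byte("first"), []byte("second")})
	if err != nil {
		panic(err)
	}
	sealed[0], sealed[1] = sealed[1], sealed[0] // cut and paste two chunks
	_, err = openChunks(aead, sealed)
	fmt.Println("swapped chunks rejected:", err != nil)
}
```

This sketch only covers reordering. Dropping chunks from the end of the file is a separate problem that would need something extra, such as marking the final chunk in the AD.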
    stavros 15 hours ago
    Yeah, you're right, I was thinking about it in a case where the implementation had the ciphertext being `(block data, chunk offset)` so it did make it part of the message, but it's more elegant for the associated data to be separate from the ciphertext.
    firesteelrain 13 hours ago
    Message headers have a tendency to want to mutate, which makes the problem more complicated to solve, but decrypting chunks in the right order is a good example to grasp because it refers to essential metadata that needs to stay open so readers of the data know what to do with it. The AD is bound to the ciphertext.
    tptacek 15 hours ago
    It's a really good question, because, in order to verify the AD, you have to have the same key you need to decrypt it.
    stavros 15 hours ago
    Yep, that's the part that throws me. Is it fair to say that it's a more elegant way to include metadata in the ciphertext, without really messing with the plaintext itself? Ie it's basically "just" a way to distinguish the message from its metadata?
    edoceo 15 hours ago
    Does that make it like some kind of HMAC?
    hxtk 13 hours ago
    Yes, in fact, one construction of the AEAD primitive is to use AES-CTR with HMAC to "bolt on" authentication after the fact (AES-CTR on its own is an unauthenticated stream cipher).
    You can find an implementation of AES-CTR-HMAC (at a high level where AES-CTR and HMAC are both given) here: https://github.com/tink-crypto/tink-go/blob/main/aead/aesctr...
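For intuition only, a generic composition of this kind might look like the sketch below. It is not Tink's implementation. It assumes encrypt-then-MAC with separate encryption and MAC keys, HMAC-SHA256 as the MAC, and a length prefix on the associated data so that the boundary between AD and ciphertext is unambiguous. The names `tag`, `seal`, and `open` are hypothetical. In line with the advice upthread, a vetted AEAD is preferable to anything assembled like this.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"encoding/binary"
	"errors"
	"fmt"
)

// tag computes HMAC-SHA256 over len(ad) || ad || iv || ciphertext.
func tag(macKey, ad, iv, ct []byte) []byte {
	mac := hmac.New(sha256.New, macKey)
	var n [8]byte
	binary.BigEndian.PutUint64(n[:], uint64(len(ad)))
	mac.Write(n[:])
	mac.Write(ad)
	mac.Write(iv)
	mac.Write(ct)
	return mac.Sum(nil)
}

// seal encrypts with AES-CTR, then authenticates; returns iv || ct || tag.
func seal(encKey, macKey, plaintext, ad []byte) ([]byte, error) {
	block, err := aes.NewCipher(encKey)
	if err != nil {
		return nil, err
	}
	iv := make([]byte, aes.BlockSize)
	if _, err := rand.Read(iv); err != nil {
		return nil, err
	}
	ct := make([]byte, len(plaintext))
	cipher.NewCTR(block, iv).XORKeyStream(ct, plaintext)
	out := make([]byte, 0, len(iv)+len(ct)+sha256.Size)
	out = append(out, iv...)
	out = append(out, ct...)
	return append(out, tag(macKey, ad, iv, ct)...), nil
}

// open verifies the tag (covering the AD) before decrypting anything.
func open(encKey, macKey, blob, ad []byte) ([]byte, error) {
	if len(blob) < aes.BlockSize+sha256.Size {
		return nil, errors.New("blob too short")
	}
	iv := blob[:aes.BlockSize]
	ct := blob[aes.BlockSize : len(blob)-sha256.Size]
	got := blob[len(blob)-sha256.Size:]
	// hmac.Equal compares in constant time.
	if !hmac.Equal(got, tag(macKey, ad, iv, ct)) {
		return nil, errors.New("authentication failed")
	}
	block, err := aes.NewCipher(encKey)
	if err != nil {
		return nil, err
	}
	pt := make([]byte, len(ct))
	cipher.NewCTR(block, iv).XORKeyStream(pt, ct)
	return pt, nil
}

func main() {
	encKey, macKey := make([]byte, 32), make([]byte, 32)
	if _, err := rand.Read(encKey); err != nil {
		panic(err)
	}
	if _, err := rand.Read(macKey); err != nil {
		panic(err)
	}
	blob, err := seal(encKey, macKey, []byte("payload"), []byte("row:17"))
	if err != nil {
		panic(err)
	}
	_, err = open(encKey, macKey, blob, []byte("row:18"))
	fmt.Println("wrong AD rejected:", err != nil)
}
```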
    andrewflnr 10 hours ago
    > The AD ensures that the decryption will explode if you try to cut and paste chunks of the file into different positions.
    Ah, that's the key bit of perspective. Just talking about "context" is so abstract. That's a case where you don't even need to transmit the AD, right? Do you ever have cases where the AD is a mix of transmitted and locally/"contextually" derived data?
    vlovich123 13 hours ago
    The strategy I used instead was to HKDF derive different keys for each chunk using the offset as part of the info to derive the key. No AD needed.
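A rough sketch of that per-chunk key strategy, under several assumptions: HKDF-SHA256 with the offset in the info field, AES-256-GCM with an empty AD, and a random nonce per chunk. The single-block HKDF is hand-rolled here only to avoid a dependency; in real code a maintained implementation such as `golang.org/x/crypto/hkdf` would be the better choice. The names `hkdfSHA256`, `chunkAEAD`, `sealChunk`, `openChunk`, and the `chunk-key:` label are hypothetical.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"encoding/binary"
	"errors"
	"fmt"
)

// hkdfSHA256 is HKDF (RFC 5869) restricted to one output block (32 bytes).
func hkdfSHA256(secret, salt, info []byte) []byte {
	if len(salt) == 0 {
		salt = make([]byte, sha256.Size)
	}
	ext := hmac.New(sha256.New, salt) // extract
	ext.Write(secret)
	prk := ext.Sum(nil)
	exp := hmac.New(sha256.New, prk) // expand, block T(1)
	exp.Write(info)
	exp.Write([]byte{1})
	return exp.Sum(nil)
}

// chunkAEAD derives a per-chunk key whose HKDF info includes the offset.
func chunkAEAD(master []byte, offset uint64) (cipher.AEAD, error) {
	off := make([]byte, 8)
	binary.BigEndian.PutUint64(off, offset)
	key := hkdfSHA256(master, nil, append([]byte("chunk-key:"), off...))
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// sealChunk encrypts with the offset-specific key and no associated data.
func sealChunk(master []byte, offset uint64, plaintext []byte) ([]byte, error) {
	aead, err := chunkAEAD(master, offset)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return aead.Seal(nonce, nonce, plaintext, nil), nil
}

// openChunk fails if offset differs from the one used when sealing, because
// a different key is derived and the tag no longer verifies.
func openChunk(master []byte, offset uint64, blob []byte) ([]byte, error) {
	aead, err := chunkAEAD(master, offset)
	if err != nil {
		return nil, err
	}
	n := aead.NonceSize()
	if len(blob) < n {
		return nil, errors.New("blob too short")
	}
	return aead.Open(nil, blob[:n], blob[n:], nil)
}

func main() {
	master := make([]byte, 32)
	if _, err := rand.Read(master); err != nil {
		panic(err)
	}
	blob, err := sealChunk(master, 0, []byte("chunk zero"))
	if err != nil {
		panic(err)
	}
	_, err = openChunk(master, 65536, blob)
	fmt.Println("wrong offset rejected:", err != nil)
}
```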
    tatersolid 5 hours ago
    Two calls to SHA-256 for each block would be very slow compared with a modern AEAD.
    vlovich123 an hour ago
    Hmm… it seemed to run at line speed on our machines. I’m also not sure where you’re getting two calls for sha256 from? Like 1 to derive the key (which is sha256 on a very small amount of data) and the second is?
- rendaw 12 hours ago
  IIUC you don't have to decrypt the message - the outermost primitive is the authentication. The point I got is that both the encrypted data and unencrypted data are authenticated.
senderista 15 hours ago
I don't understand the example. Presumably the server doesn't have the user-owned encryption key. So how can the server "detect that the user id has been tampered with" if it doesn't have the key necessary to authenticate the user id?
- cakoose 14 hours ago
  Yup, the example doesn't make sense for the reason you pointed out.
  You could water down the example a bit to make it work:
  1. Assume there's some other authentication mechanism for client-server communication, e.g. TLS.
  2. The client sends the user ID unencrypted (within TLS) so the server can route, but encrypts the message contents so the server can't read it.
  3. The final recipient can validate the message and the user ID.
  This saves the client from having to send the user ID twice, once in the ciphertext and once in the clear.
  But another more interesting use case is when you don't even send the associated data: https://news.ycombinator.com/item?id=43827342
- hxtk 14 hours ago
  Suppose you have a server with its own encryption key, and a key-value database full of encrypted data (secured by this root keyset) associated with a user.
  Even if I gain access to the database, if the keys are managed securely, I can't read another user's data (or even really my own). I have to go through the authorization logic of the application that will decrypt it on my behalf.
  However, if I can create a row in the database with my ID and another user's data, I can then convince the server I am authorized to view that cell, and it will happily decrypt it on my behalf, assuming something like AES-CTR or some other stream cipher without authentication.
  Authenticated encryption like AES-CTR-HMAC solves that problem, because now the application will see that I am authorized to view that cell (because it sees the user ID matches mine) and it will decrypt it for me (using that user ID as the associated data), but the decryption will fail because the associated data does not match, leaving me unable to exfiltrate the data that I convinced the server belonged to me, and probably setting off some kind of alarm because that sort of decryption should never fail unless things have been tampered with.
  I'm not overly fond of the example and I find it confusing as well. I think the example may be a bit confusing because the term "authentication" is overloaded between application-level authentication and cryptographic authentication, i.e., "if the chat protocol authenticates the user ID" sounds like it is talking about the user logging into the server securely. The user is authenticated by having a secret negotiated with the server. In the next bullet, they talk about "authenticating" the associated data, referring to it in the cryptographic context, but they don't indicate why that would be a problem because in their example, the malicious actor still doesn't have the key. The article handwaves it as "the attacker might be creative."
  If they had the key, but not the associated data, you'd still be in a relatively bad situation, because the associated data is not secret. It doesn't serve as a second key because it is not high enough entropy and is ideally zero entropy conditional on already having all information from the originating context.
  - vlovich123 13 hours ago
    But why bother putting the user id in the AD instead of part of the authenticated encrypted payload?
    hxtk 13 hours ago
    It sounds like you are assuming that if the data were modified, decryption would fail catastrophically and you'd end up with garbage. This is precisely the point of AEAD: providing cryptographic guarantee that decryption will fail catastrophically if things are tampered with.
    That guarantee is not provided with unauthenticated stream ciphers. For example, some stream ciphers work by essentially using a deterministic but unpredictable PRNG seeded with the key and IV to generate a bitstream, and then XOR the plaintext with that bitstream to generate the ciphertext.
    With such a stream cipher, if Eve knows that the data format is, e.g., an 8-byte unsigned integer user ID followed by the rest of the payload, Eve can take the first 8 bytes of the ciphertext and XOR them with Bob's user ID (public information) and her own user ID to corrupt the message in such a way that the resulting cleartext would contain her user ID instead of Bob's, and thus pass the validation that it seems like you are proposing.
    Let C[] be the cipher text, K[] be the key stream, B be Bob's ID, and E be Eve's ID:
    C[:8] = B ^ K[:8]
    C[:8] ^ B = K[:8]
    C' = (C[:8] ^ B ^ E) ++ C[8:]
    C' would decrypt, validate as "belonging" to Eve, and contain Bob's data.
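The forgery above can be reproduced in a few lines, assuming AES-CTR as the unauthenticated stream cipher and a record layout of an 8-byte big-endian user ID followed by the payload. The helpers `ctrXOR`, `record`, and `forge` are hypothetical, and the all-zero key and IV are demo values only. Note that `forge` never touches the key.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"encoding/binary"
	"fmt"
)

// ctrXOR applies the AES-CTR keystream; encryption and decryption are the
// same operation.
func ctrXOR(key, iv, in []byte) []byte {
	block, err := aes.NewCipher(key)
	if err != nil {
		panic(err)
	}
	out := make([]byte, len(in))
	cipher.NewCTR(block, iv).XORKeyStream(out, in)
	return out
}

// record lays out an 8-byte big-endian user ID followed by the payload.
func record(userID uint64, payload string) []byte {
	buf := make([]byte, 8, 8+len(payload))
	binary.BigEndian.PutUint64(buf, userID)
	return append(buf, payload...)
}

// forge computes C' = (C[:8] ^ B ^ E) ++ C[8:] without knowing the key.
// It assumes ct holds at least the 8 ID bytes.
func forge(ct []byte, bob, eve uint64) []byte {
	out := append([]byte(nil), ct...)
	var b, e [8]byte
	binary.BigEndian.PutUint64(b[:], bob)
	binary.BigEndian.PutUint64(e[:], eve)
	for i := 0; i < 8; i++ {
		out[i] ^= b[i] ^ e[i]
	}
	return out
}

func main() {
	key := make([]byte, 32)           // demo key, never use a fixed key
	iv := make([]byte, aes.BlockSize) // demo IV, never reuse an IV
	const bob, eve = 1001, 2002
	ct := ctrXOR(key, iv, record(bob, "bob's secret"))
	pt := ctrXOR(key, iv, forge(ct, bob, eve))
	// The decrypted record now carries Eve's ID in front of Bob's payload.
	fmt.Println(binary.BigEndian.Uint64(pt[:8]), string(pt[8:]))
}
```

An AEAD rejects the forged ciphertext outright, because the tag covers every ciphertext byte.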
    vlovich123 12 hours ago
    Nowhere did I say I used an unauthenticated cipher. It was all authenticated ciphers. Indeed, usually it was still AEAD (AES-GCM), but instead of using the offset as the AD I simply derived a new key from the offset, thus not using the AD part. This way I could swap out the algorithm for an authenticated cipher that wasn't AEAD (e.g. AES-CBC+HMAC) without breaking how anything worked.
    hxtk 12 hours ago
    It sounds like the source of confusion is in terminology. I would consider AES-CBC+HMAC to be an AEAD construction, and I would consider whatever "secret key" you pass into HMAC along with the ciphertext to be the "associated data". AES-GCM is an AEAD construction that gives you the MAC organically as part of the cipher, but that is not what makes it AEAD as I understand it.
    If you are using an AEAD cipher mode, then you always have AD, but sometimes that AD might be the empty string. In that case, the advantage of using contextual AD, as opposed to using the empty string as AD and then doing additional verification on the decrypted object, is that it prevents some kinds of timing attacks, because cryptographic libraries will often implement AEAD constructions to fail in constant time, whereas your scheme will take longer if post-decryption validation of contextual data encoded in the plaintext fails compared to if decryption fails.
    vlovich123 11 hours ago
    > I would consider AES-CBC+HMAC to be an AEAD construction
    OK fair.
    > and I would consider whatever "secret key" you pass into HMAC along with the ciphertext to be the "associated data"
    The secret key isn't associated data. You take your base HKDF key and expand new crypto for an authenticated cipher from the offset as info (+ maybe other parameters like file name). That key is then used to decrypt. If you squint I guess you could call that AD but it's functionally a very different role.
    > because cryptographic libraries will often implement AEAD constructions to fail in constant time, whereas your scheme will take longer if post-decryption validation of contextual data encoded in the plaintext fails compared to if decryption fails.
    I think you've misunderstood what I said. As I repeat above, the AEAD key is derived from the offset. There's no post-decryption validation of contextual data because the plaintext is empty. HKDF derivation is constant time and authenticated decryption is constant time. Once decrypted you have a valid block at that location. There's nothing extra left to validate (or perhaps the decrypted contents, but that's irrelevant for cryptographic purposes).
    My broader point is that I have yet to encounter a use-case for a non-empty AD string.
    tptacek 11 hours ago
    There are several of them on this thread. For example: encrypt a 10 gigabyte file. You'll need to chunk it; each chunk will exist at an offset. Encode the chunk offset into the Associated Data. Notice that you never store this Associated Data; you simply have it in the context of encrypting and decrypting the file. But the AEAD MAC captures it, and now you can't cut and paste chunks of a ciphertext.
    vlovich123 11 hours ago
    Or alternatively, as I said, derive a new key for each chunk where the offset is part of the info used to derive the key and have an empty AD. Same effect.
    hxtk 10 hours ago
    It sounds like you have constructed a way to encrypt data such that it can only be decrypted by someone who has the same (secret) key and (non-secret) associated data that was used to encrypt it, or else decryption fails.
    As you said, same effect: the scheme you have described is not an alternative to AEAD. It is an example of AEAD. You're still using the offset as associated data, you just happen to be composing your AEAD scheme out of another AEAD scheme into which you pass an empty string as associated data.
    Other than differences in the limit to the number of messages you can encrypt before nonce exhaustion or the number of bits of secrecy or authentication strength provided, the external interface and use case of your system perfectly matches that of AES-GCM or other popular AEAD constructions.
    vlovich123 an hour ago
    > Other than differences in the limit to the number of messages you can encrypt before nonce exhaustion or the number of bits of secrecy or authentication strength provided, the external interface and use case of your system perfectly matches that of AES-GCM or other popular AEAD constructions.
    Ahh, but that’s not a trivial part of the design and why this is strictly better than using a single AES-GCM key with AD. And also it’s more generic across whatever type of key you choose to derive.
    tptacek 11 hours ago
    I may be missing a subtlety here but it seems like you've essentially reinvented Associated Data but with an extra KDF extraction and an AES key expansion for every chunk.
- wofo 14 hours ago
  It looks like I actually got the example wrong, sorry about that!
  Somehow I assumed that the server was able to authenticate the receiver id, but as you correctly point out, that would require knowing the encryption key. I'll have to think about a fix for the example.
  - hxtk 13 hours ago
    A usual example I use (because it reflects how I tend to use AEAD in applications) is to assume the server (and only the server) has the keys for something like data-at-rest encryption. Application level logic decides whether the server is going to decrypt some data on behalf of the user, and the authenticated data prevents tampering.
    If Alice saves some data to her account, but Eve manages to access the database, Eve can change the database state to convince the application to retrieve Alice's data for her (by cloning it into a row with her own user ID). However, when the application attempts to decrypt that data, it will fail because of the AEAD. This ensures that both the database and some service with access to the encryption key (or the encryption key itself) would have to be compromised in order for Eve to exfiltrate her illicit copy of Alice's data.
    wofo 12 hours ago
    Thanks for the example! It has helped me understand better the use case of AEAD for at-rest-encrypted-data.
    I finally updated the example to a new one, though it's still message-based (it fits the rest of the article better). If I had come across your example earlier, I might have stayed away from a message-based formulation of the problem at all... Better luck next time I guess :)
twic 16 hours ago
Internally, is AEAD just using the "usual" ciphers, digests, and PRNGs, just making sure to combine them in the right way? If so, are all AEAD "ciphers" the same, just with different sub-primitives plugged in?
- tptacek 16 hours ago
  Not generally. An AEAD composed the way you're describing, out of (say) non-authenticated CTR mode and an HMAC MAC, would be described as "a generic composition". The more common AEADs, at least the way we think about them, aren't compositions of otherwise user-serviceable components. I'm not sure there's a name for them; they're the norm, so we describe those integrated, hermetically-sealed constructions (like GCM) as "AEAD".
- coppsilgold 12 hours ago
  An AEAD can be constructed from pieces made and studied for other purposes (eg. block ciphers and hash functions). There is also a cryptographic primitive which can be used for AEAD almost without modification: the cryptographic sponge. But even so this particular primitive is often tailored for the security requirements of AEAD to be more performant: https://ascon.isec.tugraz.at/specification.html
  An AEAD can also be made de novo. Such as AEGIS[1], which performs encryption and authentication in one pass (much like the sponges, but much more performant).
  [1] <https://competitions.cr.yp.to/round3/aegisv11.pdf>
- syncsynchalt 12 hours ago
  Even if you combine the operations so that it works, it may not be obvious whether you've opened yourself to side channels like timing attacks.
  A naive implementation of the AEAD feature list could trivially allow you to guess the AD for a ciphertext if the AD validation is checked too early in the process.
kazinator 13 hours ago
The TL;DR of this seems to be: the plaintext metadata accompanying ciphertext ("associated data") is mixed into the ciphertext's encryption (essentially as an initialization vector). Thereby, if the plaintext data is altered, the ciphertext cannot be correctly decrypted. The ciphertext is both a secret message, and a signature of the unencrypted data, so a separate HMAC is not required.
We can imagine, e.g. in the context of e-mail, if the DKIM header signature were combined with a PGP-encrypted body as one operation. I'm ducking under the table now, though.
- tptacek 11 hours ago
  The core idea, one that PGP does not "get" (except in newer, non-compatible implementations) is simply that of ciphertext authentication. Once you have authentication, the Associated Data construction is pretty easy to get; put differently, almost immediately after we "had" widespread authenticated encryption, we had AEAD. PGP finds these problems difficult to solve (and so simply doesn't solve them) because it confuses error-detecting integrity checks, signatures, encryption, and message authentication, which are 4 different things. But PGP also predates our modern understanding of the differences between those 4 things.
andrekandre 14 hours ago
i really appreciate how this article was written
just the right length and pacing to get me to the end and the point across
- wofo 13 hours ago
  Thanks for the kind words! I'm trying to balance pragmatism with depth. Glad it was useful to you ;)
halosghost 16 hours ago
See also: https://www.latacora.com/blog/2024/07/29/crypto-right-answer...
All the best,
-HG