feat(encryption) [5/N] Support encryption: Encryption Manager#2383
feat(encryption) [5/N] Support encryption: Encryption Manager#2383xanderbailey wants to merge 16 commits intoapache:mainfrom
Conversation
b7fe819 to
74f9e2f
Compare
| default = moka::future::Cache::builder().time_to_live(DEFAULT_CACHE_TTL).build(), | ||
| setter(skip) | ||
| )] | ||
| kek_cache: moka::future::Cache<String, SensitiveBytes>, |
There was a problem hiding this comment.
Put a cache here for the same reason Java does, KMS is not aware of the key_id and since we populate the cache when we create_kek it would be hard to do this purely within a CachingKmsService
|
Thanks @xanderbailey! Here's my summary of reviewing with Claude:
|
|
We don't have |
Do we think we actually need a comment here? The code seems self documenting here? |
I think UUID is fine here maybe? both are 16 bytes and random |
Agreed. |
Agreed. |
Yeah, seems fine. |
blackmwk
left a comment
There was a problem hiding this comment.
Thanks @xanderbailey for this pr!
| default = moka::future::Cache::builder().time_to_live(DEFAULT_CACHE_TTL).build(), | ||
| setter(skip) | ||
| )] | ||
| kek_cache: moka::future::Cache<String, SensitiveBytes>, |
There was a problem hiding this comment.
Please don't use fully qualified name here, use them in imports as much as possible.
| /// Transparently decrypts on read. | ||
| pub struct EncryptedInputFile { | ||
| inner: InputFile, | ||
| decryptor: Arc<AesGcmFileDecryptor>, |
There was a problem hiding this comment.
I think we should just store StandardKeyMetadata here?
There was a problem hiding this comment.
And just use that to construct the the FileRead when we need it?
There was a problem hiding this comment.
The AesGcmFileDecryptor is the thing we actually need at read time so is it a good idea to just construct it once?
There was a problem hiding this comment.
AesGcmFileDecryptor actually didn't store any state? What actually matters is AesGcmFileRead, right? I'm not a big fan of adding too much abstraction. Why not just make things as following:
pub struct EncryptedInputFile {
key_metadata: StandardKeyMetadata,
inner: InputFile
}
// Similar to EncryptedOutputFIle
| } | ||
|
|
||
| /// Returns the key metadata bytes (for storage in manifest/data files). | ||
| pub fn key_metadata(&self) -> &[u8] { |
There was a problem hiding this comment.
We should return StandardKeyMetadata
| /// Returns a [`StandardKeyMetadata`] containing a fresh DEK and AAD prefix. | ||
| /// The caller should pass this to the Parquet writer to configure | ||
| /// `FileEncryptionProperties`, and serialize it for storage in the manifest. | ||
| pub fn generate_native_key_metadata(&self) -> Result<StandardKeyMetadata> { |
There was a problem hiding this comment.
Let's make it mod private first/
|
|
||
| /// Decrypt an encrypted input file, returning an [`EncryptedInputFile`] | ||
| /// that transparently decrypts on read. | ||
| pub fn decrypt(&self, input: InputFile, key_metadata: &[u8]) -> Result<EncryptedInputFile> { |
There was a problem hiding this comment.
| pub fn decrypt(&self, input: InputFile, key_metadata: &[u8]) -> Result<EncryptedInputFile> { | |
| pub fn decrypt(&self, input: EncryptedInputeFile) -> Result<EncryptedInputFile> { |
There was a problem hiding this comment.
This seems cyclical to me no?
There was a problem hiding this comment.
Yes, we got in trouble here. After second thought, I think we should remove this method. Java's implementation in fact has nothing to do with EncryptionManager. WDYT?
There was a problem hiding this comment.
I think you're right yes, this was a good catch, we can remove this altogether. We can now create EncryptedInputFile from an InputFile and StandardKeyMetadata
| /// write) or the existing KEK expired (rotation). When `Some`, the | ||
| /// caller must also persist this KEK in table metadata so that future | ||
| /// `unwrap_key_metadata` calls can find it. | ||
| pub async fn wrap_key_metadata( |
There was a problem hiding this comment.
It's deprecated in java, remove it.
There was a problem hiding this comment.
I think wrapKey and unwrapKey are deprecated. wrap_key_metadata maps to EncryptionUtil.encryptManifestListKeyMetadata() and unwrap_key_metadata maps to EncryptionUtil.decryptManifestListKeyMetadata() but I've tried to avoid the EncryptionUtil grab-bag in my implementation.
There was a problem hiding this comment.
Thanks for explaination. Seems it's not used for now? I prefer to remove it if it's not used and add it back later to make the pr small.
There was a problem hiding this comment.
Maybe we can keep these, EncryptionManager now just has three public methods:
encrypt
wrap_key_metadata
unwrap_key_metadata
There was a problem hiding this comment.
The problem is it will take more time for me to understand how java's equivalent methods work. I think the io part is already ready for merging. I'm fine if you insist on putting them together, but it will take longer for the pr to be merged.
There was a problem hiding this comment.
Yep, that makes sense to me, happy to remove the EM from this PR and merge the IO
There was a problem hiding this comment.
Also I don't think you should use general name like wrap_key_metadata/unwrap_key_metadata. The logic here are manifest list specific, e.g. it requires looks up EncryptedKeys in table metadata, which others could be expressed using EncryptedInputFile.
| /// Given an `EncryptedKey` entry (from a manifest list or snapshot) and | ||
| /// the full map of encryption keys from `TableMetadata`, returns the | ||
| /// unwrapped key metadata bytes (e.g. serialized `StandardKeyMetadata`). | ||
| pub async fn unwrap_key_metadata( |
5311801 to
32fdd3a
Compare
|
|
||
| /// Decrypt an encrypted input file, returning an [`EncryptedInputFile`] | ||
| /// that transparently decrypts on read. | ||
| pub fn decrypt(&self, input: InputFile, key_metadata: &[u8]) -> Result<EncryptedInputFile> { |
There was a problem hiding this comment.
Yes, we got in trouble here. After second thought, I think we should remove this method. Java's implementation in fact has nothing to do with EncryptionManager. WDYT?
| /// Returns a [`StandardKeyMetadata`] containing a fresh DEK and AAD prefix. | ||
| /// The caller should pass this to the Parquet writer to configure | ||
| /// `FileEncryptionProperties`, and serialize it for storage in the manifest. | ||
| #[allow(dead_code)] |
There was a problem hiding this comment.
Why we need to add this annotation? If it's not used for now, let's remove it.
There was a problem hiding this comment.
Have removed for now, we can add it back in when we need it 5107145
| /// write) or the existing KEK expired (rotation). When `Some`, the | ||
| /// caller must also persist this KEK in table metadata so that future | ||
| /// `unwrap_key_metadata` calls can find it. | ||
| pub async fn wrap_key_metadata( |
There was a problem hiding this comment.
Thanks for explaination. Seems it's not used for now? I prefer to remove it if it's not used and add it back later to make the pr small.
| /// Transparently decrypts on read. | ||
| pub struct EncryptedInputFile { | ||
| inner: InputFile, | ||
| decryptor: Arc<AesGcmFileDecryptor>, |
There was a problem hiding this comment.
AesGcmFileDecryptor actually didn't store any state? What actually matters is AesGcmFileRead, right? I'm not a big fan of adding too much abstraction. Why not just make things as following:
pub struct EncryptedInputFile {
key_metadata: StandardKeyMetadata,
inner: InputFile
}
// Similar to EncryptedOutputFIle
Yeah okay so this is a good take. If we do this then it becomes very clear that pub fn decrypt(&self, input: InputFile, key_metadata: &[u8]) -> Result<EncryptedInputFile> {
let metadata = StandardKeyMetadata::decode(key_metadata)?;
Ok(EncryptedInputFile::new(input, metadata))
} |
|
Just a thought, worried that if we remove the wrapping and unwrapping of keys, the EM is basically empty (doesn't need kms etc). I could remove the EM from this PR and just have the io, we could merge that and then introduce the EM when we need it in the next PR that does the manifest list encryption decryption? Or review the EM as is with the key wrapping logic? |
Ok, let's review them in the same pr. |
Which issue does this PR close?
Part of #2034
What changes are included in this PR?
Stacked on #2340. Adds
EncryptionManager— handles two-layer envelope encryption (master key → KEK → DEK) for Iceberg tables.New files
manager.rs—EncryptionManagerwithtyped_builderconstruction:encrypt()— wraps anOutputFilein anEncryptedOutputFilewith a freshly generated DEK + AAD prefixwrap_key_metadata()/unwrap_key_metadata()— KEK envelope wrap/unwrap for manifest list key metadataencryption_keys(HashMap)(bulk, for production load fromTableMetadata) andadd_encryption_key(EncryptedKey)(one-at-a-time, ergonomic for tests)io.rs—EncryptedInputFile/EncryptedOutputFilewrappers that hold aStandardKeyMetadataand lazily build the AGS1 stream cipher onreader()/writer()Design decisions
NativeEncryptedInputFile/NativeEncryptedOutputFilewrappers — For PME,StandardKeyMetadataalready carries the DEK and AAD prefix. Wrapper types that just bundle anInputFile/OutputFilewith the same data are unnecessary indirection.decrypt()method on the manager — it had no manager state to use; callers doStandardKeyMetadata::decode(bytes)+EncryptedInputFile::new(input, metadata)directly. Mirrors the fact that Java'sdecrypt()is also a thin factory unrelated to manager state.AesGcmFileDecryptor/AesGcmFileEncryptor— the encrypted file types storeStandardKeyMetadatadirectly and construct the underlyingAesGcmFileRead/AesGcmFileWriteon demand. One fewer layer.DataInvaliderror rather than silently passingNoneAAD (which would weaken the tampering defense).How this differs from Java's
StandardEncryptionManageraddManifestListKeyMetadata()mutates an internal map and callers need to downcast toStandardEncryptionManagerto access the keys. Ourwrap_key_metadata()returns(wrapped_key, Option<new_kek>)directly — no hidden mutation, no downcasting. Java reference.EncryptionUtilgrab-bag: Java needs a static utility class (decryptManifestListKeyMetadata,encryptManifestListKeyMetadata, etc.) because the interface is too narrow. Here those are just methods onEncryptionManager.table_key_idenforced at compile time: required viatyped_builder, can't forget it. Unencrypted tables useOption<EncryptionManager>instead of Java'sPlaintextEncryptionManager.Usage notes (for follow-up PRs that wire this in)
load_manifest_listwill do something like:Are these changes tested?
Yes — 11 manager tests covering KEK creation/rotation/caching, wrap/unwrap roundtrips, AAD tampering and missing-timestamp rejection, plus a full encrypt-then-read roundtrip via
EncryptedInputFile.