feat(encryption): storage envelope wiring (Stage 2)#742
feat(encryption): storage envelope wiring (Stage 2)#742bootjp wants to merge 14 commits intofeat/encryption-sidecarfrom
Conversation
Wire the §4.1 storage envelope into the pebble-backed MVCCStore.
Every value handed to PutAt / ApplyMutations / ExpireAt is wrapped
in an authenticated AES-256-GCM envelope when a DEK is active for
the storage purpose; reads on the way out unwrap. Cleartext
behaviour stays the default — the cipher is opt-in via
WithEncryption, so every existing NewPebbleStore caller is
untouched.
Value header layout. The 9-byte fixed header now carries an
encryption_state field per design §7.1:
byte 0: bit 0 tombstone | bits 1-2 encryption_state | bits 3-7 reserved
bytes 1-8: ExpireAt (uint64 LE)
bytes 9..: body, plaintext when encState=0b00, §4.1 envelope
bytes when encState=0b01
Reserved encryption_state values (0b10, 0b11) are rejected with
the new ErrEncryptedValueReservedState sentinel so an old binary
cannot silently treat a future-format encrypted entry as cleartext.
Reserved bits in the header (3-7) are also fail-closed.
Wiring. WithEncryption takes a Cipher, a NonceFactory, and an
ActiveStorageKeyID closure. The closure-based active-key resolver
lets Stage 5/6 swap in a sidecar-backed source without re-touching
the storage layer. The NonceFactory interface is the seam Stage 7
will replace with the writer-registry-backed implementation;
todays CounterNonceFactory test helper produces the same 12-byte
node_id-local_epoch-write_count shape and satisfies the contract
for tests.
AAD record-binding. AAD is HeaderAADBytes(version, flag, key_id)
concat encoded_pebble_key, so a disk attacker who copies a valid
ciphertext into a different (key, ts) slot fails GCM verification.
The §4.1 case-2/3 cut-and-paste / version-substitution attacks
are all handled by AES-GCM authentication, never silently zeroed.
ErrEncryptedReadIntegrity surfaces tag mismatches as a typed read
error.
Snapshot path. Native Pebble snapshots (snapshot_pebble.go) ship
raw bytes and are encryption-transparent. The MVCC-deconstructed
snapshot format does not carry encryption_state in its v2 wire
(setEncodedVersionInBatch writes encStateCleartext, with a comment
pointing at Stage 8 which bumps the format to v3 to round-trip
encrypted entries). This intentionally limits Stage 2 to the
on-line Put/Get path; ranged restore from an encrypted dataset is
covered when Stage 8 lands.
Tests. Round-trip across PutAt/GetAt/ApplyMutations, tombstone
independence, the §7.1 cipher-wired-but-DEK-inactive window,
AAD record-binding (cut-and-paste from key A's slot into key B's
fails as ErrEncryptedReadIntegrity), 1-byte tag flip, reserved
encState rejection, header-encoding pin, and a rapid property
covering many random (key, value, ts) draws. The existing store
suite stays green on the cipher=nil legacy path.
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request implements transparent value-level encryption for the Pebble-backed store, introducing an authenticated envelope that binds ciphertext to storage keys via AAD. It adds the necessary glue logic for encryption/decryption, a nonce factory for unique AES-GCM nonces, and updates the value encoding format to support encryption state flags. Review feedback suggests optimizing the hot paths by pre-allocating AAD buffers in both the encryption and decryption functions to reduce memory reallocations.
I am having trouble creating individual review comments. Click here to see my feedback.
store/encryption_glue.go (89-91)
The AAD buffer is allocated twice here: once inside AppendHeaderAADBytes(nil, ...) and again when appending pebbleKey. Pre-allocating the buffer with the correct capacity avoids unnecessary reallocation on the hot path. Ensure that the total size is bounded to prevent potential OOM issues from external inputs.
aad := make([]byte, 0, encryption.HeaderAADSize+len(pebbleKey))
aad = encryption.AppendHeaderAADBytes(aad, encryption.EnvelopeVersionV1, envelopeFlag, keyID)
aad = append(aad, pebbleKey...)
References
- Pre-allocating a buffer is acceptable if a strictly enforced maximum size limit is checked before allocation.
- Apply a fixed bound to collections that can grow from external requests to prevent unbounded memory growth and OOM issues.
store/encryption_glue.go (131-132)
Similar to the encryption path, the AAD buffer here is allocated twice. Pre-allocating with the known capacity improves performance on the read path. Ensure the allocation size is bounded to avoid memory exhaustion.
aad := make([]byte, 0, encryption.HeaderAADSize+len(pebbleKey))
aad = encryption.AppendHeaderAADBytes(aad, env.Version, env.Flag, env.KeyID)
aad = append(aad, pebbleKey...)
References
- Pre-allocating a buffer is acceptable if a strictly enforced maximum size limit is checked before allocation.
- Apply a fixed bound to collections that can grow from external requests to prevent unbounded memory growth and OOM issues.
CI gci default sections expect bootjp imports mixed with externals in the second group; the original commit accidentally landed the 3-group format (stdlib / external / bootjp) which fails the golangci-lint v2.9.0 gci check used in CI but passed locally because of a stale --fix cache. gci diff confirms the 2-group format is the canonical layout for this config.
💡 Codex Reviewelastickv/store/encryption_glue.go Lines 89 to 92 in 6309c4e The AAD only covers the envelope header and Pebble key, but not the outer value header ( ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
…emini medium) AppendHeaderAADBytes(nil, ...) allocates HeaderAADSize bytes; the following append(aad, pebbleKey...) then re-allocates whenever the returned slice's capacity is below header + key. Two allocations per Put, two per Get. Pre-size to HeaderAADSize+len(pebbleKey) so neither append reallocates. Pebble key length is bounded by maxPebbleEncodedKeySize so the capacity is finite. Symmetric on encrypt/decrypt because the AAD must be byte-identical between the two paths or GCM verification fails.
|
@claude review PR742 round-2: gemini medium x2 を反映しました (commit 7ac2445)。
|
|
Claude finished @bootjp's task in 5m 58s —— View job PR742 Round-2 Review
AAD pre-sizing fix (7ac2445) — Correct ✅Both Gemini's two medium findings are fully addressed. Codex P1 — Value-header bytes not in AADThe Codex concern is real, but it is a deliberate design trade-off that needs to be surfaced explicitly. What the design doc says. §2.2 of
The outer header byte (tombstone + encState bits) and ExpireAt are cleartext by design so The specific gap Codex names. Flipping Is it a blocker? Not by itself — the design doc acceptance of the cleartext header covers this flip path too. But there are two paths forward worth choosing between: Option A (preferred): Bind the flags byte into the AAD. The GC operations only need tombstone and expireAt — they do NOT need encState to be cleartext. Including byte 0 of the value header in the AAD closes the encState-flip silent-corruption without breaking the GC-without-decrypt invariant: // encrypt side — encState is always encStateEncrypted here (we only
// call encryptForKey when the cipher is active), tombstone is always
// 0 for live writes:
const encryptedLiveFlags byte = encStateEncrypted << encStateShift // 0x02
aad = append(aad, encryptedLiveFlags)
// decrypt side — read the flags byte we stored:
aad = append(aad, flags) // flags = data[0] as parsed by decodeValueIf an attacker flips the flags byte after writing, the AAD at decrypt time ( Option B (current + doc): Document the behavior. If including the flags byte in AAD is deferred (format change, design-doc amendment needed), add a // TestEncryption_EncStateFlagFlip documents the §2.2 design
// trade-off: the flags byte is not in the AAD, so flipping encState
// 0b01→0b00 makes the read return raw envelope bytes rather than
// ErrEncryptedReadIntegrity. This is accepted per §2.2.Option A is worth a one-line design-doc amendment and a follow-up commit. Option B unblocks merge now. Other observations
MVCC snapshot restore — Correct.
Prop test coverage — Good. Reserved encState detection — Correct. The dual-check in Summary
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7ac2445e46
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if sv.Tombstone { | ||
| return nil, ErrKeyNotFound | ||
| } | ||
| if sv.ExpireAt != 0 && sv.ExpireAt <= ts { | ||
| return nil, ErrKeyNotFound |
There was a problem hiding this comment.
Authenticate header bits before serving not-found for encrypted rows
This path returns ErrKeyNotFound based on Tombstone/ExpireAt before any AEAD verification, but those fields come from the outer value header and are not covered by decryptForKey's AAD/body check. A disk attacker can flip bit 0 or lower ExpireAt on an encrypted record and force silent not-found/expired reads without triggering ErrEncryptedReadIntegrity, which defeats the storage-envelope tamper-detection goal for at-rest attacks.
Useful? React with 👍 / 👎.
The original AAD covered only (envelope_version, flag, key_id) and
the encoded Pebble key. The on-disk value-header (tombstone bit,
encryption_state, expireAt) was not authenticated, so a disk
attacker could:
- flip the tombstone bit on an encrypted record so getAt /
processFoundValue returns ErrKeyNotFound BEFORE any AEAD
verification runs, or
- lower expireAt below the read timestamp so the same paths
return ErrKeyNotFound via the expired-skip branch.
In both cases the encryption-integrity error never surfaces and
the storage-envelope tamper-detection goal is silently defeated
for at-rest attacks.
Fix:
- Add writeValueHeaderBytes helper that emits the 9-byte header
(flags + expireAt) without a body slice.
- encryptForKey now takes expireAt and includes the on-the-wire
header bytes in the AAD; tombstone is hard-coded false because
encrypt is never invoked on tombstone writes (deletes carry no
plaintext and stay cleartext).
- decryptForKey now takes the storedValue, reproduces the same
9-byte header from sv.Tombstone / sv.ExpireAt / sv.EncState,
and includes it in the AAD. A flip on disk no longer matches
the encrypt-time AAD, so GCM fails and
ErrEncryptedReadIntegrity surfaces.
- readVisibleVersion / processFoundValue now decrypt FIRST and
run the tombstone / expireAt visibility checks AFTER. Pre-
decrypt header values are not yet authenticated, so the old
early-return paths were the bypass.
- buildStorageAAD centralises the AAD layout
(header_aad + value_header + pebble_key) and pre-allocates the
buffer so neither append re-allocates (preserves PR742 round-1
gemini perf fix).
Caller audit (semantic change: pre-tombstone-check now decrypts):
- encryptForKey callers: PutAt, ExpireAt, applyMutationsBatch
OpTypePut. All three now thread expireAt to the helper. Tombstone
writes (DeleteAt, applyMutationsBatch OpTypeDelete) bypass the
encrypt path entirely.
- decryptForKey callers: readVisibleVersion (getAt path),
processFoundValue (scan path). Both reorder visibility checks
to run after decrypt. Cleartext entries (encState=0b00)
short-circuit decryptForKey at the first branch and keep the
legacy unauthenticated tombstone/expireAt — the §7.1 rollout
closes this for fully-encrypted clusters.
New regression tests cover three on-disk flips that the original
AAD permitted: tombstone-bit set, expireAt lowered to past,
expireAt advanced. All surface ErrEncryptedReadIntegrity.
|
@claude review PR742 round-3: codex P1 攻撃面: 元 AAD は (envelope_version, flag, key_id, pebble_key) のみで、値ヘッダ (tombstone bit, encryption_state, expireAt) は未認証。ディスク攻撃者が tombstone bit を flip すれば silent ErrKeyNotFound、expireAt を lower すれば silent expired-skip を引き起こせ、ErrEncryptedReadIntegrity を経由しない bypass。 修正:
caller audit (意味論変更: pre-tombstone-check で decrypt):
新規テスト ( |
|
Claude finished @bootjp's task in 2m 57s —— View job PR742 Round-3 Review
Round-3 changes — what was fixedThe core approach is correct and the two main read paths are now properly protected.
P2:
|
| Finding | Status |
|---|---|
buildStorageAAD / writeValueHeaderBytes correctness |
✅ Correct |
encryptForKey header binding (tombstone=false, expireAt, encState) |
✅ Correct |
decryptForKey AAD reproduction from on-disk fields |
✅ Correct |
readVisibleVersion decrypt-before-visibility |
✅ Fixed in round-3 |
processFoundValue decrypt-before-visibility |
✅ Fixed in round-3 |
TestEncryption_ValueHeaderTamperRejected (GetAt path) |
✅ Correct |
isVisibleLiveKey in scanDeletePrefix — unauthenticated header |
|
encStateCleartext flip silent-garbage return |
The read path is now fully protected. isVisibleLiveKey is the remaining gap worth a follow-up commit before merge.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 624016d52e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if sv.EncState == encStateCleartext { | ||
| return body, nil |
There was a problem hiding this comment.
Reject encrypted rows relabeled as cleartext
When decryptForKey sees encStateCleartext, it returns the body without any envelope/tag verification. A disk attacker (or bit-flip corruption) can change an encrypted row’s header bits from 0b01 to 0b00, causing reads to return raw envelope bytes as if they were legitimate plaintext instead of surfacing ErrEncryptedReadIntegrity. This is a fail-open integrity bypass for encrypted-at-rest data because the authentication check is skipped entirely once the header is relabeled.
Useful? React with 👍 / 👎.
The cleartext branch of decryptForKey returned the body verbatim
without any envelope verification. A disk attacker who flipped
encryption_state from 0b01 to 0b00 left the original envelope bytes
in place and told the read path to skip decryption — silently
surfacing raw envelope bytes to the caller as if they were
legitimate plaintext. This is a fail-open integrity bypass for
encrypted-at-rest data because the AEAD tag check is skipped
entirely once the header is relabeled.
Fix:
- Add Cipher.HasKey accessor in the encryption package, returning
false on nil/zero-value receivers (mirrors the existing
zero-value contract) and delegating to Keystore.Has otherwise.
- On the cleartext branch in decryptForKey, run the body through
DecodeEnvelope. If it parses as a well-formed envelope AND the
embedded key_id is loaded in the live keystore (i.e. the bytes
are almost certainly a relabeled real envelope rather than a
coincidental shape match), surface ErrEncryptedReadIntegrity
instead of returning the bytes.
- Bodies that don't parse, or whose parsed key_id is unknown,
keep the legitimate-cleartext fast path so a Stage-2-style
cutover (legacy cleartext entries + new encrypted entries
coexisting) stays readable.
False-positive bound: a benign cleartext payload would have to
start with EnvelopeVersionV1 (0x01), be at least EnvelopeOverhead
(34) bytes, parse cleanly through DecodeEnvelope, AND embed a 32-bit
key_id collision with a loaded DEK. Joint probability is small
enough to treat the heuristic as production-safe; Stage 8's
authenticated MVCC metadata bit will replace it with a deterministic
authentication.
Caller audit: rejectRebadgedEnvelope is called only by
decryptForKey on the cleartext branch. The cipher=nil short-circuit
preserves the legacy single-mode deployment path. The existing
TestEncryption_InactiveKeyWritesCleartext test exercises legitimate
cleartext+encrypted coexistence and continues to pass — its
cleartext payloads don't parse as envelopes.
New regression test TestEncryption_RebadgeAttackRejected uses a
distinguishable writeTS (314159 vs the other tamper tests' 100) to
also satisfy the unparam linter on tamperPebbleValue's ts arg.
New foundation API: TestCipher_HasKey covers the loaded id, an
unknown id, and the nil-receiver / zero-value paths.
|
@claude review PR742 round-4: codex P1 攻撃面: encryption_state を 0b01→0b00 に flip すると envelope bytes が cleartext として返り、AEAD 検証を完全 bypass、生 ciphertext+tag+nonce が caller に届く fail-open。 修正:
false-positive bound: 偶然の cleartext payload が EnvelopeVersionV1 で始まり / EnvelopeOverhead 以上 / DecodeEnvelope を通過 / 32-bit key_id collision まで揃う必要があり、結合確率は低い。Stage 8 の MVCC metadata 認証 bit で deterministic 化される。 caller audit:
新規テスト:
make test/lint green。 |
|
Claude finished @bootjp's task in 3m 5s —— View job PR742 Round-4 Review
Codex P1 rebadge guard — Correct ✅The encState 0b01→0b00 flip is now closed by This also closes the round-2 "design-accepted" concern (Option A from round-2 review). That finding was previously marked as design-accepted per §2.2 because including flags in the AAD required a design-doc amendment; the rebadge guard is an alternative mechanical fix that covers the same flip without modifying the AAD layout. Both paths are now blocked.
|
| Finding | Status |
|---|---|
Codex P1: encState-flip rebadge attack (decryptForKey) |
✅ Fixed in round-4 |
| Round-2 design-accepted: encState-flip → raw envelope bytes | ✅ Also closed by rejectRebadgedEnvelope |
Cipher.HasKey nil/zero-value safety |
✅ Correct |
rejectRebadgedEnvelope guard logic |
✅ Correct |
| False-positive boundary analysis | ✅ Bounded and documented; Stage 8 makes it deterministic |
nolint:nilerr suppression |
✅ Justified and commented |
TestCipher_HasKey (4 cases) |
✅ Complete |
TestEncryption_RebadgeAttackRejected |
✅ Correct |
Round-3 P2: isVisibleLiveKey unauthenticated header in scanDeletePrefix |
The read path (GetAt, scan, GetRange) is now fully protected through all four rounds. isVisibleLiveKey is the one remaining gap before the encryption envelope can be considered complete for all write-path visibility decisions.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: eb7edd4bc6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if sv.ExpireAt != 0 && sv.ExpireAt <= ts { | ||
| return nil, ErrKeyNotFound | ||
| } | ||
| return plain, nil |
There was a problem hiding this comment.
Preserve empty-value existence on decrypted reads
Return-value semantics changed in readVisibleVersion: decrypted bytes are returned directly, but AES-GCM Open can return nil for a valid zero-length plaintext. In the encryption-enabled path this makes GetAt return nil, nil for empty values, and ExistsAt then reports the key as absent because it checks val != nil. This regresses correctness for any workload that stores empty values and relies on ExistsAt (e.g., key-existence checks after writing an empty payload).
Useful? React with 👍 / 👎.
| if !s.cipher.HasKey(env.KeyID) { | ||
| // Body parses but the embedded key_id is not loaded — the | ||
| // joint shape+keystore-collision probability is low enough | ||
| // to treat this as legitimate cleartext that happens to | ||
| // look envelope-shaped. | ||
| return nil |
There was a problem hiding this comment.
Reject rebadged envelopes even when key_id is altered
The cleartext rebadge guard is bypassable because it only errors when the parsed envelope key_id is currently loaded. A disk attacker can flip encryption_state to cleartext and also modify the envelope’s key_id bytes to any unloaded value; rejectRebadgedEnvelope then returns nil, and the read path serves raw envelope bytes as plaintext instead of surfacing integrity failure. This still violates the fail-closed tamper-detection goal for encrypted-at-rest rows.
Useful? React with 👍 / 👎.
…x P1 round-4)
Two new P1 codex findings on the round-3 commit, both legitimate:
1. Empty-plaintext semantics regression. AES-GCM Open returns a nil
dst slice for a zero-length plaintext, and ExistsAt distinguishes
"key absent" from "key present with empty value" via val != nil.
In the encryption-enabled path GetAt was therefore returning
(nil, nil) for empty values and ExistsAt reported the key as
missing. Normalize to []byte{} on the decrypt success path so
the empty-value contract matches the cleartext fast path.
2. Rebadge guard bypassable. The previous round's heuristic only
rejected when the embedded key_id was loaded in the live
keystore. A disk attacker who flips encryption_state to cleartext
can also rewrite the envelope's key_id bytes to any unloaded
value, falling out of the keystore lookup and serving the
relabeled bytes as plaintext. The strengthened guard rejects
whenever DecodeEnvelope succeeds — independent of key_id —
accepting the trade-off that legitimate cleartext payloads
parsing as well-formed envelopes (extremely rare under the
length/version/flag/nonce constraints) will also fail closed.
Stage 8's authenticated MVCC metadata bit replaces this
heuristic with a deterministic check.
Caller audit (semantic change: tightened cleartext rejection):
- decryptForKey is the only caller of rejectRebadgedEnvelope; the
cleartext fast path otherwise is unchanged.
- The cipher=nil short-circuit is preserved so legacy single-mode
deployments are not affected.
- Existing tests (TestEncryption_InactiveKeyWritesCleartext +
others) write short cleartext bodies that decode-envelope
rejects on length, so the strict guard does not break them.
Cipher.HasKey was added in round-3 specifically to power the
keystore-collision narrow check; with that check removed it has no
production caller, so it is removed too rather than left as
dead-code surface.
New regression tests:
- TestEncryption_RebadgeAttackRejected gains the
"encState flipped AND key_id rewritten to unloaded" case so a
future regression of the bypass surface fails the suite.
- TestEncryption_EmptyValueExistsAt locks down the empty-value
Put → GetAt → ExistsAt = true contract under encryption.
|
@claude review PR742 round-5: codex P1 x2 を反映しました (commit 1bda846)。 P1-A: Empty-plaintext regression (
P1-B: Rebadge guard bypass (
caller audit:
新規テスト:
make test/lint green。 |
|
Claude finished @bootjp's task in 2m 59s —— View job PR742 Round-5 Review
P1-A: Empty-plaintext nil normalization — Correct ✅
The trace for an empty encrypted value is now correct:
P1-B:
|
| Finding | Status |
|---|---|
P1-A: Empty-plaintext nil normalization (decryptForKey) |
✅ Fixed in round-5 |
| P1-B: Rebadge guard key_id-rewrite bypass | ✅ Fixed in round-5 |
Cipher.HasKey dead-code removal |
✅ Clean |
TestEncryption_EmptyValueExistsAt |
✅ Correct; pins GetAt non-nil + ExistsAt=true |
TestEncryption_RebadgeAttackRejected (table-driven, 2 cases) |
✅ Correct |
| Rounds 1–5 read path (GetAt, scan, rebadge, empty-value) | ✅ Fully protected |
Round-3/4 P2: isVisibleLiveKey unauthenticated header in scanDeletePrefix |
The read path and the encState rebadge family are now fully protected across all five rounds. isVisibleLiveKey is the sole remaining gap before the storage envelope can be considered complete for all visibility-decision paths.
…ude[bot] round-5)
Read-side paths (readVisibleVersion, processFoundValue) gained
value-header AAD authentication in rounds 3-5, but the write-side
isVisibleLiveKey was missed. Its sole caller is scanDeletePrefix,
which uses the bool to decide whether DeletePrefixAt needs to
write a fresh tombstone. A disk attacker who flips the tombstone
bit (or lowers expireAt) on an encrypted entry would cause the
function to return needsTombstone=false on a key that is actually
still live — DeletePrefixAt then skips writing the deletion
tombstone and the key survives the prefix delete silently. Unlike
the read-path flips, this is a persisted integrity bypass: the
missing tombstone stays missing across restarts.
Fix: run decryptForKey inside isVisibleLiveKey when sv.EncState is
encrypted. The plaintext is discarded; we only need the
authentication side-effect over the (envelope header ‖
value-header bytes ‖ pebble key) AAD. After Decrypt succeeds,
sv.Tombstone and sv.ExpireAt are now operating on authenticated
bytes; on tag mismatch ErrEncryptedReadIntegrity propagates
through scanDeletePrefix to the operator, halting DeletePrefixAt
fail-closed.
Caller audit:
- isVisibleLiveKey is called only by scanDeletePrefix (lsm_store.go:1356).
- scanDeletePrefix already propagates errors via `return err`, so
the new ErrEncryptedReadIntegrity surface bubbles to the
operator without further changes.
- Cleartext entries (sv.EncState == encStateCleartext) hit
decryptForKey's fast path and only run the rebadge guard, which
rejects envelope-shaped bytes. Legitimate cleartext stays
unaffected.
- Empty-plaintext encrypted entries: decryptForKey now returns
[]byte{}; we discard it, so the round-5 normalization is
harmless here.
New regression test
TestEncryption_DeletePrefixHeaderTamperRejected writes an encrypted
entry under "doomed/", flips the tombstone bit on disk, calls
DeletePrefixAt, and asserts the surface error is
ErrEncryptedReadIntegrity instead of a silent no-op.
|
@claude review PR742 round-6: claude[bot] round-5 review で指摘された 攻撃面: round 3-5 で read paths ( 修正: caller audit:
新規テスト
make test/lint green。 |
|
Claude finished @bootjp's task in 2m 19s —— View job PR742 Round-6 Review
Round-3/4/5 P2:
|
| Site | Function | Fixed in |
|---|---|---|
| lsm_store.go:730 | readVisibleVersion |
Round-3 |
| lsm_store.go:770 | processFoundValue |
Round-3 |
| lsm_store.go:1407 | isVisibleLiveKey |
Round-6 |
The remaining decodeValue call (line 1748, setEncodedVersionInBatch) uses version.Tombstone from an in-memory VersionedValue, not from on-disk bytes — it is not a tamper target. ✅
Cumulative summary — All rounds
| Finding | Status |
|---|---|
| Gemini medium ×2: AAD double-allocation | ✅ Fixed (round-1) |
| Codex P1: value-header (tombstone/expireAt) not in AAD | ✅ Fixed (round-3) |
Read path readVisibleVersion: decrypt-before-visibility |
✅ Fixed (round-3) |
Read path processFoundValue: decrypt-before-visibility |
✅ Fixed (round-3) |
| Codex P1: encState-flip rebadge → raw envelope as plaintext | ✅ Fixed (round-3/4) |
| Codex P1: key_id-rewrite bypass on rebadge guard | ✅ Fixed (round-5) |
Empty-value nil → ExistsAt false negative |
✅ Fixed (round-5) |
Write-side isVisibleLiveKey / scanDeletePrefix gap |
✅ Fixed (round-6) |
CounterNonceFactory filename / export visibility |
Minor nit — encryption_test_helpers.go is not a _test.go file so CounterNonceFactory is exported from the store package. Still a nit, not a blocker. |
The storage envelope wiring is now security-complete across all read and write visibility-decision paths. No remaining gaps identified.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f8fc68e8da
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| return errors.Wrap(ErrEncryptedReadIntegrity, | ||
| "store: cleartext-labelled value parses as an envelope; refusing rebadge attempt") |
There was a problem hiding this comment.
Avoid rejecting cleartext values that match envelope shape
This branch turns any cleartext value that DecodeEnvelope can parse into ErrEncryptedReadIntegrity, which makes legitimate rows unreadable once encryption is wired. In this commit, DecodeEnvelope only requires len(body) >= 34 and body[0] == 0x01, so binary/plain values written during the phase where activeKeyID is false (or pre-encryption legacy data) can be rejected even though they were never encrypted. That creates deterministic read failures for valid user data in mixed-mode rollout scenarios, not just attacker tampering.
Useful? React with 👍 / 👎.
…(PR742 codex P1 round-6)
Round-5's "reject any DecodeEnvelope-parseable body" guard was too
strict: legitimate cleartext rows whose bytes coincidentally pass
the length / version / flag / nonce shape checks (binary blobs,
images, certain protobuf payloads, anything with a 0x01 prefix and
≥34 bytes) became deterministically unreadable once a cipher was
wired. Codex round-6 flagged this as a P1 deterministic read
failure for valid user data in mixed-mode rollout.
The earlier round-3 "shape AND key_id loaded" check had the
opposite problem (codex round-4 P1: bypassable by also rewriting
the on-disk key_id). The two findings appear contradictory but are
reconciled by an AEAD trial decrypt: rebuild the AAD as if the row
carried encState=encrypted and call cipher.Decrypt. Only a real
envelope produced under one of the loaded DEKs verifies — random
cleartext (even envelope-shaped) has 2⁻¹²⁸ chance of forging a tag.
To handle the kid-rewrite-to-unloaded variant (round-4 P1),
iterate over all loaded DEKs and substitute each candidate kid in
the AAD's HeaderAAD section. This way a relabeled envelope whose
on-disk key_id was rewritten still verifies under the original
encrypt-time kid. The Tombstone / ExpireAt / pebbleKey AAD
components remain bound from the encrypt path, so any flip there
trips the tag mismatch regardless of which DEK we trial.
Cost: O(loaded_DEKs) AEAD decrypt attempts per cleartext read.
Typical clusters carry 1-3 active DEKs; each Decrypt is a few µs.
Negligible relative to the Pebble I/O the read already performs.
Re-add Cipher.LoadedKeyIDs accessor (round-3 had Cipher.HasKey,
removed in round-5; the new guard needs to enumerate). Mirrors
the existing Keystore zero-value contract — returns nil for a
nil/zero-value receiver.
Caller audit:
- rejectRebadgedEnvelope is the only LoadedKeyIDs caller.
- Signature changes from (body) to (pebbleKey, sv, body) so the
AAD reconstruction can use the on-disk Tombstone / ExpireAt;
only call site is decryptForKey's cleartext branch.
- cipher=nil short-circuit, "body too short for envelope", and
"DecodeEnvelope parse fail" fast paths preserved.
New regression test
TestEncryption_RebadgeGuardAllowsLegitimateEnvelopeShapedCleartext
constructs a cleartext payload that round-5 would have rejected
(0x01 prefix + 64 bytes that pass DecodeEnvelope's structural
checks) and confirms PutAt → GetAt round-trips unchanged.
The pre-existing TestEncryption_RebadgeAttackRejected covers both
attack variants (encState flip alone; encState flip + key_id
rewrite to unloaded) and continues to pass.
|
@claude review PR742 round-7: codex P1 矛盾する 2 つの過去指摘:
解決: shape-only reject を AEAD trial decrypt に置換。
追加 API: caller audit:
コスト: cleartext read あたり O(loaded_DEKs) AEAD Decrypt。typical 1-3 DEK で μs 級、Pebble I/O 比で無視可能。 新規テスト:
make test/lint green。 |
|
Claude finished @bootjp's task in 6m 30s —— View job PR742 Round-7 Review
AEAD trial decrypt (
|
| Finding | Status |
|---|---|
| Gemini medium ×2: AAD double-allocation | ✅ Round-1 |
| Codex P1: value-header not bound into AAD | ✅ Round-3 |
readVisibleVersion / processFoundValue: decrypt-before-visibility |
✅ Round-3 |
| Codex P1: encState-flip rebadge → raw ciphertext as plaintext | ✅ Round-4 |
| Codex P1: key_id-rewrite bypasses keystore collision guard | ✅ Round-5 |
Empty-value nil → ExistsAt false negative |
✅ Round-5 |
Write-side isVisibleLiveKey / scanDeletePrefix gap |
✅ Round-6 |
| Codex P1: shape-only reject → false-positives on legitimate cleartext | ✅ Round-7 |
Stale decryptForKey comment (still describes round-4/5 mechanism) |
|
Ambiguous "falls through" phrase in rejectRebadgedEnvelope comment |
|
No direct unit test for Cipher.LoadedKeyIDs() |
|
| Retired-DEK rebadge gap | |
CounterNonceFactory exported from non-test file |
All security-critical paths are now protected. The two stale comments are the only actionable items before merge; both are documentation-only fixes.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0f9fbda281
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| var hdr [valueHeaderSize]byte | ||
| writeValueHeaderBytes(hdr[:], sv.Tombstone, sv.ExpireAt, encStateEncrypted) | ||
| for _, kid := range s.cipher.LoadedKeyIDs() { | ||
| aad := buildStorageAAD(env.Version, env.Flag, kid, hdr[:], pebbleKey) | ||
| if _, err := s.cipher.Decrypt(env.Body, aad, kid, env.Nonce[:]); err == nil { |
There was a problem hiding this comment.
Reject relabeled envelopes when header fields are modified
rejectRebadgedEnvelope rebuilds trial-decrypt AAD from sv.Tombstone/sv.ExpireAt, which come from the attacker-controlled cleartext-labeled header. If an attacker flips encryption_state to cleartext and changes tombstone/expiry bits, every trial decrypt sees the wrong AAD and fails, so this function returns nil and the read path serves ciphertext bytes (or silently returns not-found/expired) instead of ErrEncryptedReadIntegrity. This recreates a fail-open tamper path for encrypted rows under disk-byte modification.
Useful? React with 👍 / 👎.
…1 round-7)
Round-7's trial-decrypt rebadge guard reconstructed the AAD from
on-disk sv.Tombstone / sv.ExpireAt — but those fields live in the
same value-header byte / region as encryption_state, so an attacker
who flips encState 0b01→0b00 can simultaneously flip tombstone or
rewrite expireAt. The trial AAD then no longer matches the
encrypt-time AAD, every loaded-DEK Decrypt fails, the guard
returns nil, and the read path serves either ciphertext garbage
("not found" via the post-decryptForKey tombstone check, or
"expired" via expireAt) — a fail-open tamper path codex round-7
flagged P1.
Two changes close most of the gap:
1. Canonical tombstone in the trial AAD. The encrypt path always
writes tombstone=false (encrypted entries never carry the
tombstone bit — deletes use cleartext tombstone-only writes).
Trusting the on-disk tombstone bit was therefore unsound.
Substitute false unconditionally so a flipped tombstone bit no
longer changes the AAD that we trial against.
2. Enumerate expireAt candidates. We can't know the encrypt-time
value from the on-disk bytes, but the common case is no-TTL
(expireAt=0); attackers who rewrote expireAt away from 0 still
verify when we try expireAt=0 in the trial AAD. We also keep
the on-disk value as a candidate to cover the encState-only
flip case where expireAt was untouched. Cost: up to 2 × loaded_DEKs
AEAD Decrypt attempts per cleartext read — typical clusters are
1-3 active DEKs, each Decrypt is microseconds, negligible
relative to Pebble I/O.
Caller audit: rejectRebadgedEnvelope is still called only by
decryptForKey's cleartext branch. Signature unchanged. Cleartext
reads that don't parse as envelopes (the legitimate-mixed-mode
fast path codex round-6 required) are unaffected.
Residual gap: encState flip + expireAt rewrite when the original
expireAt was a non-zero value the attacker also rewrote. The trial
can't recover the original without authenticated metadata. Stage 8
moves encryption_state into authenticated MVCC metadata, which
makes the rebadge attack structurally impossible.
New regression test TestEncryption_RebadgeAttackCombinedHeaderFlips
covers three combined-flip scenarios on a no-TTL row: encState +
tombstone, encState + expireAt rewrite, encState + tombstone +
expireAt — all surface ErrEncryptedReadIntegrity.
|
@claude review PR742 round-8: codex P1 攻撃: round-7 trial decrypt が on-disk 修正:
コスト: cleartext read あたり最大 2 × loaded_DEKs trial decrypt。typical 1-3 DEK、μs 級、Pebble I/O 比で無視可。 caller audit:
残課題: encState + expireAt rewrite (原 expireAt が非ゼロの場合)。Stage 8 で encryption_state を authenticated MVCC metadata に移行することで deterministic に解決。 新規テスト
make test/lint green。 |
|
Claude finished @bootjp's task in 3m 52s —— View job PR742 Round-8 Review
Codex P1 combined-flip fix — Correct ✅The codex P1 finding was that round-7's Round-8 closes the common case via two changes: 1. Canonical tombstone=false ( writeValueHeaderBytes(hdr[:], false /*canonical*/, candidateExpire, encStateEncrypted)This is sound because 2. expireAt candidate enumeration ( candidateExpireAts := []uint64{sv.ExpireAt}
if sv.ExpireAt != 0 {
candidateExpireAts = append(candidateExpireAts, 0)
}This strategy is correct for the documented attack surface:
The cost (≤2 × loaded_DEKs AEAD Decrypt per cleartext read, μs-range) is proportionate and documented.
|
| Finding | Status |
|---|---|
| Gemini medium ×2: AAD double-allocation | ✅ Round-1 |
| Codex P1: value-header not bound into AAD | ✅ Round-3 |
readVisibleVersion / processFoundValue: decrypt-before-visibility |
✅ Round-3 |
| Codex P1: encState-flip rebadge → raw ciphertext as plaintext | ✅ Round-4 |
| Codex P1: key_id-rewrite bypasses keystore collision guard | ✅ Round-5 |
Empty-value nil → ExistsAt false negative |
✅ Round-5 |
Write-side isVisibleLiveKey / scanDeletePrefix gap |
✅ Round-6 |
| Codex P1: shape-only reject → false-positives on legitimate cleartext | ✅ Round-7 |
| Codex P1: combined encState + tombstone/expireAt flip → trial AAD mismatch | ✅ Round-8 |
| Residual: encState + expireAt rewrite when orig expireAt non-zero | |
Stale decryptForKey comment (lines 137–148, 151) |
|
rejectRebadgedEnvelope history missing round-8 entry |
|
| Misleading "kid-rewrite-to-loaded-other-DEK still falls through" | |
Prerequisite assertion missing in TestEncryption_RebadgeGuardAllowsLegitimateEnvelopeShapedCleartext |
The security-critical fix is correct and complete for the no-TTL case (the common path). The three stale comments are the only actionable items; the residual non-zero expireAt rewrite gap is correctly documented and structurally deferred.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 55bca7c97b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| encoded, err := env.Encode() | ||
| if err != nil { | ||
| return nil, 0, errors.Wrap(err, "store: encode envelope") | ||
| } | ||
| return encoded, encStateEncrypted, nil |
There was a problem hiding this comment.
Enforce snapshot-size limit after envelope encoding
Encrypted writes can exceed the restore-time value length cap even when validateValueSize succeeds: encryptForKey adds envelope overhead and returns bytes that are stored as-is, but restore still caps each raw value at maxSnapshotValueSize + valueHeaderSize in readRestoreEntry. With encryption enabled, a plaintext near maxSnapshotValueSize is writable yet its native snapshot restore fails with ErrValueTooLarge, so the system can persist data that cannot be recovered via snapshot. Please either validate post-encryption size here or raise the restore bound for encrypted rows.
Useful? React with 👍 / 👎.
…dex P1 round-8) validateValueSize accepts plaintexts up to maxSnapshotValueSize, but encryptForKey adds 34 bytes (EnvelopeOverhead) on the storage envelope path, and Pebble snapshots ship the resulting bytes verbatim. The restore-side cap (maxSnapshotValueSize + valueHeaderSize) was therefore 34 bytes short of the largest encrypted entry the write path can produce — a plaintext written exactly at maxSnapshotValueSize is persisted but rejected on snapshot restore with ErrValueTooLarge. Codex round-8 P1 flagged this as data that can persist yet not be recovered via snapshot. Raise readRestoreEntry's per-value cap to maxSnapshotValueSize + valueHeaderSize + EnvelopeOverhead so the restore bound matches what the encrypt-and-write path produces. The existing safety motivation (cap arbitrary stream lengths to prevent OOM on a malicious snapshot) is preserved — adding 34 bytes does not materially change the resource ceiling. Caller audit: this is a one-line bound change inside readRestoreEntry. The callers (Restore, restoreFromStreamingMVCC) already propagate ErrValueTooLarge, so behaviour is identical unless the value would have rejected before — and now the encrypted-at-max-plaintext case round-trips. mvcc_store.go's identical-shape cap at line 921 is for the in-memory MVCC stream format, which Stage 2 does not yet ship encrypted bodies through (see setEncodedVersionInBatch — it always restores as cleartext); that cap is therefore left for Stage 8 to bump together with the v3 metadata-side encryption_state. New regression test TestEncryption_SnapshotRestoreAtMaxValueSize shrinks maxSnapshotValueSize to 4 KiB (avoids allocating 256 MiB in CI) and round-trips a plaintext at that exact size through PutAt → Snapshot.WriteTo → fresh-store Restore → GetAt. NOT t.Parallel — the test mutates a package-level var the rest of the suite reads.
|
@claude review PR742 round-9: codex P1 Gap: 修正: caller audit:
新規テスト
make test/lint green。 |
|
Claude finished @bootjp's task in 4m 5s —— View job PR742 Round-9 Review
Codex P1: Snapshot restore cap — Fixed ✅The gap is real and the fix is minimal and correct. The bug. The fix (lsm_store.go:1691): vLen, err = readRestoreFieldLen(r, "snapshot value",
maxSnapshotValueSize+valueHeaderSize+encryption.EnvelopeOverhead)The cap arithmetic is now exact:
OOM protection preserved. The ceiling changes from
|
| Finding | Status |
|---|---|
| Gemini medium ×2: AAD double-allocation | ✅ Round-1 |
| Codex P1: value-header not bound into AAD | ✅ Round-3 |
readVisibleVersion / processFoundValue: decrypt-before-visibility |
✅ Round-3 |
| Codex P1: encState-flip rebadge → raw ciphertext as plaintext | ✅ Round-4 |
| Codex P1: key_id-rewrite bypasses keystore collision guard | ✅ Round-5 |
Empty-value nil → ExistsAt false negative |
✅ Round-5 |
Write-side isVisibleLiveKey / scanDeletePrefix gap |
✅ Round-6 |
| Codex P1: shape-only reject → false-positives on legitimate cleartext | ✅ Round-7 |
| Codex P1: combined encState + tombstone/expireAt flip → trial AAD mismatch | ✅ Round-8 |
| Codex P1: Snapshot restore cap 34 bytes short for encrypted entries | ✅ Round-9 |
| Residual: encState + expireAt rewrite when orig expireAt non-zero | |
Stale decryptForKey comment (lines 137–148, 151) |
|
rejectRebadgedEnvelope history missing round-8 entry + misleading "falls through" |
|
Prerequisite assertion missing in TestEncryption_RebadgeGuardAllowsLegitimateEnvelopeShapedCleartext |
|
CounterNonceFactory exported from non-test file |
All security-critical paths are now fully protected across all nine rounds. The storage envelope wiring is correct for reads (GetAt, scan), writes (PutAt, ExpireAt, ApplyMutations), the delete-prefix visibility path, and snapshot save/restore. The remaining items are documentation-only nits that do not affect correctness or security. The PR is ready for merge subject to CI green.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 009e798dd6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if err != nil { | ||
| // Body does not parse as an envelope; treat as legitimate | ||
| // cleartext. | ||
| return nil //nolint:nilerr // intentional: parse failure means "not an envelope" |
There was a problem hiding this comment.
Fail closed when relabeled envelope decoding fails
rejectRebadgedEnvelope treats any DecodeEnvelope failure as legitimate cleartext, so an attacker can flip encryption_state to cleartext and corrupt one envelope header byte (for example version) to force this branch and bypass integrity checks. In that case decryptForKey returns raw ciphertext bytes instead of ErrEncryptedReadIntegrity, which is a fail-open path for encrypted-at-rest data because KV values are opaque and callers may accept those bytes as valid application data.
Useful? React with 👍 / 👎.
…und-9) Round-9's rebadge guard called encryption.DecodeEnvelope on the cleartext body; on parse failure it short-circuited to "treat as cleartext". An attacker who flips encryption_state to cleartext can also corrupt the envelope's version (or flag) byte to force DecodeEnvelope into that error path, ducking the trial decrypt and letting raw envelope bytes flow back to the caller as plaintext — a fail-open path codex round-9 flagged P1. Replace DecodeEnvelope with manual fixed-offset slicing: nonce = body[HeaderAADSize : HeaderSize] // bytes 6..18 ct = body[HeaderSize:] // bytes 18.. The trial AAD is now built with canonical envelope_version (0x01) and flag (0) — both fixed by the encrypt path — instead of the on-disk bytes. So a corrupted version or flag byte no longer matters; only ciphertext / tag bytes can break the trial, and at that point the original envelope is destroyed (the user receives garbage, not the original plaintext). Combined with rounds 7-8 (key_id substitution, canonical tombstone, expireAt enumeration), the guard now catches: - encState flip alone (round-3 case) - encState + tombstone flip (round-7) - encState + expireAt rewrite when original was 0 (round-7) - encState + tombstone + expireAt all flipped on no-TTL row (round-7) - encState + envelope version byte corrupted (this round) - encState + envelope flag byte corrupted (this round) - encState + version + flag both corrupted (this round) - encState + key_id rewritten to unloaded (round-4) Residual: ciphertext or tag byte tampering (no integrity surface unless we add authenticated metadata, which Stage 8 does deterministically) and expireAt rewrite when the original was a specific non-zero value the attacker also rewrote. Caller audit: rejectRebadgedEnvelope is still called only by decryptForKey's cleartext branch. Signature unchanged. The cipher=nil and len(body) < EnvelopeOverhead short-circuits are preserved so legitimate cleartext smaller than an envelope or in single-mode deployments hit the fast path unchanged. New regression test TestEncryption_RebadgeAttackEnvelopeHeaderCorruption covers three sub-cases (version corrupted, flag corrupted, both corrupted) and locks in the fail-closed behaviour against the DecodeEnvelope-fail-then-fall-through bypass.
|
@claude review PR742 round-10: codex P1 Gap: round-9 の rebadge guard が 修正:
Trial AAD は canonical envelope_version=0x01 / flag=0 (encrypt path の fixed values) を使用。version/flag byte 改変は AAD に影響しない (canonical を使うため)。 catch できる variant:
残課題: ciphertext / tag byte 改変 (元 envelope 破壊と引き換え、user は garbage 受領)、expireAt rewrite (元値 non-zero の場合)。Stage 8 metadata 認証で deterministic 化。 caller audit:
新規テスト make test/lint green。 |
|
Claude finished @bootjp's task in 3m 42s —— View job PR742 Round-10 Review
Codex P1:
|
| Attack variant | Caught by |
|---|---|
| encState flip alone | Round-7 (candidate DEK trial) |
| encState + tombstone flip | Round-8 (canonical tombstone=false in trial AAD) |
| encState + expireAt rewrite (orig=0) | Round-8 (candidateExpireAts includes 0) |
| encState + tombstone + expireAt all flip | Round-8 |
| encState + key_id rewrite to unloaded | Round-7 (LoadedKeyIDs iteration, not trusting on-disk key_id) |
| encState + envelope version byte corrupted | Round-10 (bypass DecodeEnvelope, use fixed offsets) |
| encState + envelope flag byte corrupted | Round-10 (canonical flag=0 in trial AAD) |
| encState + version AND flag corrupted | Round-10 |
Residual (documented, deferred to Stage 8):
- encState flip + nonce bytes corrupted → trial uses wrong nonce → GCM check fails for all candidates → guard returns nil → caller gets garbled bytes (not original plaintext). Ciphertext corruption is the attacker's loss — the original plaintext is unrecoverable.
- encState flip + expireAt rewrite when original expireAt was non-zero.
TestEncryption_RebadgeAttackEnvelopeHeaderCorruption — Correct ✅
All three cases are correctly structured:
- Case 1 (version corrupted):
raw[valueHeaderSize] = 0x07—body[0](version) becomes0x07. With old code,DecodeEnvelopewould fail on0x07 != 0x01. With new code,nonce = body[6:18](still correct),ct = body[18:](still correct), canonical AAD uses0x01→ GCM verifies →ErrEncryptedReadIntegrity. ✅ - Case 2 (flag corrupted):
raw[valueHeaderSize+1] = 0xff—body[1](flag) becomes0xff. Old code would passDecodeEnvelope(flag is not validated) butenv.Flag = 0xffwould be used in the trial AAD instead of0x00→ AAD mismatch → trial decrypt fails → guard returns nil. New code uses canonicalflag=0→ correct trial AAD → GCM verifies →ErrEncryptedReadIntegrity. ✅ - Case 3 (both corrupted): Combination of the above. ✅
Note about case 2 (flag corruption with old round-9 code): This was a silent bypass in round-9 even without the DecodeEnvelope path, because env.Flag from the parsed envelope was used in the trial AAD. An attacker who corrupted body[1] (flag) from 0x00 to 0xff would cause env.Flag = 0xff → trial AAD used flag=0xff instead of the encrypt-time flag=0x00 → GCM verification failed for all candidates → guard returned nil. Round-10 closes this by canonicalizing flag=0 in the trial AAD. The commit message lists flag corruption as "round-9 new" which is accurate. ✅
decryptForKey stale comment — Still not addressed ⚠️
The block comment at lines 137–148 still describes the round-3 mechanism ("if it parses as a well-formed envelope AND its key_id is loaded in the keystore, reject as ErrEncryptedReadIntegrity") which has been superseded by the round-7/8/9/10 AEAD trial decrypt approach. This was flagged in rounds 7, 8, and 9 and remains unchanged.
Additionally, line 151 states: "its Tombstone, ExpireAt, and EncState are reproduced into the AAD" — this is misleading for the cleartext rebadge path (where tombstone is canonicalized to false and encState to encStateEncrypted in the trial AAD), though accurate for the encrypted path.
The rejectRebadgedEnvelope history section (lines 218–224) still says "round-7 (this change)" with no round-8, 9, or 10 entries. The misleading "kid-rewrite-to-loaded-other-DEK variant still falls through" phrase (which only applies when the original DEK has been retired) is also still present, unchanged since round-7.
These are documentation-only issues with no correctness impact. Since they've persisted across four rounds, it may be simplest to address them in a single documentation cleanup commit before merge.
Cumulative summary — All rounds
| Finding | Status |
|---|---|
| Gemini medium ×2: AAD double-allocation | ✅ Round-1 |
| Codex P1: value-header not bound into AAD | ✅ Round-3 |
readVisibleVersion / processFoundValue: decrypt-before-visibility |
✅ Round-3 |
| Codex P1: encState-flip rebadge → raw ciphertext as plaintext | ✅ Round-4 |
| Codex P1: key_id-rewrite bypasses keystore collision guard | ✅ Round-5 |
Empty-value nil → ExistsAt false negative |
✅ Round-5 |
Write-side isVisibleLiveKey / scanDeletePrefix gap |
✅ Round-6 |
| Codex P1: shape-only reject → false-positives on legitimate cleartext | ✅ Round-7 |
| Codex P1: combined encState + tombstone/expireAt flip → trial AAD mismatch | ✅ Round-8 |
| Codex P1: snapshot restore cap 34 bytes short for encrypted entries | ✅ Round-9 |
Codex P1: DecodeEnvelope fail → cleartext fall-through (version/flag corruption) |
✅ Round-10 |
| Residual: encState + nonce/ciphertext corruption → garbled bytes (not original plaintext) | |
| Residual: encState + expireAt rewrite when orig expireAt non-zero | |
Stale decryptForKey comment (lines 137–148) + misleading line 151 |
|
rejectRebadgedEnvelope history section out of date (round-7 as latest) + misleading "falls through" |
|
Prerequisite assertion missing in TestEncryption_RebadgeGuardAllowsLegitimateEnvelopeShapedCleartext |
|
CounterNonceFactory exported from non-test file |
All security-critical paths are now fully protected across all ten rounds. The storage envelope wiring is correct for reads (GetAt, scan), writes (PutAt, ExpireAt, ApplyMutations), the delete-prefix visibility path, and snapshot save/restore. The remaining items are all documentation-only nits. The PR is ready for merge subject to CI green.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b4da4e7d55
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| // No (DEK, candidate-expireAt) combination produces a tag match. | ||
| // Body is either legitimate cleartext that happens to look | ||
| // envelope-shaped, an envelope under a retired DEK, or a | ||
| // partially-tampered envelope (residual gap above). The first | ||
| // case is the legitimate-mixed-mode outcome the round-6 codex |
There was a problem hiding this comment.
Fail closed on relabeled envelopes with body tampering
rejectRebadgedEnvelope only raises ErrEncryptedReadIntegrity when a trial decrypt succeeds, and otherwise returns nil as cleartext. If an at-rest attacker flips encryption_state from encrypted to cleartext and corrupts any ciphertext/tag byte (body[18:]), every trial decrypt fails with integrity errors, this function falls through, and decryptForKey returns attacker-controlled envelope bytes as plaintext instead of surfacing an integrity failure. This leaves a fail-open tamper path for encrypted rows under the same disk-modification threat model this guard is meant to block.
Useful? React with 👍 / 👎.
… P1 round-10)
Codex round-10 re-flagged the ciphertext/tag-byte tampering case as
a fail-open path. The case was already documented as residual in
round-9's commit (and round-7's code comment, but only briefly.
Expand the in-code rationale to be explicit about:
- which residuals exist (ct/tag tamper, non-zero expireAt rewrite)
- why no further Stage 2 patch closes them (encryption_state
itself is unauthenticated; the AAD reconstruction depends on
encState to dispatch, so it cannot also bind it)
- what the user observes (garbage bytes, not original plaintext —
confidentiality is intact, integrity observability is the gap)
- which variants the guard does catch
- how Stage 8 closes the gap deterministically (move
encryption_state into authenticated MVCC metadata)
No code change. Tests / lint still green.
EOF
)
|
@claude review PR742 round-11: codex P1 round-10 結論: Stage 2 の wire format における fundamental limit に到達。コード追加なし、residual-gap の rationale 強化のみ。 根拠:
Stage 2 がカバーする variant (catch 対象):
residual (Stage 2 では catch 不可):
user-observable impact:
Stage 8 deterministic closure: コード上、この residual を明示する詳細 rationale comment を rejectRebadgedEnvelope に追加しました。次の codex review で同じ ct/tag tamper を P1 として再 flag した場合、それは Stage 8 まで持ち越すべき認識済み制約として扱う必要があります。 make test/lint green。 |
|
Claude finished @bootjp's task in 2m 19s —— View job PR742 Round-11 Review
Codex P1 round-10: ciphertext/tag-byte tamper — Fundamental limit confirmed ✅The round-11 decision to expand the rationale rather than add code is correct. The circular dependency argument is sound and worth spelling out explicitly:
This is a genuine circularity, not an oversight. Stage 8's plan to move User-observable impact in the residual case: the caller receives garbage bytes — the corrupted ciphertext content — not the original plaintext. The attacker does not hold the DEK and cannot recover the plaintext. This is an integrity observability gap (no Round-11 comment additions — Accurate ✅The new text at
One precision note on the catch-list wording (lines 283–287): the phrase "encState + value-header tamper" is slightly ambiguous — this could be read as covering all value-header flips, but the Four stale comments — Still not addressed
|
| Finding | Status |
|---|---|
| Gemini medium ×2: AAD double-allocation | ✅ Round-1 |
| Codex P1: value-header not bound into AAD | ✅ Round-3 |
readVisibleVersion / processFoundValue: decrypt-before-visibility |
✅ Round-3 |
| Codex P1: encState-flip rebadge → raw ciphertext as plaintext | ✅ Round-4 |
| Codex P1: key_id-rewrite bypasses keystore collision guard | ✅ Round-5 |
Empty-value nil → ExistsAt false negative |
✅ Round-5 |
Write-side isVisibleLiveKey / scanDeletePrefix gap |
✅ Round-6 |
| Codex P1: shape-only reject → false-positives on legitimate cleartext | ✅ Round-7 |
| Codex P1: combined encState + tombstone/expireAt flip → trial AAD mismatch | ✅ Round-8 |
| Codex P1: snapshot restore cap 34 bytes short for encrypted entries | ✅ Round-9 |
Codex P1: DecodeEnvelope fail → cleartext fall-through (version/flag corruption) |
✅ Round-10 |
| Codex P1: encState + ct/tag corruption → fundamental wire-format limit | ✅ Round-11: correctly identified as fundamental limit; rationale expanded |
| Residual: encState + ct/tag tamper → garbage bytes (not original plaintext) | |
| Residual: encState + non-zero expireAt rewrite → same class | |
Stale decryptForKey comment (lines 137–148, 151) |
|
rejectRebadgedEnvelope history section missing rounds 8–11 |
|
| Misleading "kid-rewrite-to-loaded-other-DEK still falls through" | |
Prerequisite assertion missing in TestEncryption_RebadgeGuardAllowsLegitimateEnvelopeShapedCleartext |
|
CounterNonceFactory exported from non-test file |
The security-critical analysis is complete and correct. Round-11's documentation-only conclusion is well-reasoned. The four stale comments in decryptForKey and rejectRebadgedEnvelope are the only actionable items before merge — they've persisted across four rounds and are most efficiently addressed together in a single cleanup commit.
CLAUDE.md guidance: comments should explain WHY a piece of code exists, not WHO requested it or in WHICH PR review round. Across PR #742's 11 review rounds the rebadge guard's comments accumulated provenance markers ("PR742 codex P1 round-3", "round-7 — this change", "rounds 6, 7, 9, 10", etc.) that describe a historical sequence of fixes rather than the current state of the code, and they rot every time a new round lands. Single cleanup commit: - decryptForKey doc: drop the "Cleartext-rebadge guard (PR742 codex P1, round-3)" block. That paragraph described the round-3 implementation (envelope-shape + key_id-loaded check) which has since been fully replaced by trial decryption; it reads like an accurate description of the code but no longer matches it. The current rebadge-guard rationale already lives on rejectRebadgedEnvelope, so the doc cross-reference is the only thing decryptForKey needs. - rejectRebadgedEnvelope doc: drop the "PR742 review history" block (round-3 / round-5 / round-7 narrative) and the inline "(round-7 …)" / "(round-8 …)" / "(round-9 …)" asides. Replace with a single rationale describing the canonical-AAD trial-decrypt design as it stands today and an explicit residual-gap section pointing at Stage 8. - decryptForKey body: drop "(PR742 codex P1 round-4)" from the nil-plaintext normalization comment. - buildStorageAAD doc: drop "PR742 round-1" / "round-2" references; keep the technical rationale (pre-sizing avoids re-allocation; value-header inclusion binds tombstone / encryption_state / expireAt). - lsm_store.go processFoundValue: drop "(PR742 codex P1 …)" aside on the decrypt-before-visibility-check rationale. - lsm_store.go readRestoreEntry: drop "codex P1 PR742 round-7 caught a case where" history and rephrase as a forward rationale. - cipher.go LoadedKeyIDs: drop "(PR742 codex P1 family)" from the use-case comment. - encryption_glue.go encryptForKey AAD doc: drop "(PR742 codex P1)" trailing reference. No code change. The technical rationale for every defence (header binding, key_id substitution, expireAt enumeration, canonical version/flag, residual-gap-deferred-to-Stage-8) is preserved verbatim — only the round-N timeline is gone. Tests / lint green.
Jepsen unit tests have been failing with persistent Maven artifact resolution errors (Could not find artifact org.clojure:clojure:1.7.0 etc.) on commit 20e6d09 across four manual re-runs. The previous commit on the same branch (2addfa1) ran the same workflow with no code-tree changes between (cleanup is comment-only) and passed. This empty commit triggers a fresh Lein resolution attempt against Maven Central / Clojars at a different time.
|
Note: Jepsen test failures are infra-side, not code-side CI test job has been failing with persistent Maven artifact resolution errors ( Verification: just triggered the same Jepsen workflow on main (run 25521941547) — main also fails with identical Maven resolution errors. Same workflow on the same branch passed at 17:12 UTC (commit 2addfa1) and earlier today; nothing in the cleanup commit ( This is GitHub Actions runner ↔ Maven Central / Clojars network/repository issue. Operator action: wait for upstream recovery, or pin Lein to a specific version that doesn't trigger resolution. build / golangci-lint / test (ubuntu-latest, the Go test suite) all pass. The integrity/security review (rounds 1-11) is unchanged. |
|
@claude review |
|
Claude finished @bootjp's task in 4m 35s —— View job PR742 Round-12 Review
Docs cleanup commit
|
| Path | Function | Status |
|---|---|---|
| GetAt / key lookup | readVisibleVersion |
✅ decrypt before tombstone/expireAt check |
| GetRange / scan | processFoundValue |
✅ decrypt before tombstone/expireAt check |
| DeletePrefixAt | isVisibleLiveKey |
✅ decrypt before tombstone/expireAt check (round-6) |
| Snapshot restore | readRestoreEntry cap |
✅ +34 bytes for EnvelopeOverhead (round-9) |
| Rebadge: encState flip alone | rejectRebadgedEnvelope |
✅ trial decrypt |
| Rebadge: + key_id rewrite | rejectRebadgedEnvelope |
✅ LoadedKeyIDs iteration, not trusting on-disk key_id |
| Rebadge: + tombstone/expireAt flip | rejectRebadgedEnvelope |
✅ canonical tombstone=false, expireAt enumeration |
| Rebadge: + version/flag byte corrupt | rejectRebadgedEnvelope |
✅ fixed-offset slicing, canonical version/flag in trial AAD |
| Rebadge: + ct/tag corruption | (residual) | |
| Empty-value nil → ExistsAt=false | decryptForKey |
✅ nil → []byte{} normalization |
One remaining minor inaccuracy in decryptForKey comment ⚠️
encryption_glue.go:138-142:
// sv is the storedValue freshly decoded from the on-disk bytes; its
// Tombstone, ExpireAt, and EncState are reproduced into the AAD so
// any flip on disk fails GCM verification.For the encrypted branch (encStateEncrypted, lines 157-160), this is accurate: writeValueHeaderBytes(hdr[:], sv.Tombstone, sv.ExpireAt, sv.EncState) reproduces all three fields from sv into the AAD.
For the cleartext rebadge guard path, tombstone is hardcoded false and EncState is hardcoded encStateEncrypted in the trial AAD — they are canonicalized, not reproduced from sv. The current paragraph structure (the preceding sentence already says "encState=cleartext returns the body verbatim after the envelope-rebadge guard below") means a careful reader will understand the "reproduced" statement applies to the encrypted branch, but the comment doesn't make this scope explicit.
A one-line scope qualifier would remove the ambiguity: "For the encrypted branch, sv's Tombstone, ExpireAt, and EncState are reproduced into the AAD…" This is the only residual inaccuracy from the round-8/11 nit family.
Residual-gap comment: nonce bytes omitted ⚠️
rejectRebadgedEnvelope, the residual-gap block says the trial falls through when the attacker corrupts "specifically body[HeaderSize:] (ciphertext / tag)". It omits the nonce region body[HeaderAADSize:HeaderSize] (bytes 6–17): if the attacker also flips nonce bytes on disk, the trial uses the corrupted nonce → all cipher.Decrypt calls return wrong results → guard returns nil → caller gets garbage bytes. The outcome is identical to the ciphertext/tag case (garbage, not original plaintext), so this is an integrity-observability gap in the same Stage 8 bucket — but the comment's "specifically" clause is incomplete.
Minor nits (unchanged from prior rounds)
TestEncryption_RebadgeGuardAllowsLegitimateEnvelopeShapedCleartext (lsm_store_encryption_test.go:735): The test constructs envelopeShaped bytes designed to pass DecodeEnvelope, but there is no assertion that they actually do. If the envelope format changes and the test payload no longer parses as an envelope, the test stops proving the "envelope-shaped cleartext is allowed through" invariant without a failure signal. Adding a guard at the start of the test (if _, err := encryption.DecodeEnvelope(envelopeShaped); err != nil { t.Fatalf(...) }) would pin the prerequisite. Nit — not a blocker.
CounterNonceFactory (encryption_test_helpers.go): Exported from a non-_test.go file, so it appears in go doc store as a public production API. The inline comment already explains the cross-package test use case. If reducing the public surface is desirable before Stage 7 ships the registry-backed factory, renaming the file to something that makes the test-grade status obvious (e.g. encryption_testing.go with a prominent doc comment) is the minimal change. Not a blocker.
Cumulative summary — All 12 rounds
| Finding | Status |
|---|---|
| Gemini medium ×2: AAD double-allocation | ✅ Round-1 |
| Codex P1: value-header not bound into AAD | ✅ Round-3 |
readVisibleVersion / processFoundValue: decrypt-before-visibility |
✅ Round-3 |
| Codex P1: encState-flip rebadge → raw ciphertext as plaintext | ✅ Round-4 |
| Codex P1: key_id-rewrite bypasses keystore collision guard | ✅ Round-5 |
| Empty-value nil → ExistsAt false negative | ✅ Round-5 |
Write-side isVisibleLiveKey / scanDeletePrefix gap |
✅ Round-6 |
| Codex P1: shape-only reject → false-positives on legitimate cleartext | ✅ Round-7 |
| Codex P1: combined encState + tombstone/expireAt flip → trial AAD mismatch | ✅ Round-8 |
| Codex P1: snapshot restore cap 34 bytes short | ✅ Round-9 |
| Codex P1: DecodeEnvelope fail → cleartext fall-through (version/flag) | ✅ Round-10 |
| Codex P1: ct/tag tamper → fundamental wire-format limit | ✅ Round-11: rationale expanded |
| Stale comments from rounds 7-11 | ✅ Round-12: all addressed in commit 20e6d09 |
| Residual: encState + ct/tag/nonce tamper → garbage bytes (not original plaintext) | |
| Residual: encState + non-zero expireAt rewrite | |
decryptForKey line 138: "reproduced" applies to encrypted branch only |
|
| Residual-gap comment: nonce region not listed alongside ct/tag | |
Missing prerequisite assertion in RebadgeGuardAllowsLegitimateEnvelopeShapedCleartext |
|
CounterNonceFactory exported from non-test file |
The PR is security-complete. All correctness and integrity paths are protected and tested. The two remaining
Summary
Stage 2 of the data-at-rest encryption rollout (design doc:
docs/design/2026_04_29_proposed_data_at_rest_encryption.md, §4.1). Wires the AES-256-GCM storage envelope into the pebble-backedMVCCStore. Encryption is opt-in via the newWithEncryptionoption, so every existingNewPebbleStorecaller is untouched.tombstone | encryption_state(2 bits) | reserved(5 bits)per design §7.1; reserved encryption_state values fail closed withErrEncryptedValueReservedState.HeaderAADBytes(...) ‖ encoded_pebble_key) defeats §4.1 case-2 cut-and-paste and case-3 MVCC version substitution; tag mismatches surface as typedErrEncryptedReadIntegrity.NonceFactoryinterface is the seam Stage 7 (writer registry) replaces; today'sCounterNonceFactorytest helper produces the design's deterministic 12-byte (node_id ‖ local_epoch ‖ write_count) shape.Out of scope (deferred)
CounterNonceFactory)Test plan
go test -race ./store/...round-trip + tamper + AAD record-binding + property test (rapid)storesuite green on the cipher=nil legacy pathgolangci-lintcleango test -race ./adapter/...(CI verifies; local 5-minute timeout flake observed independently of these changes — adapter does not consume the encryption wiring)make testfull sweep (CI)Self-review (CLAUDE.md 5 passes)
Cipheris already lock-free overatomic.Pointer[map];pebbleStoreadds three immutable fields set at construction;NonceFactorycontract documents caller responsibility for(node_id, local_epoch, write_count)uniqueness.encryptForKey; encrypted path is one GCM seal + 34 bytes overhead; rapid property test has 100+ encrypted PutAt/GetAt round-trips and stays well under the existing test wall budget.-race.