feat(encryption): sidecar crash-durable read/write (Stage 1)#722
feat(encryption): sidecar crash-durable read/write (Stage 1)#722
Conversation
|
@claude review |
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Claude finished @bootjp's task in 3m 14s —— View job Code Review — PR #722
|
There was a problem hiding this comment.
Code Review
This pull request introduces the Sidecar management system for encryption keys, providing functionality to read and write the keys.json file using a crash-durable protocol. The implementation includes validation for key purposes and IDs, along with comprehensive unit tests. The review feedback suggests enhancing the validation logic to ensure active keys are present in the key map, avoiding side effects by not mutating input structs during write operations, and ensuring that errors from file closure are not silently ignored.
| func validateSidecar(sc *Sidecar) error { | ||
| for idStr, k := range sc.Keys { | ||
| if err := validateSidecarKey(idStr, k); err != nil { | ||
| return err | ||
| } | ||
| } | ||
| return nil | ||
| } |
There was a problem hiding this comment.
validateSidecar should verify that the Active keys (Storage and Raft) actually exist in the Keys map if they are non-zero. A sidecar with an active key ID that has no corresponding entry in the Keys map is in an inconsistent state.
func validateSidecar(sc *Sidecar) error {
for idStr, k := range sc.Keys {
if err := validateSidecarKey(idStr, k); err != nil {
return err
}
}
if sc.Active.Storage != ReservedKeyID {
if _, ok := sc.Keys[strconv.FormatUint(uint64(sc.Active.Storage), 10)]; !ok {
return pkgerrors.Errorf("encryption: active storage key %d not found in keys map", sc.Active.Storage)
}
}
if sc.Active.Raft != ReservedKeyID {
if _, ok := sc.Keys[strconv.FormatUint(uint64(sc.Active.Raft), 10)]; !ok {
return pkgerrors.Errorf("encryption: active raft key %d not found in keys map", sc.Active.Raft)
}
}
return nil
}| sc.Version = SidecarVersion | ||
| if err := validateSidecar(sc); err != nil { | ||
| return pkgerrors.Wrap(err, "encryption: validate before write") | ||
| } | ||
|
|
||
| data, err := json.MarshalIndent(sc, "", " ") | ||
| if err != nil { | ||
| return pkgerrors.Wrap(err, "encryption: marshal sidecar") | ||
| } |
There was a problem hiding this comment.
WriteSidecar mutates the Version field of the input Sidecar struct. This side effect might be unexpected for callers who reuse the struct. It is safer to perform this assignment on a local copy before marshaling.
| sc.Version = SidecarVersion | |
| if err := validateSidecar(sc); err != nil { | |
| return pkgerrors.Wrap(err, "encryption: validate before write") | |
| } | |
| data, err := json.MarshalIndent(sc, "", " ") | |
| if err != nil { | |
| return pkgerrors.Wrap(err, "encryption: marshal sidecar") | |
| } | |
| scCopy := *sc | |
| scCopy.Version = SidecarVersion | |
| if err := validateSidecar(&scCopy); err != nil { | |
| return pkgerrors.Wrap(err, "encryption: validate before write") | |
| } | |
| data, err := json.MarshalIndent(&scCopy, "", " ") | |
| if err != nil { | |
| return pkgerrors.Wrap(err, "encryption: marshal sidecar") | |
| } |
| func writeTmpAndFsync(tmpPath string, data []byte) error { | ||
| f, err := os.OpenFile(tmpPath, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, sidecarFileMode) | ||
| if err != nil { | ||
| return pkgerrors.Wrapf(err, "encryption: open %q", tmpPath) | ||
| } | ||
| defer func() { _ = f.Close() }() | ||
|
|
||
| if _, err := f.Write(data); err != nil { | ||
| return pkgerrors.Wrapf(err, "encryption: write %q", tmpPath) | ||
| } | ||
| if err := f.Sync(); err != nil { | ||
| return pkgerrors.Wrapf(err, "encryption: fsync %q", tmpPath) | ||
| } | ||
| return nil | ||
| } |
There was a problem hiding this comment.
The error from f.Close() is silently ignored. According to the general rules, errors from Close() should be handled or logged to ensure resource leaks or cleanup issues are visible. Since this is a write operation, it is particularly important to ensure the file is closed successfully.
| func writeTmpAndFsync(tmpPath string, data []byte) error { | |
| f, err := os.OpenFile(tmpPath, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, sidecarFileMode) | |
| if err != nil { | |
| return pkgerrors.Wrapf(err, "encryption: open %q", tmpPath) | |
| } | |
| defer func() { _ = f.Close() }() | |
| if _, err := f.Write(data); err != nil { | |
| return pkgerrors.Wrapf(err, "encryption: write %q", tmpPath) | |
| } | |
| if err := f.Sync(); err != nil { | |
| return pkgerrors.Wrapf(err, "encryption: fsync %q", tmpPath) | |
| } | |
| return nil | |
| } | |
| func writeTmpAndFsync(tmpPath string, data []byte) (err error) { | |
| f, err := os.OpenFile(tmpPath, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, sidecarFileMode) | |
| if err != nil { | |
| return pkgerrors.Wrapf(err, "encryption: open %q", tmpPath) | |
| } | |
| defer func() { | |
| if closeErr := f.Close(); closeErr != nil && err == nil { | |
| err = pkgerrors.Wrapf(closeErr, "encryption: close %q", tmpPath) | |
| } | |
| }() | |
| if _, err := f.Write(data); err != nil { | |
| return pkgerrors.Wrapf(err, "encryption: write %q", tmpPath) | |
| } | |
| if err := f.Sync(); err != nil { | |
| return pkgerrors.Wrapf(err, "encryption: fsync %q", tmpPath) | |
| } | |
| return nil | |
| } |
References
- Do not silently ignore errors from Close() methods on resources like network connections. Log them to ensure resource leaks or other cleanup problems are visible.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a1def5d849
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| // Helper exists so WriteSidecar does not nest its own defer-on-error | ||
| // inside the rename block. | ||
| func writeTmpAndFsync(tmpPath string, data []byte) error { | ||
| f, err := os.OpenFile(tmpPath, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, sidecarFileMode) |
There was a problem hiding this comment.
Force restrictive mode on existing sidecar temp file
WriteSidecar opens <path>.tmp with os.O_CREATE|os.O_TRUNC, but the 0o600 mode is only applied when the file is newly created; if keys.json.tmp already exists (e.g., leftover from older tooling or manually created with broader perms), its permissive mode is retained and then moved into place by os.Rename, making keys.json group/other readable despite the security contract. This is reproducible with a preexisting keys.json.tmp at 0666, which yields a final sidecar mode of 0644 under a typical umask.
Useful? React with 👍 / 👎.
|
@claude review |
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
|
Claude finished @bootjp's task in 2m 28s —— View job Code Review — PR #722
|
| Issue | Reporter | Status |
|---|---|---|
Defer registered after writeTmpAndFsync → stale tmp on write-phase failure |
claude round-1 #1 | ✅ Fixed — defer moved above the call; TestWriteSidecar_StaleTmpFileIsCleanedOnWriteFailure added |
ErrUnsupportedFilesystem undefined |
claude round-1 #2 | ✅ Fixed — declared in errors.go:50; syncDir wraps raw fsync error with pkgerrors.WithSecondaryError(ErrUnsupportedFilesystem, err) |
No cross-validation of Active IDs against Keys map |
claude round-1 #3 / gemini line 141 | ✅ Fixed — requireActiveKey added to validateSidecar; ErrSidecarActiveKeyMissing declared; 3 tests added (Write × 2, Read × 1) |
WriteSidecar mutated caller's Version field |
gemini line 188 | ✅ Fixed — scCopy := *sc shallow copy; TestWriteSidecar_DoesNotMutateCaller pins the new contract |
f.Close() error silently ignored in writeTmpAndFsync |
gemini line 231 | ✅ Fixed — named-return retErr + defer; close errors propagate when write otherwise succeeded |
Pre-existing .tmp mode carried through via os.Rename |
codex P2 | ✅ Fixed — os.Remove then O_EXCL create; TestWriteSidecar_StaleTmpDoesNotLeakPermissiveMode pins the regression |
Remaining observations (non-blocking)
1. requireActiveKey does not validate purpose direction [Intentional Stage 5+ deferral]
A sidecar with Active.Storage = 1 where key "1" has Purpose = "raft" passes validateSidecar. The cross-validation added in this round checks existence, not semantic direction. This was noted in round-1 as an intentional boundary: purpose-direction matching belongs to the FSM layer in Stage 5+. No action needed now, but Stage 5+ must add the check before writing an active-key promotion.
2. TestReadSidecar_RejectsActiveKeyMissing covers only the Storage direction [Minor asymmetry]
TestWriteSidecar_RejectsActiveKeyMissing tests both Active.Storage and Active.Raft missing-entry cases on the write path (sidecar_test.go:382–411). The read-path counterpart at sidecar_test.go:413–430 only exercises Active.Storage = 12345 with an empty keys map; the Active.Raft direction is untested on read. The code path (requireActiveKey is called for both) is correct and symmetric — this is just a test coverage gap. A two-line addition to the existing read test would close it. Fix this →
3. syncDir close error is silently ignored [Acceptable]
syncDir defers _ = f.Close() on a read-only directory fd. Since no write-back is in flight, a close failure on a directory fd carries no durability implication. This is idiomatic and correct.
Verification of the crash-durable protocol
WriteSidecar (sidecar.go:200–243) now follows the §5.1 four-step sequence correctly:
1. scCopy = shallow copy; scCopy.Version = SidecarVersion; validateSidecar(&scCopy)
2. defer cleanup(tmpPath) registered ← BEFORE the following calls
3. writeTmpAndFsync(tmpPath, data) → O_EXCL create → Write → Sync → Close (error propagated)
4. os.Rename(tmpPath, path)
5. syncDir(dir) → open dir → Sync → Close
Step 2 is now registered before step 3 (the round-1 fix), so any failure at the write or fsync inside writeTmpAndFsync triggers cleanup. The os.Remove-then-O_EXCL pattern in writeTmpAndFsync (the codex P2 fix) ensures the inode is always fresh at 0o600. The syncDir error wraps ErrUnsupportedFilesystem with WithSecondaryError so errors.Is-matching works without string parsing at Stage 5+.
Reviewer checklist (PR description)
- Sidecar JSON matches §5.1 shape verbatim. Field names, JSON keys, and
Wrapped []byte→ base64 all confirmed. ✓ - Crash-durable protocol order confirmed. Write →
file.Sync→os.Rename→dir.Sync. ✓ - Validation applied at both Read and Write.
validateSidecar(includingrequireActiveKey) is called in both paths. ✓
Summary
Round 2 is clean. All round-1, gemini, and codex issues are resolved with appropriate tests. The two remaining observations are a test coverage asymmetry (Read-path Active.Raft direction) and an intentional Stage 5+ deferral. Neither is a blocker. This is ready to merge once the base feat/encryption-foundation branch lands.
|
@claude review |
|
Claude finished @bootjp's task in 2m 13s —— View job Code Review — PR #722
|
| Issue | Status |
|---|---|
TestReadSidecar_RejectsActiveKeyMissing missing Raft direction on read path |
✅ Fixed — 3 sub-cases now: storage active without entry, raft active without entry, both active without entries (sidecar_test.go:413–443) |
Fresh pass — no new issues found
The full three-file diff was reviewed. Everything below is a verification of known-good properties; nothing actionable is new.
Crash-durable protocol (WriteSidecar, lines 200–243)
The §5.1 four-step sequence is faithfully preserved:
- Shallow copy (
scCopy := *sc) so the caller's struct is not mutated —Versionis stamped on the copy only. validateSidecar(&scCopy)runs before marshalling, so a malformed sidecar cannot reach disk.- Defer for cleanup is registered before
writeTmpAndFsync(round-1 fix) — any write or fsync failure cleans up the tmp. writeTmpAndFsync→os.Rename→syncDirin the correct order.
writeTmpAndFsync (lines 262–286)
os.Remove(tmpPath)thenO_EXCLcreate (codex P2 fix) ensures the tmp inode is always fresh at0o600, regardless of any pre-existing permissive file.- Named-return
retErrpropagatesf.Closeerrors that occur after a successfulSync— not silently dropped. - Write-phase failures propagate to the caller's defer, which removes the (not-yet-existent) tmp cleanly via
_ = os.Remove.
syncDir (lines 296–308)
Wraps the raw fsync error with pkgerrors.WithSecondaryError(ErrUnsupportedFilesystem, err) so Stage 5+ startup can errors.Is-match on ErrUnsupportedFilesystem without string parsing.
validateSidecar / requireActiveKey (lines 136–165)
- Per-key format validated via
validateSidecarKey(decimal uint32, non-zero, known purpose). requireActiveKeychecks existence for bothActive.StorageandActive.Raftwhen non-zero.- Known Stage 5+ deferral (unchanged):
requireActiveKeydoes not verify that the referenced key'sPurposematches the expected direction (e.g.,Active.Storagepointing to a key withPurpose: "raft"passes). This is an intentional boundary — purpose-direction cross-validation belongs to the FSM layer at Stage 5+.
errors.go
All errors referenced in code and tests are defined: ErrUnsupportedFilesystem (line 50) and ErrSidecarActiveKeyMissing (line 57) added in round-1 fixes; all others from Stage 0 are present.
Test coverage summary
| Function | Sub-cases | Directions covered |
|---|---|---|
TestWriteSidecar_RejectsActiveKeyMissing |
2 | Storage ✓, Raft ✓ |
TestReadSidecar_RejectsActiveKeyMissing |
3 | Storage only ✓, Raft only ✓, Both ✓ |
TestWriteSidecar_StaleTmpDoesNotLeakPermissiveMode |
1 | Codex P2 regression ✓ |
TestWriteSidecar_StaleTmpFileIsCleanedOnWriteFailure |
1 | Pre-creation failure path ✓ |
TestWriteSidecar_DoesNotMutateCaller |
1 | Gemini caller-mutation fix ✓ |
Reviewer checklist (PR description)
- Sidecar JSON matches §5.1 shape verbatim. Field names, JSON keys,
Wrapped []byte→ base64. ✓ - Crash-durable protocol order confirmed. Write →
file.Sync→os.Rename→dir.Sync. ✓ - Validation applied at both Read and Write.
validateSidecar(includingrequireActiveKey) runs in both paths. ✓
Summary
Round 3 is clean. The read-path Raft-direction test gap from round 2 is closed. All round-1 (claude), gemini, and codex issues remain resolved. The only open item is the intentional Stage 5+ deferral (purpose-direction cross-validation in requireActiveKey), which is not a blocker.
This PR is ready to merge once the base feat/encryption-foundation (#719) lands on main.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0320384045
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if _, ok := sc.Keys[idStr]; !ok { | ||
| return pkgerrors.Wrapf(ErrSidecarActiveKeyMissing, | ||
| "active.%s=%d not present in keys map", purpose, id) | ||
| } |
There was a problem hiding this comment.
Reject active key IDs with mismatched purpose
validateSidecar only checks that active.storage/active.raft IDs exist in Keys, but it never verifies that the referenced entry has the matching purpose. A sidecar where active.storage points to a raft key (or vice versa) currently passes both WriteSidecar and ReadSidecar, even though this is malformed state and can route the wrong DEK to a purpose-specific encryption path after restart/rotation.
Useful? React with 👍 / 👎.
0320384 to
37a0bc1
Compare
|
@claude review PR722 round-4: codex P2
|
|
Claude finished @bootjp's task in 2m 25s —— View job Code Review — PR #722 Round 4
Round-3 open item resolutionThe one remaining observation from round 3 is fully closed in commit
Fresh pass — no new issues found
The change is minimal and correct: k, ok := sc.Keys[idStr]
if !ok {
return pkgerrors.Wrapf(ErrSidecarActiveKeyMissing, ...)
}
if k.Purpose != purpose {
return pkgerrors.Wrapf(ErrSidecarActivePurposeMismatch,
"active.%s=%d but keys[%q].purpose=%q", purpose, id, idStr, k.Purpose)
}Existence check comes first (preserving
Declared alongside
Four sub-cases across two axes:
The Read-path sub-tests correctly bypass Fixture consistency — Total test count — 24 top-level functions in Reviewer checklist (PR description)
SummaryRound 4 is clean. Every issue from all prior rounds (claude r1/r2/r3, gemini, codex P2 × 2) is now resolved. There are no outstanding observations. The implementation is faithful to §5.1 with validation that is now fully fail-closed on all three key-integrity dimensions (format, existence, purpose direction). This PR is ready to merge once the base |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 37a0bc1e21
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| dir := t.TempDir() | ||
| path := filepath.Join(dir, encryption.SidecarFilename) | ||
| // Make the dir read-only so OpenFile cannot create the tmp. | ||
| if err := os.Chmod(dir, 0o500); err != nil { |
There was a problem hiding this comment.
Skip permission-based failure test when running as root
This test assumes os.Chmod(dir, 0o500) makes WriteSidecar fail, but that is not true when tests run as UID 0 (or with CAP_DAC_OVERRIDE), where file creation still succeeds; in that environment the assertion at err == nil fails and the package test suite becomes nondeterministic across CI/container setups. Please gate this case (for example by skipping when effectively privileged) or use a failure injection that does not depend on Unix permission enforcement.
Useful? React with 👍 / 👎.
37a0bc1 to
f38303f
Compare
|
@claude review PR722 round-5: codex P2
加えて、PR #719 の round-9 (CodeRabbit Major x3) の base 取り込みのため rebase しています (force-with-lease)。 |
|
Claude finished @bootjp's task in 1m 38s —— View job Code Review — PR #722 Round 5
Round-4 open item resolutionThe one actionable item from round 4 (and its source codex P2) is fully resolved in commit
Fresh pass — no new issues foundSkip logic in Both skips are placed at the top of the test body, before any side-effectful setup (the if runtime.GOOS == "windows" {
t.Skip("dir permission bits don't gate file creation on Windows")
}
if os.Geteuid() == 0 {
t.Skip("UID 0 bypasses dir permission bits; chmod-based failure injection is meaningless")
}
This test's chmod is on the tmp file (
Correctly added at Minor observation (non-blocking): test comment says "round-3" where "round-4" would be more precise The comment at All prior-round fixes remain in place
Reviewer checklist (PR description)
SummaryRound 5 is clean. The codex P2 root-skip fix is correctly implemented — skips are placed before any side-effectful setup, the skip condition uses the standard This PR is ready to merge once the base |
7de3c0d to
6859ba6
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6859ba6b29
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if mode&0o077 != 0 { | ||
| t.Fatalf("sidecar file is group/other readable or writable (mode=%o)", mode) |
There was a problem hiding this comment.
Skip mode-bit assertion on Windows in tmp file mode test
TestWriteSidecar_TmpFileMode says it should skip Windows-style filesystems but never actually does, so the mode&0o077 assertion runs on Windows where Go does not preserve Unix permission bits and commonly reports group/other bits as set. That makes the test fail spuriously on Windows CI even when WriteSidecar is correct, creating platform-dependent red builds from this commit.
Useful? React with 👍 / 👎.
Implements the §5.1 keys.json sidecar layer of the data-at-rest encryption design (docs/design/2026_04_29_proposed_data_at_rest_encryption.md). Stage 1 is library-only; integration with FSM apply (§5.5 reconciliation, §6 admin commands) lands in Stage 5+.
Sidecar struct mirrors §5.1 illustrative JSON exactly:
- Version (uint), bumped on layout change; reader rejects mismatch with ErrSidecarVersion so an older binary cannot silently drop fields it does not know about.
- RaftAppliedIndex / StorageEnvelopeActive / RaftEnvelopeCutoverIndex fields for §5.5 sidecar / Raft-log reconciliation.
- ActiveKeys (Storage / Raft uint32) -- 0 means "not bootstrapped" sentinel per §5.1.
- Keys map[string]SidecarKey keyed by decimal-string form of key_id (per §5.1: "JSON object keys must be strings, but the on-disk envelope and the in-memory keystore always work in the binary uint32 form"). Each entry carries Purpose ("storage"/"raft"), Wrapped ([]byte, JSON base64), Created (ISO-8601 string -- not time.Time so future timezone-format additions do not break older readers), and LocalEpoch (uint16, consumed by §4.1 nonce construction).
WriteSidecar implements the §5.1 crash-durable write protocol:
1. Build new contents + ensure Version is set.
2. Write to <path>.tmp at mode 0o600, then file.Sync().
3. os.Rename(<path>.tmp, <path>).
4. dir.Sync() on the parent so the rename is durable.
Skipping step 2 or 4 is treated as a data-loss-class bug per §10 lens 1: a power loss between the rename and the directory inode flush can roll back keys.json while the rotation Raft entry is already committed, stranding ciphertext under a wrap that has effectively disappeared. Dir-fsync errors propagate -- the design refuses to start on filesystems (NFS, some FUSE) that cannot guarantee durability of the rename.
Validation (applied symmetrically at Read and Write):
- ErrSidecarVersion: wire version != SidecarVersion.
- ErrSidecarPurpose: keys-map entry purpose outside {storage, raft}.
- ErrSidecarKeyIDFormat: keys-map key not a decimal uint32.
- ErrSidecarReservedKeyID: keys-map carries key_id 0 (reserved sentinel).
Tests (sidecar_test.go, 22 cases):
- Round-trip: Write then Read returns the same struct.
- No tmp file left behind on success.
- Overwrite existing.
- File mode 0o600 (no group/other read/write).
- WriteSidecar sets Version automatically.
- Rejects nil sidecar.
- Validation: bad purpose / reserved key_id / non-numeric key_id at Write time.
- ReadSidecar: missing file (IsNotExist), unknown version (multiple), corrupt JSON (4 cases), bad purpose, reserved key_id, non-numeric key_id, key_id overflowing uint32 (4294967296).
- Partial-write cleanup: tmp file is removed if rename fails (target path is a directory).
- Constants pinned (storage / raft purpose strings).
47 tests + 2 property tests + 3 benchmarks total under internal/encryption/, -race clean, 0 lint issues.
Stacked on Stage 0 (PR #719).
P2 (codex line 218 -- security): writeTmpAndFsync used os.OpenFile with O_CREATE|O_TRUNC; if keys.json.tmp pre-existed at e.g. 0o666 (older tooling, manual poking, partial recovery), the existing mode was retained because O_CREATE only applies the mode arg on a fresh inode. os.Rename then moved the permissive mode into the production keys.json, defeating the wrapped-DEK file-mode guarantee. Fix: os.Remove any pre-existing tmp before O_EXCL-creating fresh at sidecarFileMode (0o600). New TestWriteSidecar_StaleTmpDoesNotLeakPermissiveMode pins the regression. Medium (gemini line 141 + claude[bot] #3): validateSidecar did not check that Active.Storage / Active.Raft (when non-zero) actually appear in the Keys map. A sidecar with an active key_id but no corresponding wrapped-DEK entry is malformed input; rotation and bootstrap paths always write the two halves together. Added requireActiveKey check in validateSidecar (applied symmetrically at Read and Write); new ErrSidecarActiveKeyMissing typed error. Tests cover both Storage and Raft directions on Write and the Read path. Medium (gemini line 188): WriteSidecar mutated the callers Sidecar.Version field. Operate on a shallow copy (scCopy := *sc) instead so callers reusing the struct see Version preserved. Updated TestWriteSidecar_OnDiskVersion to assert only the on-disk side; new TestWriteSidecar_DoesNotMutateCaller pins the new contract. Medium (gemini line 231): writeTmpAndFsync silently ignored f.Close errors. Convert to a named-return retErr pattern with a defer that captures Close failures when the main path otherwise succeeded -- write-back failures are now surfaced. Minor (claude[bot] #1): WriteSidecar registered the cleanup defer AFTER calling writeTmpAndFsync, so a write/fsync failure inside that helper would leave keys.json.tmp on disk. Move the defer above the call so any failure path -- including pre-rename ones -- triggers cleanup. New TestWriteSidecar_StaleTmpFileIsCleanedOnWriteFailure exercises the read-only-dir path. Minor (claude[bot] #2): added ErrUnsupportedFilesystem typed error per §5.1. syncDir now wraps the underlying fsync error with this sentinel using pkgerrors.WithSecondaryError so Stage 5+ startup integration can errors.Is-match without parsing strings. Caller audit per the loop prompt: WriteSidecar / ReadSidecar / syncDir all only have callers in sidecar_test.go for Stage 1 (library only); the syncDir name is shared with internal/raftengine/etcd/persistence.go syncDir but they live in different packages and do not collide. Semantic changes (non-mutating WriteSidecar, Close error propagation, ErrUnsupportedFilesystem wrapping) are contained. Tests now cover 26 sidecar cases (was 22). Package total: 51 tests + 2 property tests + 3 benchmarks; -race clean, 0 lint issues.
…722 r2) claude[bot] r2: TestReadSidecar_RejectsActiveKeyMissing only tested the Active.Storage direction; the Active.Raft direction was implicitly covered by the same requireActiveKey code path but had no explicit regression. Convert to a table-driven test with three cases (storage-only, raft-only, both) so a future change to the per-purpose validation is caught.
…codex P2) requireActiveKey previously confirmed only that the active key_id existed in the Keys map, not that the referenced entry's Purpose matched the slot. A sidecar with active.storage pointing to a purpose="raft" entry (or vice versa) would pass both Write and Read validation, then route the wrong DEK into a purpose-specific encryption path after restart or rotation — a fail-open foot-gun on malformed input. Tightened requireActiveKey to also verify the entry's Purpose. New ErrSidecarActivePurposeMismatch sentinel keeps it errors.Is-matchable and distinct from ErrSidecarActiveKeyMissing. Tests cover both WriteSidecar (in-memory pre-write check) and ReadSidecar (post-unmarshal check, simulating a sidecar that landed on disk via some other path).
TestWriteSidecar_StaleTmpFileIsCleanedOnWriteFailure relies on chmod(dir, 0o500) blocking OpenFile inside writeTmpAndFsync, but UID 0 (and any process holding CAP_DAC_OVERRIDE) bypasses the dir's permission bits, so OpenFile would still succeed and the assertion "WriteSidecar succeeded with read-only dir" fires spuriously. CI containers and rebuild environments often run as root, so the suite became nondeterministic across setups. Skip when os.Geteuid() == 0 (and on Windows, where the unix mode model doesn't gate file creation either. The non-root path still exercises the leak-cleanup property; this just narrows where it runs. EOF )
The doc comment on TestWriteSidecar_TmpFileMode said it should skip Windows, but the runtime.GOOS guard was missing — Go on Windows reports synthetic permission bits that frequently include group/ other reads even on a freshly written file, so the mode&0o077 == 0 assertion fires spuriously and the suite goes red on Windows CI even when WriteSidecar is correct. Add the missing runtime.GOOS == "windows" → t.Skip. The unix inheritance property is also covered (with the same self-skip shape) by TestWriteSidecar_StaleTmpDoesNotLeakPermissiveMode, so the cross-platform invariant is not lost.
6859ba6 to
9e5f9bd
Compare
|
@claude review PR722 round-6: codex P2
加えて PR #719 round-12 (codex P2 FileWrapper nil guard) の base 取り込みのため rebase しています (force-with-lease)。 |
|
Claude finished @bootjp's task in 1m 49s —— View job Code Review — PR #722 Round 6
Round-5 follow-up resolutionThe codex P2 from commit
Fresh pass — no new issues found
The fix is minimal and correct: if runtime.GOOS == "windows" {
t.Skip("unix permission bits are not preserved on Windows filesystems")
}
This test operates on the Minor observation (non-blocking): comment attribution The new comment at All prior-round fixes confirmed intact
Reviewer checklist (PR description)
SummaryRound 6 is clean. The Windows mode-bit skip in This PR is ready to merge once the base |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9e5f9bd010
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if finalSt.Mode().Perm()&0o077 != 0 { | ||
| t.Fatalf("keys.json inherited permissive mode from stale tmp: mode=%o", |
There was a problem hiding this comment.
Skip Windows mode-bit assertion in stale tmp permission test
TestWriteSidecar_StaleTmpDoesNotLeakPermissiveMode still asserts finalSt.Mode().Perm()&0o077 == 0 without a Windows guard, but on Windows os.Chmod only uses the owner-writable bit and other permission bits are not reliable Unix semantics. This means the test can fail on Windows even when WriteSidecar is correct, reintroducing the same platform-dependent flake you already fixed in TestWriteSidecar_TmpFileMode.
Useful? React with 👍 / 👎.
Round-6 (9e5f9bd) added the runtime.GOOS == "windows" skip to TestWriteSidecar_TmpFileMode but missed the symmetric gap in TestWriteSidecar_StaleTmpDoesNotLeakPermissiveMode. On Windows os.Chmod only honours the owner-writable bit and Go reports synthetic values for the unix group/other bits, so the existing self-skip on the seeded tmp may not fire and the final finalSt.Mode().Perm()&0o077 != 0 assertion can flake even when WriteSidecar is correct — re-introducing the same platform red build the round-6 commit just fixed elsewhere. Add a top-level runtime.GOOS == "windows" guard. The unix branch still covers the security invariant; Windows lacks the matching threat model (no multi-user shared filesystem with unix mode bits as the access boundary).
|
@claude review PR722 round-7: codex P2
|
|
Claude finished @bootjp's task in 2m 0s —— View job Code Review — PR #722 Round 7
Round-6 follow-up resolutionThe codex P2 from commit
Fresh pass — no new issues found
The fix is minimal and correct: if runtime.GOOS == "windows" {
t.Skip("unix mode bits are not preserved by os.Chmod on Windows; threat model does not apply")
}
This test already gained its Windows skip in round 6 (
This test has both root-skip ( Caller audit: other chmod/mode assertions
No other mode-bit assertions remain unguarded. ✓ All prior-round fixes confirmed intact
Reviewer checklist (PR description)
SummaryRound 7 is clean. The Windows mode-bit skip in This PR is ready to merge once the base |

Summary
Stage 1 of the data-at-rest encryption rollout. Stacked on #719 (Stage 0) — base branch is
feat/encryption-foundation. Once Stage 0 merges to main, this PR base will switch tomainand the diff will collapse to just the Stage 1 files.Implements the §5.1 keys.json sidecar layer of
docs/design/2026_04_29_proposed_data_at_rest_encryption.md. Library-only; integration with FSM apply (§5.5 reconciliation) and admin commands (§6) lands in Stage 5+.What is added
internal/encryption/sidecar.go—Sidecarstruct mirroring §5.1 JSON exactly:Version,RaftAppliedIndex,StorageEnvelopeActive,RaftEnvelopeCutoverIndexActiveKeys{Storage, Raft uint32}—0is the §5.1 not-bootstrapped sentinelKeys map[string]SidecarKey— keyed by decimal-string form ofkey_idper §5.1SidecarKey{Purpose, Wrapped, Created, LocalEpoch}—Createdis a string (nottime.Time) so future timezone-format additions do not break old readersWriteSidecar(path, *Sidecar) errorimplements the §5.1 crash-durable protocol:Versionis set automatically) and validate.<path>.tmpat mode0o600, thenfile.Sync().os.Rename(<path>.tmp, <path>).dir.Sync()on the parent so the rename is durable.Skipping any step is a data-loss-class bug per §10 lens 1; a power loss between the rename and the directory inode flush can roll back
keys.jsonwhile the rotation Raft entry is already committed, stranding ciphertext under a wrap that has effectively disappeared. Dir-fsync errors propagate — §5.1 explicitly refuses to start on filesystems (NFS, some FUSE) that cannot guarantee durability of the rename.ReadSidecar(path) (*Sidecar, error)— validates wire version, per-key purpose (storage/raft), and the decimal-uint32 form of every keys-map entry. Returns typed errors (ErrSidecarVersion,ErrSidecarPurpose,ErrSidecarKeyIDFormat,ErrSidecarReservedKeyID).IsNotExist(err)convenience for the first-boot case.Tests (
sidecar_test.go, 22 cases)reflect.DeepEqual..tmpfile left behind on success; tmp cleaned up on rename failure (target as directory).WriteSidecarsetsVersionautomatically.ReadSidecar: missing (IsNotExist), unknown version (4 values), corrupt JSON (4 cases), key_id overflowinguint32(4294967296).storage/raftpurpose strings).Self-review (CLAUDE.md 5 lenses)
.tmp. Validation symmetric at Read and Write so a malformed sidecar cannot land on disk viaWriteSidecar.Versionfield is the wire-format gate. Reader fails closed on unknown version. Future v2 will be additive JSON fields plus a bump.-raceclean, 0 lint issues.Test plan
go test ./internal/encryption/...(~7s)go test -race ./internal/encryption/...(~33s)golangci-lint run ./internal/encryption/...(0 issues)Wrappedas base64).WriteSidecar.