Skip to content

feat(encryption): sidecar crash-durable read/write (Stage 1)#722

Open
bootjp wants to merge 7 commits intomainfrom
feat/encryption-sidecar
Open

feat(encryption): sidecar crash-durable read/write (Stage 1)#722
bootjp wants to merge 7 commits intomainfrom
feat/encryption-sidecar

Conversation

@bootjp
Copy link
Copy Markdown
Owner

@bootjp bootjp commented Apr 30, 2026

Summary

Stage 1 of the data-at-rest encryption rollout. Stacked on #719 (Stage 0) — base branch is feat/encryption-foundation. Once Stage 0 merges to main, this PR base will switch to main and the diff will collapse to just the Stage 1 files.

Implements the §5.1 keys.json sidecar layer of docs/design/2026_04_29_proposed_data_at_rest_encryption.md. Library-only; integration with FSM apply (§5.5 reconciliation) and admin commands (§6) lands in Stage 5+.

What is added

  • internal/encryption/sidecar.goSidecar struct mirroring §5.1 JSON exactly:

    • Version, RaftAppliedIndex, StorageEnvelopeActive, RaftEnvelopeCutoverIndex
    • ActiveKeys{Storage, Raft uint32}0 is the §5.1 not-bootstrapped sentinel
    • Keys map[string]SidecarKey — keyed by decimal-string form of key_id per §5.1
    • SidecarKey{Purpose, Wrapped, Created, LocalEpoch}Created is a string (not time.Time) so future timezone-format additions do not break old readers
  • WriteSidecar(path, *Sidecar) error implements the §5.1 crash-durable protocol:

    1. Build contents (Version is set automatically) and validate.
    2. Write to <path>.tmp at mode 0o600, then file.Sync().
    3. os.Rename(<path>.tmp, <path>).
    4. dir.Sync() on the parent so the rename is durable.

    Skipping any step is a data-loss-class bug per §10 lens 1; a power loss between the rename and the directory inode flush can roll back keys.json while the rotation Raft entry is already committed, stranding ciphertext under a wrap that has effectively disappeared. Dir-fsync errors propagate — §5.1 explicitly refuses to start on filesystems (NFS, some FUSE) that cannot guarantee durability of the rename.

  • ReadSidecar(path) (*Sidecar, error) — validates wire version, per-key purpose (storage / raft), and the decimal-uint32 form of every keys-map entry. Returns typed errors (ErrSidecarVersion, ErrSidecarPurpose, ErrSidecarKeyIDFormat, ErrSidecarReservedKeyID).

  • IsNotExist(err) convenience for the first-boot case.

Tests (sidecar_test.go, 22 cases)

  • Round-trip: Write → Read returns the same struct via reflect.DeepEqual.
  • No .tmp file left behind on success; tmp cleaned up on rename failure (target as directory).
  • Overwrite existing.
  • File mode 0o600 (no group/other read/write).
  • WriteSidecar sets Version automatically.
  • Validation: bad purpose / reserved key_id / non-numeric key_id at both Read and Write.
  • ReadSidecar: missing (IsNotExist), unknown version (4 values), corrupt JSON (4 cases), key_id overflowing uint32 (4294967296).
  • Constants pinned (storage / raft purpose strings).

Self-review (CLAUDE.md 5 lenses)

  1. Data loss — Crash-durable protocol matches §5.1; partial-write cleanup test confirms no orphaned .tmp. Validation symmetric at Read and Write so a malformed sidecar cannot land on disk via WriteSidecar.
  2. Concurrency — Sidecar I/O is not concurrent (one writer per data dir). No locking; future Stage 5+ admin layer will serialize at the FSM apply boundary.
  3. Performance — Single read at boot, single write per rotation/cutover entry. Marshal cost is negligible vs. the fsync syscalls.
  4. Data consistencyVersion field is the wire-format gate. Reader fails closed on unknown version. Future v2 will be additive JSON fields plus a bump.
  5. Test coverage — 22 sidecar cases on top of Stage 0 31 tests + 2 property tests + 3 benchmarks; package total now 47 tests, -race clean, 0 lint issues.

Test plan

  • go test ./internal/encryption/... (~7s)
  • go test -race ./internal/encryption/... (~33s)
  • golangci-lint run ./internal/encryption/... (0 issues)
  • Reviewer: confirm Sidecar JSON matches the §5.1 illustrative shape verbatim (field names, Wrapped as base64).
  • Reviewer: confirm crash-durable protocol order (write → file.Sync → rename → dir.Sync) is followed in WriteSidecar.
  • Reviewer: confirm validation is applied at both Read and Write.

@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 30, 2026

@claude review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 30, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f363fb16-ccad-4b63-a59a-537292bfa45c

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/encryption-sidecar

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

Claude finished @bootjp's task in 3m 14s —— View job


Code Review — PR #722 feat(encryption): sidecar crash-durable read/write (Stage 1)

  • Gather context and read changed files
  • Review sidecar.go — struct layout, crash-durable write protocol, validation
  • Review sidecar_test.go — coverage, edge cases, correctness
  • Cross-check against §5.1 of design doc
  • Post final review

Reviewer checklist items (from PR description)

  • Sidecar JSON matches §5.1 shape verbatim. Field names and ordering are exact: version, raft_applied_index, storage_envelope_active, raft_envelope_cutover_index, active.{storage,raft}, keys.<decimal_id>.{purpose,wrapped,created,local_epoch}. Wrapped []byte marshals to base64 automatically via encoding/json. ✓
  • Crash-durable protocol order confirmed. writeTmpAndFsyncos.RenamesyncDir. Matches §5.1 steps 2→3→4. ✓
  • Validation applied at both Read and Write. validateSidecar is called in WriteSidecar before marshalling and in ReadSidecar after unmarshalling. ✓

Issues found

1. Stale .tmp file on write-phase failure [Minor]

WriteSidecar's cleanup defer is registered only after writeTmpAndFsync returns successfully (sidecar.go:199). If the tmp file is created but f.Write or f.Sync inside writeTmpAndFsync fails, the .tmp file is left on disk and the defer never fires.

// sidecar.go:193–203 — defer is registered AFTER writeTmpAndFsync returns
if err := writeTmpAndFsync(tmpPath, data); err != nil {
    return err   // ← defer not yet registered; stale .tmp stays on disk
}
defer func() {
    if retErr != nil {
        _ = os.Remove(tmpPath)
    }
}()

Not a correctness bug in practice (O_TRUNC fixes it on the next call), but TestWriteSidecar_PartialWriteCleanup tests only the rename-failure path, so the PR description's statement "partial-write cleanup test confirms no orphaned .tmp" is only partially accurate. The write-failure path (disk full, fsync error on tmp) is untested.

The fix is small — add cleanup inside writeTmpAndFsync itself, or move the defer before calling it. Fix this →

2. ErrUnsupportedFilesystem not defined [Spec gap, Stage 1 acceptable]

docs/design/2026_04_29_proposed_data_at_rest_encryption.md §5.1 states:

On filesystems that lack dir.Sync() semantics (NFS, some FUSE mounts) the encryption package refuses to start with ErrUnsupportedFilesystem.

This error is not yet declared in errors.go. Currently syncDir propagates the raw os.Sync error, which makes it impossible for the caller to distinguish "NFS without dir-fsync" from "directory not found". Fine for this library-only Stage 1 (the process-start integration is Stage 5+), but ErrUnsupportedFilesystem should be defined in errors.go now so Stage 5+ can pattern-match on it without a separate PR. Fix this →

3. No cross-validation of Active IDs against the Keys map [Minor, defensive gap]

validateSidecar checks that each entry in Keys is well-formed, but does not verify that Active.Storage and Active.Raft (when non-zero) actually appear in Keys, nor that the referenced entry's Purpose matches the direction (storage/raft). A sidecar with Active.Storage = 999 but no "999" key passes validation and is written to disk, only failing at the higher-layer unwrap call.

This is an intentional Stage 1 boundary (the validator is format-only; semantic constraints belong to the FSM layer). Noting it here so Stage 5+ explicitly adds the cross-check before writing an active-key promotion.


Strengths

  • Crash-durable protocol is faithfully implemented. The four-step sequence in WriteSidecar (writeTmpAndFsyncos.RenamesyncDir) matches §5.1 exactly. syncDir correctly opens the directory and calls f.Sync() so the rename is durable even under a kernel that batches directory writes.
  • Error wrapping is consistent throughout. All boundary errors use pkgerrors.Wrapf, satisfying the wrapcheck linter. Sentinel errors (ErrSidecarVersion, ErrSidecarPurpose, ErrSidecarKeyIDFormat, ErrSidecarReservedKeyID) are chained correctly so callers can errors.Is-match on the cause while still seeing the full path context.
  • IsNotExist is correct. Unwrapping through the pkgerrors chain to fs.ErrNotExist works because os.ReadFileos.PathErrorfs.ErrNotExist and errors.Is traverses the chain. The first-boot case is handled cleanly.
  • File-mode security is correct. O_WRONLY|O_CREATE|O_TRUNC with 0o600 on both create and overwrite paths prevents a TOCTOU window where an attacker reads wrapped key material from a world-readable temp file.
  • WriteSidecar mutates the caller's Version field — this is documented and tested by TestWriteSidecar_SetsVersion. The behaviour is intentional and unambiguous.
  • Test suite is thorough for the paths it covers. 22 cases across round-trip, mode bits, overwrite, cleanup, constants pinning, unknown version (4 values), corrupt JSON (4 cases), and all three key-id error kinds at both Read and Write. -race clean per the PR.

Summary

The implementation is solid and faithful to §5.1. The crash-durable write order, error types, and JSON layout are all correct. The two actionable issues are:

  1. Stale .tmp on write-phase failure (minor but worth closing before Stage 5+ when write failures become load-bearing)
  2. ErrUnsupportedFilesystem should be defined now (small addition to errors.go, avoids a dependency on a later PR)

Issue 3 (cross-validation of Active IDs) is a known Stage 5+ concern, not a blocker.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the Sidecar management system for encryption keys, providing functionality to read and write the keys.json file using a crash-durable protocol. The implementation includes validation for key purposes and IDs, along with comprehensive unit tests. The review feedback suggests enhancing the validation logic to ensure active keys are present in the key map, avoiding side effects by not mutating input structs during write operations, and ensuring that errors from file closure are not silently ignored.

Comment on lines +134 to +141
func validateSidecar(sc *Sidecar) error {
for idStr, k := range sc.Keys {
if err := validateSidecarKey(idStr, k); err != nil {
return err
}
}
return nil
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

validateSidecar should verify that the Active keys (Storage and Raft) actually exist in the Keys map if they are non-zero. A sidecar with an active key ID that has no corresponding entry in the Keys map is in an inconsistent state.

func validateSidecar(sc *Sidecar) error {
	for idStr, k := range sc.Keys {
		if err := validateSidecarKey(idStr, k); err != nil {
			return err
		}
	}
	if sc.Active.Storage != ReservedKeyID {
		if _, ok := sc.Keys[strconv.FormatUint(uint64(sc.Active.Storage), 10)]; !ok {
			return pkgerrors.Errorf("encryption: active storage key %d not found in keys map", sc.Active.Storage)
		}
	}
	if sc.Active.Raft != ReservedKeyID {
		if _, ok := sc.Keys[strconv.FormatUint(uint64(sc.Active.Raft), 10)]; !ok {
			return pkgerrors.Errorf("encryption: active raft key %d not found in keys map", sc.Active.Raft)
		}
	}
	return nil
}

Comment thread internal/encryption/sidecar.go Outdated
Comment on lines +180 to +188
sc.Version = SidecarVersion
if err := validateSidecar(sc); err != nil {
return pkgerrors.Wrap(err, "encryption: validate before write")
}

data, err := json.MarshalIndent(sc, "", " ")
if err != nil {
return pkgerrors.Wrap(err, "encryption: marshal sidecar")
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

WriteSidecar mutates the Version field of the input Sidecar struct. This side effect might be unexpected for callers who reuse the struct. It is safer to perform this assignment on a local copy before marshaling.

Suggested change
sc.Version = SidecarVersion
if err := validateSidecar(sc); err != nil {
return pkgerrors.Wrap(err, "encryption: validate before write")
}
data, err := json.MarshalIndent(sc, "", " ")
if err != nil {
return pkgerrors.Wrap(err, "encryption: marshal sidecar")
}
scCopy := *sc
scCopy.Version = SidecarVersion
if err := validateSidecar(&scCopy); err != nil {
return pkgerrors.Wrap(err, "encryption: validate before write")
}
data, err := json.MarshalIndent(&scCopy, "", " ")
if err != nil {
return pkgerrors.Wrap(err, "encryption: marshal sidecar")
}

Comment thread internal/encryption/sidecar.go Outdated
Comment on lines +217 to +231
func writeTmpAndFsync(tmpPath string, data []byte) error {
f, err := os.OpenFile(tmpPath, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, sidecarFileMode)
if err != nil {
return pkgerrors.Wrapf(err, "encryption: open %q", tmpPath)
}
defer func() { _ = f.Close() }()

if _, err := f.Write(data); err != nil {
return pkgerrors.Wrapf(err, "encryption: write %q", tmpPath)
}
if err := f.Sync(); err != nil {
return pkgerrors.Wrapf(err, "encryption: fsync %q", tmpPath)
}
return nil
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The error from f.Close() is silently ignored. According to the general rules, errors from Close() should be handled or logged to ensure resource leaks or cleanup issues are visible. Since this is a write operation, it is particularly important to ensure the file is closed successfully.

Suggested change
func writeTmpAndFsync(tmpPath string, data []byte) error {
f, err := os.OpenFile(tmpPath, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, sidecarFileMode)
if err != nil {
return pkgerrors.Wrapf(err, "encryption: open %q", tmpPath)
}
defer func() { _ = f.Close() }()
if _, err := f.Write(data); err != nil {
return pkgerrors.Wrapf(err, "encryption: write %q", tmpPath)
}
if err := f.Sync(); err != nil {
return pkgerrors.Wrapf(err, "encryption: fsync %q", tmpPath)
}
return nil
}
func writeTmpAndFsync(tmpPath string, data []byte) (err error) {
f, err := os.OpenFile(tmpPath, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, sidecarFileMode)
if err != nil {
return pkgerrors.Wrapf(err, "encryption: open %q", tmpPath)
}
defer func() {
if closeErr := f.Close(); closeErr != nil && err == nil {
err = pkgerrors.Wrapf(closeErr, "encryption: close %q", tmpPath)
}
}()
if _, err := f.Write(data); err != nil {
return pkgerrors.Wrapf(err, "encryption: write %q", tmpPath)
}
if err := f.Sync(); err != nil {
return pkgerrors.Wrapf(err, "encryption: fsync %q", tmpPath)
}
return nil
}
References
  1. Do not silently ignore errors from Close() methods on resources like network connections. Log them to ensure resource leaks or other cleanup problems are visible.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a1def5d849

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/encryption/sidecar.go Outdated
// Helper exists so WriteSidecar does not nest its own defer-on-error
// inside the rename block.
func writeTmpAndFsync(tmpPath string, data []byte) error {
f, err := os.OpenFile(tmpPath, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, sidecarFileMode)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Force restrictive mode on existing sidecar temp file

WriteSidecar opens <path>.tmp with os.O_CREATE|os.O_TRUNC, but the 0o600 mode is only applied when the file is newly created; if keys.json.tmp already exists (e.g., leftover from older tooling or manually created with broader perms), its permissive mode is retained and then moved into place by os.Rename, making keys.json group/other readable despite the security contract. This is reproducible with a preexisting keys.json.tmp at 0666, which yields a final sidecar mode of 0644 under a typical umask.

Useful? React with 👍 / 👎.

@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 30, 2026

@claude review

@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

Claude finished @bootjp's task in 2m 28s —— View job


Code Review — PR #722 feat(encryption): sidecar crash-durable read/write (Stage 1) (Round 2)

  • Gather context and read changed files
  • Check what was fixed from round-1 review
  • Review sidecar.go — crash-durable write, validation, error handling
  • Review sidecar_test.go — coverage gaps after fixes
  • Review errors.go — new error types
  • Cross-check against gemini/codex review comments
  • Post final review

Round-1 / Gemini / Codex resolution check

All 6 issues from round 1 are fully resolved in commit 95c04e4.

Issue Reporter Status
Defer registered after writeTmpAndFsync → stale tmp on write-phase failure claude round-1 #1 ✅ Fixed — defer moved above the call; TestWriteSidecar_StaleTmpFileIsCleanedOnWriteFailure added
ErrUnsupportedFilesystem undefined claude round-1 #2 ✅ Fixed — declared in errors.go:50; syncDir wraps raw fsync error with pkgerrors.WithSecondaryError(ErrUnsupportedFilesystem, err)
No cross-validation of Active IDs against Keys map claude round-1 #3 / gemini line 141 ✅ Fixed — requireActiveKey added to validateSidecar; ErrSidecarActiveKeyMissing declared; 3 tests added (Write × 2, Read × 1)
WriteSidecar mutated caller's Version field gemini line 188 ✅ Fixed — scCopy := *sc shallow copy; TestWriteSidecar_DoesNotMutateCaller pins the new contract
f.Close() error silently ignored in writeTmpAndFsync gemini line 231 ✅ Fixed — named-return retErr + defer; close errors propagate when write otherwise succeeded
Pre-existing .tmp mode carried through via os.Rename codex P2 ✅ Fixed — os.Remove then O_EXCL create; TestWriteSidecar_StaleTmpDoesNotLeakPermissiveMode pins the regression

Remaining observations (non-blocking)

1. requireActiveKey does not validate purpose direction [Intentional Stage 5+ deferral]

A sidecar with Active.Storage = 1 where key "1" has Purpose = "raft" passes validateSidecar. The cross-validation added in this round checks existence, not semantic direction. This was noted in round-1 as an intentional boundary: purpose-direction matching belongs to the FSM layer in Stage 5+. No action needed now, but Stage 5+ must add the check before writing an active-key promotion.

2. TestReadSidecar_RejectsActiveKeyMissing covers only the Storage direction [Minor asymmetry]

TestWriteSidecar_RejectsActiveKeyMissing tests both Active.Storage and Active.Raft missing-entry cases on the write path (sidecar_test.go:382–411). The read-path counterpart at sidecar_test.go:413–430 only exercises Active.Storage = 12345 with an empty keys map; the Active.Raft direction is untested on read. The code path (requireActiveKey is called for both) is correct and symmetric — this is just a test coverage gap. A two-line addition to the existing read test would close it. Fix this →

3. syncDir close error is silently ignored [Acceptable]

syncDir defers _ = f.Close() on a read-only directory fd. Since no write-back is in flight, a close failure on a directory fd carries no durability implication. This is idiomatic and correct.


Verification of the crash-durable protocol

WriteSidecar (sidecar.go:200–243) now follows the §5.1 four-step sequence correctly:

1. scCopy = shallow copy; scCopy.Version = SidecarVersion; validateSidecar(&scCopy)
2. defer cleanup(tmpPath) registered   ← BEFORE the following calls
3. writeTmpAndFsync(tmpPath, data)     → O_EXCL create → Write → Sync → Close (error propagated)
4. os.Rename(tmpPath, path)
5. syncDir(dir)                        → open dir → Sync → Close

Step 2 is now registered before step 3 (the round-1 fix), so any failure at the write or fsync inside writeTmpAndFsync triggers cleanup. The os.Remove-then-O_EXCL pattern in writeTmpAndFsync (the codex P2 fix) ensures the inode is always fresh at 0o600. The syncDir error wraps ErrUnsupportedFilesystem with WithSecondaryError so errors.Is-matching works without string parsing at Stage 5+.


Reviewer checklist (PR description)

  • Sidecar JSON matches §5.1 shape verbatim. Field names, JSON keys, and Wrapped []byte → base64 all confirmed. ✓
  • Crash-durable protocol order confirmed. Write → file.Syncos.Renamedir.Sync. ✓
  • Validation applied at both Read and Write. validateSidecar (including requireActiveKey) is called in both paths. ✓

Summary

Round 2 is clean. All round-1, gemini, and codex issues are resolved with appropriate tests. The two remaining observations are a test coverage asymmetry (Read-path Active.Raft direction) and an intentional Stage 5+ deferral. Neither is a blocker. This is ready to merge once the base feat/encryption-foundation branch lands.

@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 30, 2026

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

Claude finished @bootjp's task in 2m 13s —— View job


Code Review — PR #722 feat(encryption): sidecar crash-durable read/write (Stage 1) (Round 3)

  • Gather context
  • Read current state of changed files (sidecar.go, sidecar_test.go, errors.go)
  • Check what changed since round-2 review (commit 0320384)
  • Review sidecar.go, sidecar_test.go, errors.go
  • Post final review

Round-2 follow-up resolution

The one actionable item from round 2 is closed in commit 0320384.

Issue Status
TestReadSidecar_RejectsActiveKeyMissing missing Raft direction on read path ✅ Fixed — 3 sub-cases now: storage active without entry, raft active without entry, both active without entries (sidecar_test.go:413–443)

Fresh pass — no new issues found

The full three-file diff was reviewed. Everything below is a verification of known-good properties; nothing actionable is new.

Crash-durable protocol (WriteSidecar, lines 200–243)

The §5.1 four-step sequence is faithfully preserved:

  1. Shallow copy (scCopy := *sc) so the caller's struct is not mutated — Version is stamped on the copy only.
  2. validateSidecar(&scCopy) runs before marshalling, so a malformed sidecar cannot reach disk.
  3. Defer for cleanup is registered before writeTmpAndFsync (round-1 fix) — any write or fsync failure cleans up the tmp.
  4. writeTmpAndFsyncos.RenamesyncDir in the correct order.

writeTmpAndFsync (lines 262–286)

  • os.Remove(tmpPath) then O_EXCL create (codex P2 fix) ensures the tmp inode is always fresh at 0o600, regardless of any pre-existing permissive file.
  • Named-return retErr propagates f.Close errors that occur after a successful Sync — not silently dropped.
  • Write-phase failures propagate to the caller's defer, which removes the (not-yet-existent) tmp cleanly via _ = os.Remove.

syncDir (lines 296–308)

Wraps the raw fsync error with pkgerrors.WithSecondaryError(ErrUnsupportedFilesystem, err) so Stage 5+ startup can errors.Is-match on ErrUnsupportedFilesystem without string parsing.

validateSidecar / requireActiveKey (lines 136–165)

  • Per-key format validated via validateSidecarKey (decimal uint32, non-zero, known purpose).
  • requireActiveKey checks existence for both Active.Storage and Active.Raft when non-zero.
  • Known Stage 5+ deferral (unchanged): requireActiveKey does not verify that the referenced key's Purpose matches the expected direction (e.g., Active.Storage pointing to a key with Purpose: "raft" passes). This is an intentional boundary — purpose-direction cross-validation belongs to the FSM layer at Stage 5+.

errors.go

All errors referenced in code and tests are defined: ErrUnsupportedFilesystem (line 50) and ErrSidecarActiveKeyMissing (line 57) added in round-1 fixes; all others from Stage 0 are present.

Test coverage summary

Function Sub-cases Directions covered
TestWriteSidecar_RejectsActiveKeyMissing 2 Storage ✓, Raft ✓
TestReadSidecar_RejectsActiveKeyMissing 3 Storage only ✓, Raft only ✓, Both ✓
TestWriteSidecar_StaleTmpDoesNotLeakPermissiveMode 1 Codex P2 regression ✓
TestWriteSidecar_StaleTmpFileIsCleanedOnWriteFailure 1 Pre-creation failure path ✓
TestWriteSidecar_DoesNotMutateCaller 1 Gemini caller-mutation fix ✓

Reviewer checklist (PR description)

  • Sidecar JSON matches §5.1 shape verbatim. Field names, JSON keys, Wrapped []byte → base64. ✓
  • Crash-durable protocol order confirmed. Write → file.Syncos.Renamedir.Sync. ✓
  • Validation applied at both Read and Write. validateSidecar (including requireActiveKey) runs in both paths. ✓

Summary

Round 3 is clean. The read-path Raft-direction test gap from round 2 is closed. All round-1 (claude), gemini, and codex issues remain resolved. The only open item is the intentional Stage 5+ deferral (purpose-direction cross-validation in requireActiveKey), which is not a blocker.

This PR is ready to merge once the base feat/encryption-foundation (#719) lands on main.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0320384045

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/encryption/sidecar.go Outdated
Comment on lines +160 to +163
if _, ok := sc.Keys[idStr]; !ok {
return pkgerrors.Wrapf(ErrSidecarActiveKeyMissing,
"active.%s=%d not present in keys map", purpose, id)
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject active key IDs with mismatched purpose

validateSidecar only checks that active.storage/active.raft IDs exist in Keys, but it never verifies that the referenced entry has the matching purpose. A sidecar where active.storage points to a raft key (or vice versa) currently passes both WriteSidecar and ReadSidecar, even though this is malformed state and can route the wrong DEK to a purpose-specific encryption path after restart/rotation.

Useful? React with 👍 / 👎.

@bootjp bootjp force-pushed the feat/encryption-sidecar branch from 0320384 to 37a0bc1 Compare May 5, 2026 22:57
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 5, 2026

@claude review

PR722 round-4: codex P2 Reject active key IDs with mismatched purpose を反映しました (commit 37a0bc1)。

  • requireActiveKey で key 存在チェックの後に k.Purpose != purpose も検査。crossed pointer (active.storage が purpose="raft" を指す等) を fail-closed で弾く。
  • 新しい sentinel ErrSidecarActivePurposeMismatch を追加 (既存の ErrSidecarActiveKeyMissing と distinct)。
  • TestSidecar_RejectsActivePurposeMismatch を追加: WriteSidecar (in-memory pre-write) / ReadSidecar (JSON unmarshal 後) の両側から、storage→raft / raft→storage の双方向クロスを検査 (4 サブケース)。
  • caller audit: requireActiveKeyvalidateSidecar のみ呼び、validateSidecar は Read/Write のみ。well-formed sidecar (test fixture を含む) は purpose が常に slot と一致するため false positive なし、純粋な strictening。
  • 注: PR feat(encryption): foundation package (Stage 0) #719 の codex P2 (cipher zero-value 拒否) を取り込むため base ブランチを最新に rebase したので force-with-lease push しています。
  • make test/lint green。

@claude
Copy link
Copy Markdown

claude Bot commented May 5, 2026

Claude finished @bootjp's task in 2m 25s —— View job


Code Review — PR #722 Round 4

  • Gather context and read changed files
  • Verify requireActiveKey purpose-direction check
  • Verify ErrSidecarActivePurposeMismatch sentinel
  • Verify TestSidecar_RejectsActivePurposeMismatch test coverage
  • Check rebase with PR feat(encryption): foundation package (Stage 0) #719 changes (cipher zero-value rejection)
  • Post final review

Round-3 open item resolution

The one remaining observation from round 3 is fully closed in commit 37a0bc1.

Item Status
requireActiveKey did not verify purpose direction — intentionally deferred to Stage 5+ in r3 ✅ Resolved ahead of schedule — purpose check added in this round

Fresh pass — no new issues found

requireActiveKey (sidecar.go:158–173)

The change is minimal and correct:

k, ok := sc.Keys[idStr]
if !ok {
    return pkgerrors.Wrapf(ErrSidecarActiveKeyMissing, ...)
}
if k.Purpose != purpose {
    return pkgerrors.Wrapf(ErrSidecarActivePurposeMismatch,
        "active.%s=%d but keys[%q].purpose=%q", purpose, id, idStr, k.Purpose)
}

Existence check comes first (preserving ErrSidecarActiveKeyMissing for the missing-entry case), purpose check comes second with a distinct sentinel. The caller site also improved: hardcoded "storage"/"raft" strings replaced by SidecarPurposeStorage/SidecarPurposeRaft constants — a minor defensive gain that prevents a future rename of those constants from silently diverging the caller and the check.

ErrSidecarActivePurposeMismatch (errors.go:59–65)

Declared alongside ErrSidecarActiveKeyMissing, consistent style and placement. Doc comment names the concrete foot-gun ("route the wrong DEK into a purpose-specific encryption path"). errors.Is-matchable through the cockroachdb/errors wrap chain used in requireActiveKey. ✓

TestSidecar_RejectsActivePurposeMismatch (sidecar_test.go:445–501)

Four sub-cases across two axes:

Direction Write path Read path
active.storage → raft key
active.raft → storage key

The Read-path sub-tests correctly bypass WriteSidecar by writing raw JSON via os.WriteFile, then asserting ReadSidecar catches the invariant independently. This is the right approach: it verifies that a sidecar landing via an older binary, manual edit, or partial recovery is also rejected at load time. ✓

Fixture consistencyfixtureSidecar() has Active.Storage=305419896Keys["305419896"].Purpose="storage" and Active.Raft=2596069104Keys["2596069104"].Purpose="raft". Both consistent; the round-trip test continues to pass with the stricter validator. ✓

Total test count — 24 top-level functions in sidecar_test.go (up from 22 in the original submission, 23 after r2). -race clean per the PR description.


Reviewer checklist (PR description)

  • Sidecar JSON matches §5.1 shape verbatim.
  • Crash-durable protocol order confirmed. Write → file.Syncos.Renamedir.Sync. ✓
  • Validation applied at both Read and Write. validateSidecar (including requireActiveKey with purpose check) runs in both paths. ✓

Summary

Round 4 is clean. Every issue from all prior rounds (claude r1/r2/r3, gemini, codex P2 × 2) is now resolved. There are no outstanding observations. The implementation is faithful to §5.1 with validation that is now fully fail-closed on all three key-integrity dimensions (format, existence, purpose direction).

This PR is ready to merge once the base feat/encryption-foundation (#719) lands on main.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 37a0bc1e21

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

dir := t.TempDir()
path := filepath.Join(dir, encryption.SidecarFilename)
// Make the dir read-only so OpenFile cannot create the tmp.
if err := os.Chmod(dir, 0o500); err != nil {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Skip permission-based failure test when running as root

This test assumes os.Chmod(dir, 0o500) makes WriteSidecar fail, but that is not true when tests run as UID 0 (or with CAP_DAC_OVERRIDE), where file creation still succeeds; in that environment the assertion at err == nil fails and the package test suite becomes nondeterministic across CI/container setups. Please gate this case (for example by skipping when effectively privileged) or use a failure injection that does not depend on Unix permission enforcement.

Useful? React with 👍 / 👎.

@bootjp bootjp force-pushed the feat/encryption-sidecar branch from 37a0bc1 to f38303f Compare May 5, 2026 23:14
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 5, 2026

@claude review

PR722 round-5: codex P2 Skip permission-based failure test when running as root を反映しました (commit f38303f)。

  • TestWriteSidecar_StaleTmpFileIsCleanedOnWriteFailurechmod(dir, 0o500) で OpenFile を失敗させて "tmp leak" を検査していたが、UID 0 / CAP_DAC_OVERRIDE は dir mode を bypass するため CI container 等で nondeterministic になっていた。
  • os.Geteuid() == 0 と Windows で t.Skip。non-root 環境では従来通り leak-cleanup property を回す。
  • 同ファイル内の他の chmod 用途 (line 524) は files mode を seed → fresh OpenFile 0o600 が上書きする検査で root でも正しく動作するため変更不要。

加えて、PR #719 の round-9 (CodeRabbit Major x3) の base 取り込みのため rebase しています (force-with-lease)。
make test/lint green。

@claude
Copy link
Copy Markdown

claude Bot commented May 5, 2026

Claude finished @bootjp's task in 1m 38s —— View job


Code Review — PR #722 Round 5

  • Gather context
  • Read current state of changed files (sidecar.go, sidecar_test.go, errors.go)
  • Verify root-skip fix in TestWriteSidecar_StaleTmpFileIsCleanedOnWriteFailure
  • Confirm unaffected chmod at line 524
  • Confirm all prior round issues remain resolved
  • Post final review

Round-4 open item resolution

The one actionable item from round 4 (and its source codex P2) is fully resolved in commit f38303f7.

Item Status
TestWriteSidecar_StaleTmpFileIsCleanedOnWriteFailure nondeterministic under UID 0 / CAP_DAC_OVERRIDE ✅ Fixed — os.Geteuid() == 0 and runtime.GOOS == "windows" skips added before any side-effectful setup

Fresh pass — no new issues found

Skip logic in TestWriteSidecar_StaleTmpFileIsCleanedOnWriteFailure (sidecar_test.go:569–574)

Both skips are placed at the top of the test body, before any side-effectful setup (the os.Chmod(dir, 0o500) call is at line 578, the cleanup registration at line 581). If the test skips, neither the chmod nor the cleanup is registered — correct, since there is nothing to restore.

if runtime.GOOS == "windows" {
    t.Skip("dir permission bits don't gate file creation on Windows")
}
if os.Geteuid() == 0 {
    t.Skip("UID 0 bypasses dir permission bits; chmod-based failure injection is meaningless")
}

os.Geteuid() == 0 is the standard Go idiom for this class of skip (used by the stdlib itself). The comment correctly acknowledges that CAP_DAC_OVERRIDE without UID 0 is also a concern; detecting capabilities portably in Go requires extra syscalls and the in-practice frequency in CI is negligible. The UID-0 guard resolves the stated nondeterminism.

TestWriteSidecar_StaleTmpDoesNotLeakPermissiveMode (sidecar_test.go:511–551) — correctly unchanged

This test's chmod is on the tmp file (os.Chmod(tmpPath, broadMode) at line 524), not on the parent directory. Root can still set file modes; the test's own self-guard at lines 534–536 (t.Skipf if st.Mode().Perm()&0o077 == 0) handles environments that refuse the seed, regardless of privilege level. No root-skip needed here.

runtime import

Correctly added at sidecar_test.go:8. No unused import.

Minor observation (non-blocking): test comment says "round-3" where "round-4" would be more precise

The comment at sidecar_test.go:567 ends with (codex P2 on PR #722 round-3). The skip-permission codex P2 was posted at 2026-05-05T23:01:50Z on commit 37a0bc1e21, which was the subject of round 4's review — not round 3. The code is correct; only the attribution in the comment is off by one review cycle. No action required.

All prior-round fixes remain in place

Fix Still present
Cleanup defer registered before writeTmpAndFsync (r1) sidecar.go:235–239
ErrUnsupportedFilesystem defined and used in syncDir (r1) errors.go:51–58, sidecar.go:312
requireActiveKey checks existence AND purpose direction (r3/r4) sidecar.go:158–173
ErrSidecarActivePurposeMismatch sentinel (r4) errors.go:67–73
TestSidecar_RejectsActivePurposeMismatch (4 sub-cases) (r4) sidecar_test.go:452–502
TestReadSidecar_RejectsActiveKeyMissing covers both directions (r2) sidecar_test.go:414–444
Caller mutation fix — shallow copy before stamping Version (r1) sidecar.go:216–217
O_EXCL + pre-remove prevents permissive-mode leak from stale tmp (codex P2) sidecar.go:273–277

Reviewer checklist (PR description)

  • Sidecar JSON matches §5.1 shape verbatim.
  • Crash-durable protocol order confirmed. Write → file.Syncos.Renamedir.Sync. ✓
  • Validation applied at both Read and Write. validateSidecar (including requireActiveKey with purpose check) runs in both paths. ✓

Summary

Round 5 is clean. The codex P2 root-skip fix is correctly implemented — skips are placed before any side-effectful setup, the skip condition uses the standard os.Geteuid() == 0 idiom, and the parallel chmod-on-file test (StaleTmpDoesNotLeakPermissiveMode) is correctly left unmodified. No new issues found. All prior-round fixes remain intact.

This PR is ready to merge once the base feat/encryption-foundation (#719) lands on main.

@bootjp bootjp force-pushed the feat/encryption-sidecar branch 2 times, most recently from 7de3c0d to 6859ba6 Compare May 5, 2026 23:33
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6859ba6b29

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +121 to +122
if mode&0o077 != 0 {
t.Fatalf("sidecar file is group/other readable or writable (mode=%o)", mode)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Skip mode-bit assertion on Windows in tmp file mode test

TestWriteSidecar_TmpFileMode says it should skip Windows-style filesystems but never actually does, so the mode&0o077 assertion runs on Windows where Go does not preserve Unix permission bits and commonly reports group/other bits as set. That makes the test fail spuriously on Windows CI even when WriteSidecar is correct, creating platform-dependent red builds from this commit.

Useful? React with 👍 / 👎.

bootjp added 6 commits May 6, 2026 08:40
Implements the §5.1 keys.json sidecar layer of the data-at-rest encryption design (docs/design/2026_04_29_proposed_data_at_rest_encryption.md). Stage 1 is library-only; integration with FSM apply (§5.5 reconciliation, §6 admin commands) lands in Stage 5+.

Sidecar struct mirrors §5.1 illustrative JSON exactly:

- Version (uint), bumped on layout change; reader rejects mismatch with ErrSidecarVersion so an older binary cannot silently drop fields it does not know about.

- RaftAppliedIndex / StorageEnvelopeActive / RaftEnvelopeCutoverIndex fields for §5.5 sidecar / Raft-log reconciliation.

- ActiveKeys (Storage / Raft uint32) -- 0 means "not bootstrapped" sentinel per §5.1.

- Keys map[string]SidecarKey keyed by decimal-string form of key_id (per §5.1: "JSON object keys must be strings, but the on-disk envelope and the in-memory keystore always work in the binary uint32 form"). Each entry carries Purpose ("storage"/"raft"), Wrapped ([]byte, JSON base64), Created (ISO-8601 string -- not time.Time so future timezone-format additions do not break older readers), and LocalEpoch (uint16, consumed by §4.1 nonce construction).

WriteSidecar implements the §5.1 crash-durable write protocol:

  1. Build new contents + ensure Version is set.

  2. Write to <path>.tmp at mode 0o600, then file.Sync().

  3. os.Rename(<path>.tmp, <path>).

  4. dir.Sync() on the parent so the rename is durable.

Skipping step 2 or 4 is treated as a data-loss-class bug per §10 lens 1: a power loss between the rename and the directory inode flush can roll back keys.json while the rotation Raft entry is already committed, stranding ciphertext under a wrap that has effectively disappeared. Dir-fsync errors propagate -- the design refuses to start on filesystems (NFS, some FUSE) that cannot guarantee durability of the rename.

Validation (applied symmetrically at Read and Write):

- ErrSidecarVersion: wire version != SidecarVersion.

- ErrSidecarPurpose: keys-map entry purpose outside {storage, raft}.

- ErrSidecarKeyIDFormat: keys-map key not a decimal uint32.

- ErrSidecarReservedKeyID: keys-map carries key_id 0 (reserved sentinel).

Tests (sidecar_test.go, 22 cases):

- Round-trip: Write then Read returns the same struct.

- No tmp file left behind on success.

- Overwrite existing.

- File mode 0o600 (no group/other read/write).

- WriteSidecar sets Version automatically.

- Rejects nil sidecar.

- Validation: bad purpose / reserved key_id / non-numeric key_id at Write time.

- ReadSidecar: missing file (IsNotExist), unknown version (multiple), corrupt JSON (4 cases), bad purpose, reserved key_id, non-numeric key_id, key_id overflowing uint32 (4294967296).

- Partial-write cleanup: tmp file is removed if rename fails (target path is a directory).

- Constants pinned (storage / raft purpose strings).

47 tests + 2 property tests + 3 benchmarks total under internal/encryption/, -race clean, 0 lint issues.

Stacked on Stage 0 (PR #719).
P2 (codex line 218 -- security): writeTmpAndFsync used os.OpenFile with O_CREATE|O_TRUNC; if keys.json.tmp pre-existed at e.g. 0o666 (older tooling, manual poking, partial recovery), the existing mode was retained because O_CREATE only applies the mode arg on a fresh inode. os.Rename then moved the permissive mode into the production keys.json, defeating the wrapped-DEK file-mode guarantee. Fix: os.Remove any pre-existing tmp before O_EXCL-creating fresh at sidecarFileMode (0o600). New TestWriteSidecar_StaleTmpDoesNotLeakPermissiveMode pins the regression.

Medium (gemini line 141 + claude[bot] #3): validateSidecar did not check that Active.Storage / Active.Raft (when non-zero) actually appear in the Keys map. A sidecar with an active key_id but no corresponding wrapped-DEK entry is malformed input; rotation and bootstrap paths always write the two halves together. Added requireActiveKey check in validateSidecar (applied symmetrically at Read and Write); new ErrSidecarActiveKeyMissing typed error. Tests cover both Storage and Raft directions on Write and the Read path.

Medium (gemini line 188): WriteSidecar mutated the callers Sidecar.Version field. Operate on a shallow copy (scCopy := *sc) instead so callers reusing the struct see Version preserved. Updated TestWriteSidecar_OnDiskVersion to assert only the on-disk side; new TestWriteSidecar_DoesNotMutateCaller pins the new contract.

Medium (gemini line 231): writeTmpAndFsync silently ignored f.Close errors. Convert to a named-return retErr pattern with a defer that captures Close failures when the main path otherwise succeeded -- write-back failures are now surfaced.

Minor (claude[bot] #1): WriteSidecar registered the cleanup defer AFTER calling writeTmpAndFsync, so a write/fsync failure inside that helper would leave keys.json.tmp on disk. Move the defer above the call so any failure path -- including pre-rename ones -- triggers cleanup. New TestWriteSidecar_StaleTmpFileIsCleanedOnWriteFailure exercises the read-only-dir path.

Minor (claude[bot] #2): added ErrUnsupportedFilesystem typed error per §5.1. syncDir now wraps the underlying fsync error with this sentinel using pkgerrors.WithSecondaryError so Stage 5+ startup integration can errors.Is-match without parsing strings.

Caller audit per the loop prompt: WriteSidecar / ReadSidecar / syncDir all only have callers in sidecar_test.go for Stage 1 (library only); the syncDir name is shared with internal/raftengine/etcd/persistence.go syncDir but they live in different packages and do not collide. Semantic changes (non-mutating WriteSidecar, Close error propagation, ErrUnsupportedFilesystem wrapping) are contained.

Tests now cover 26 sidecar cases (was 22). Package total: 51 tests + 2 property tests + 3 benchmarks; -race clean, 0 lint issues.
…722 r2)

claude[bot] r2: TestReadSidecar_RejectsActiveKeyMissing only tested the Active.Storage direction; the Active.Raft direction was implicitly covered by the same requireActiveKey code path but had no explicit regression. Convert to a table-driven test with three cases (storage-only, raft-only, both) so a future change to the per-purpose validation is caught.
…codex P2)

requireActiveKey previously confirmed only that the active key_id
existed in the Keys map, not that the referenced entry's Purpose
matched the slot. A sidecar with active.storage pointing to a
purpose="raft" entry (or vice versa) would pass both Write and Read
validation, then route the wrong DEK into a purpose-specific
encryption path after restart or rotation — a fail-open foot-gun on
malformed input.

Tightened requireActiveKey to also verify the entry's Purpose. New
ErrSidecarActivePurposeMismatch sentinel keeps it errors.Is-matchable
and distinct from ErrSidecarActiveKeyMissing. Tests cover both
WriteSidecar (in-memory pre-write check) and ReadSidecar
(post-unmarshal check, simulating a sidecar that landed on disk via
some other path).
TestWriteSidecar_StaleTmpFileIsCleanedOnWriteFailure relies on
chmod(dir, 0o500) blocking OpenFile inside writeTmpAndFsync, but UID
0 (and any process holding CAP_DAC_OVERRIDE) bypasses the dir's
permission bits, so OpenFile would still succeed and the assertion
"WriteSidecar succeeded with read-only dir" fires spuriously. CI
containers and rebuild environments often run as root, so the suite
became nondeterministic across setups.

Skip when os.Geteuid() == 0 (and on Windows, where the unix mode
model doesn't gate file creation either. The non-root path still
exercises the leak-cleanup property; this just narrows where it
runs.
EOF
)
The doc comment on TestWriteSidecar_TmpFileMode said it should skip
Windows, but the runtime.GOOS guard was missing — Go on Windows
reports synthetic permission bits that frequently include group/
other reads even on a freshly written file, so the mode&0o077 == 0
assertion fires spuriously and the suite goes red on Windows CI
even when WriteSidecar is correct.

Add the missing runtime.GOOS == "windows" → t.Skip. The unix
inheritance property is also covered (with the same self-skip
shape) by TestWriteSidecar_StaleTmpDoesNotLeakPermissiveMode, so
the cross-platform invariant is not lost.
@bootjp bootjp force-pushed the feat/encryption-sidecar branch from 6859ba6 to 9e5f9bd Compare May 5, 2026 23:42
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 5, 2026

@claude review

PR722 round-6: codex P2 Skip mode-bit assertion on Windows in tmp file mode test を反映しました (commit 9e5f9bd)。

  • TestWriteSidecar_TmpFileMode の doc は Windows skip を主張していたが runtime.GOOS == "windows" ガードが欠落、Go の Windows file mode 実装は group/other bit を含む合成値を返すため mode&0o077 == 0 が spurious red になる。
  • 明示的に t.Skip を追加、コメントを実装と一致させた。
  • 同じ inheritance property は TestWriteSidecar_StaleTmpDoesNotLeakPermissiveMode でも検査されており、こちらは line 542 の self-skip (permissive seed 不能 → skip) で Windows safe。
  • caller audit: 他の mode 検査箇所は self-skip 済み or unix-only fixture、追加の skip 不要。
  • GOOS=windows go vet clean。

加えて PR #719 round-12 (codex P2 FileWrapper nil guard) の base 取り込みのため rebase しています (force-with-lease)。

@claude
Copy link
Copy Markdown

claude Bot commented May 5, 2026

Claude finished @bootjp's task in 1m 49s —— View job


Code Review — PR #722 Round 6

  • Gather context
  • Read current state of sidecar_test.go (commit 9e5f9bd0)
  • Verify Windows skip in TestWriteSidecar_TmpFileMode
  • Confirm all prior-round fixes remain intact
  • Post final review

Round-5 follow-up resolution

The codex P2 from commit 6859ba6b29 (surfaced between rounds 5 and 6) is fully resolved in commit 9e5f9bd0.

Item Status
TestWriteSidecar_TmpFileMode ran mode&0o077 assertion on Windows where Go reports synthetic permission bits, causing spurious failures ✅ Fixed — runtime.GOOS == "windows" skip added at line 109

Fresh pass — no new issues found

TestWriteSidecar_TmpFileMode (sidecar_test.go:101–132)

The fix is minimal and correct:

if runtime.GOOS == "windows" {
    t.Skip("unix permission bits are not preserved on Windows filesystems")
}
  • Skip is placed at the top of the test body, before all side-effectful code (WriteSidecar is the first call, at line 118). A Windows execution skips cleanly with no partial state left behind. ✓
  • runtime import is present in the import block at line 8. ✓
  • The comment at lines 102–108 accurately describes the gap: the doc comment had claimed a Windows skip existed but the guard was missing; Go on Windows returns synthetic permission bits that include group/other bits on freshly-created files, so mode&0o077 == 0 would fail spuriously. ✓

TestWriteSidecar_StaleTmpDoesNotLeakPermissiveMode (sidecar_test.go:519–559) — correctly unchanged

This test operates on the .tmp file (not the parent directory), and its self-skip at line 542–544 — if st.Mode().Perm()&0o077 == 0 { t.Skipf(...) } — fires on Windows because os.Chmod does not set group/other bits on Windows and the seed fails. The test has been Windows-safe since its introduction; no additional guard needed. ✓

Minor observation (non-blocking): comment attribution

The new comment at sidecar_test.go:107 ends with codex P2 on PR #722 round-3. The Windows mode-bit codex P2 was posted at 2026-05-05T23:38:28Z on commit 6859ba6b29 — after round 5's review. The attribution is off by two rounds. This matches the same pattern Round 5 noted for StaleTmpFileIsCleanedOnWriteFailure (also said "round-3"). Code is correct; only the inline attribution is off. No action required.


All prior-round fixes confirmed intact

Fix Location Status
Cleanup defer registered before writeTmpAndFsync (r1) sidecar.go:235–239
ErrUnsupportedFilesystem defined + used in syncDir (r1) errors.go:58, sidecar.go:312
Caller not mutated — shallow copy before stamping Version (r1) sidecar.go:216–217
f.Close error propagated via named return in writeTmpAndFsync (r1) sidecar.go:281–285
os.Remove + O_EXCL prevents permissive-mode leak from stale tmp (codex P2 r1) sidecar.go:273–277
TestReadSidecar_RejectsActiveKeyMissing covers both Raft and Storage directions (r2) sidecar_test.go:422–452
requireActiveKey checks existence AND purpose direction (r3/r4) sidecar.go:158–173
ErrSidecarActivePurposeMismatch sentinel (r4) errors.go:73
TestSidecar_RejectsActivePurposeMismatch (4 sub-cases) (r4) sidecar_test.go:460–510
Root skip in TestWriteSidecar_StaleTmpFileIsCleanedOnWriteFailure (r5) sidecar_test.go:580–582
Windows skip in TestWriteSidecar_TmpFileMode (r6) sidecar_test.go:109–111

Reviewer checklist (PR description)

  • Sidecar JSON matches §5.1 shape verbatim.
  • Crash-durable protocol order confirmed. Write → file.Syncos.Renamedir.Sync. ✓
  • Validation applied at both Read and Write. validateSidecar (including requireActiveKey with purpose check) runs in both paths. ✓

Summary

Round 6 is clean. The Windows mode-bit skip in TestWriteSidecar_TmpFileMode is correctly implemented — placed before all side-effectful code, using the standard runtime.GOOS == "windows" idiom, with an accurate description of why the assertion fails on Windows. The parallel test (StaleTmpDoesNotLeakPermissiveMode) continues to be self-guarding on Windows via its existing seed-failure skip. All prior-round fixes remain intact.

This PR is ready to merge once the base feat/encryption-foundation (#719) lands on main.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9e5f9bd010

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +555 to +556
if finalSt.Mode().Perm()&0o077 != 0 {
t.Fatalf("keys.json inherited permissive mode from stale tmp: mode=%o",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Skip Windows mode-bit assertion in stale tmp permission test

TestWriteSidecar_StaleTmpDoesNotLeakPermissiveMode still asserts finalSt.Mode().Perm()&0o077 == 0 without a Windows guard, but on Windows os.Chmod only uses the owner-writable bit and other permission bits are not reliable Unix semantics. This means the test can fail on Windows even when WriteSidecar is correct, reintroducing the same platform-dependent flake you already fixed in TestWriteSidecar_TmpFileMode.

Useful? React with 👍 / 👎.

Round-6 (9e5f9bd) added the runtime.GOOS == "windows" skip to
TestWriteSidecar_TmpFileMode but missed the symmetric gap in
TestWriteSidecar_StaleTmpDoesNotLeakPermissiveMode. On Windows
os.Chmod only honours the owner-writable bit and Go reports
synthetic values for the unix group/other bits, so the existing
self-skip on the seeded tmp may not fire and the final
finalSt.Mode().Perm()&0o077 != 0 assertion can flake even when
WriteSidecar is correct — re-introducing the same platform red
build the round-6 commit just fixed elsewhere.

Add a top-level runtime.GOOS == "windows" guard. The unix branch
still covers the security invariant; Windows lacks the matching
threat model (no multi-user shared filesystem with unix mode bits
as the access boundary).
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 5, 2026

@claude review

PR722 round-7: codex P2 Skip Windows mode-bit assertion in stale tmp permission test を反映しました (commit 80ae62b)。

  • round-6 で TmpFileMode だけに skip を追加していたが、StaleTmpDoesNotLeakPermissiveMode に同じ gap が残っていた (chmod の Windows 動作 + synthetic perm 値)。
  • top-level runtime.GOOS == "windows" skip を追加。unix path は security invariant を保持。
  • 他の chmod/mode 検査は audit 済み: StaleTmpFileIsCleanedOnWriteFailure は別途 root/Windows skip 済み (round-5)、writeKEK 経由の KEK テストは fixture を 0o600 で書く側で windows skip 済み。
  • GOOS=windows go vet clean、make test/lint green。

@claude
Copy link
Copy Markdown

claude Bot commented May 5, 2026

Claude finished @bootjp's task in 2m 0s —— View job


Code Review — PR #722 Round 7

  • Gather context
  • Read current state of sidecar_test.go (commit 80ae62b6)
  • Verify Windows skip in TestWriteSidecar_StaleTmpDoesNotLeakPermissiveMode
  • Audit other chmod/mode test paths
  • Confirm all prior-round fixes remain intact
  • Post final review

Round-6 follow-up resolution

The codex P2 from commit 9e5f9bd010 (surfaced between rounds 6 and 7) is fully resolved in commit 80ae62b6.

Item Status
TestWriteSidecar_StaleTmpDoesNotLeakPermissiveMode asserted mode&0o077 == 0 on Windows where Go reports synthetic permission bits, causing spurious failures ✅ Fixed — runtime.GOOS == "windows" skip added at sidecar_test.go:528–530

Fresh pass — no new issues found

TestWriteSidecar_StaleTmpDoesNotLeakPermissiveMode (sidecar_test.go:527–570)

The fix is minimal and correct:

if runtime.GOOS == "windows" {
    t.Skip("unix mode bits are not preserved by os.Chmod on Windows; threat model does not apply")
}
  • Skip is placed at the top of the test body, before all side-effectful code. The first write (os.WriteFile(tmpPath, ...)) is at line 539; the skip at lines 528–530 fires first. ✓
  • The updated doc comment at lines 519–526 accurately describes the gap: os.Chmod on Windows only honours the owner-writable bit; group/other permission bits are reported as synthetic values, so both the seed step and the final mode&0o077 assertion can produce results that don't reflect the property under test. ✓
  • The self-skip at lines 553–555 (if st.Mode().Perm()&0o077 == 0 { t.Skipf(...) }) is preserved and continues to guard non-Windows environments where the permissive seed is refused (strict security policies, unusual FUSE mounts). The two skips are complementary, not redundant. ✓

TestWriteSidecar_TmpFileMode (sidecar_test.go:101–132) — correctly unchanged

This test already gained its Windows skip in round 6 (sidecar_test.go:109–111). The doc comment at line 108 cross-references it as the canonical coverage point for this security property on Unix. No change needed. ✓

TestWriteSidecar_StaleTmpFileIsCleanedOnWriteFailure (sidecar_test.go:587–610) — correctly unchanged

This test has both root-skip (os.Geteuid() == 0) and Windows-skip (runtime.GOOS == "windows") from round 5. It operates on the parent directory permission bits, not the file's mode bits, so the round-7 fix is not applicable here. ✓

Caller audit: other chmod/mode assertions

Test Mode operation Already guarded
TestWriteSidecar_TmpFileMode stat(keys.json).Mode()&0o077 ✅ Windows skip (r6)
TestWriteSidecar_StaleTmpDoesNotLeakPermissiveMode chmod tmp 0o666, stat(keys.json).Mode()&0o077 ✅ Windows skip (r7)
TestWriteSidecar_StaleTmpFileIsCleanedOnWriteFailure chmod dir 0o500 ✅ Root + Windows skip (r5)

No other mode-bit assertions remain unguarded. ✓


All prior-round fixes confirmed intact

Fix Location Status
Cleanup defer registered before writeTmpAndFsync (r1) sidecar.go:235–239
ErrUnsupportedFilesystem defined + used in syncDir (r1) errors.go:58, sidecar.go:312
Caller not mutated — shallow copy before stamping Version (r1) sidecar.go:216–217
f.Close error propagated via named return in writeTmpAndFsync (r1) sidecar.go:281–285
os.Remove + O_EXCL prevents permissive-mode leak from stale tmp (codex P2 r1) sidecar.go:273–277
TestReadSidecar_RejectsActiveKeyMissing covers both Raft and Storage directions (r2) sidecar_test.go:422–452
requireActiveKey checks existence AND purpose direction (r3/r4) sidecar.go:158–173
ErrSidecarActivePurposeMismatch sentinel (r4) errors.go:73
TestSidecar_RejectsActivePurposeMismatch (4 sub-cases) (r4) sidecar_test.go:460–510
Root + Windows skip in TestWriteSidecar_StaleTmpFileIsCleanedOnWriteFailure (r5) sidecar_test.go:588–593
Windows skip in TestWriteSidecar_TmpFileMode (r6) sidecar_test.go:109–111
Windows skip in TestWriteSidecar_StaleTmpDoesNotLeakPermissiveMode (r7) sidecar_test.go:528–530

Reviewer checklist (PR description)

  • Sidecar JSON matches §5.1 shape verbatim.
  • Crash-durable protocol order confirmed. Write → file.Syncos.Renamedir.Sync. ✓
  • Validation applied at both Read and Write. validateSidecar (including requireActiveKey with purpose check) runs in both paths. ✓

Summary

Round 7 is clean. The Windows mode-bit skip in TestWriteSidecar_StaleTmpDoesNotLeakPermissiveMode is correctly implemented — placed before all side-effectful code, using the standard runtime.GOOS == "windows" idiom, with an accurate doc comment explaining the Windows-specific behavior. The self-skip for non-Windows environments that refuse the permissive seed is preserved unchanged. The chmod/mode assertion audit is complete — no unguarded paths remain across all three tests. All prior-round fixes remain intact.

This PR is ready to merge once the base feat/encryption-foundation (#719) lands on main.

Base automatically changed from feat/encryption-foundation to main May 7, 2026 08:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant