Skip to content

fix(checkpoint): redact secrets in committed and temporary metadata#333

Open
andreidavid wants to merge 4 commits intoentireio:mainfrom
andreidavid:feat/strict-redaction-tests
Open

fix(checkpoint): redact secrets in committed and temporary metadata#333
andreidavid wants to merge 4 commits intoentireio:mainfrom
andreidavid:feat/strict-redaction-tests

Conversation

@andreidavid
Copy link

Summary

Hardened checkpoint metadata persistence by redacting secrets before storing checkpoint artifacts.

Why

Prevents accidental secret leakage in checkpoint artifacts persisted on metadata shadow/commit branches.

Changes

  • Redact committed session metadata JSON before writing metadata.json.
  • Redact committed summary updates before persistence.
  • Redact task and subagent transcripts before temporary task checkpoint writes.
  • Redact metadata directory files during temporary checkpoint writes.
  • Added regression tests for committed metadata and temporary metadata redaction paths.

Validation

  • mise run lint
  • go test ./cmd/entire/cli/checkpoint
  • go test ./cmd/entire/cli/strategy -run 'TestPostCommit|TestShadowStrategy_CondenseSession_EphemeralBranchTrailer'

Notes

  • Security-focused hardening only; no runtime behavior outside checkpoint persistence was changed.

@andreidavid andreidavid requested a review from a team as a code owner February 14, 2026 01:54
Copilot AI review requested due to automatic review settings February 14, 2026 01:54
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens the Entire CLI checkpoint persistence layer by redacting secrets before writing checkpoint artifacts to both committed metadata branches and temporary shadow-branch checkpoints, reducing the risk of leaking secrets into git history.

Changes:

  • Redacts committed session metadata JSON (metadata.json) and summary updates before persisting blobs.
  • Redacts task and subagent transcripts during temporary task checkpoint writes, with JSONL-aware redaction and plain-text fallback.
  • Redacts files copied from the metadata directory during temporary checkpoint writes and adds regression tests for the new redaction paths.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
cmd/entire/cli/strategy/phase_postcommit_test.go Deduplicates repeated transcript test data via a shared constant.
cmd/entire/cli/strategy/manual_commit_test.go Reuses shared transcript test data constant to reduce duplication.
cmd/entire/cli/checkpoint/temporary.go Adds redaction for temporary task transcripts and metadata-dir files written into shadow-branch commits.
cmd/entire/cli/checkpoint/committed.go Redacts committed session metadata JSON and summary updates before blob persistence.
cmd/entire/cli/checkpoint/checkpoint_test.go Adds regression tests covering committed metadata and temporary checkpoint redaction behavior.

Comment on lines 906 to 910
return fmt.Errorf("failed to get relative path for %s: %w", path, err)
}

blobHash, mode, err := createBlobFromFile(repo, path)
treePath := filepath.Join(dirPathRel, relWithinDir)
blobHash, mode, err := createRedactedBlobFromFile(repo, path, treePath)
Copy link

Copilot AI Feb 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addDirectoryToEntriesWithAbsPath will follow symlinks when it calls createRedactedBlobFromFile (which uses os.Stat/os.ReadFile). A malicious or accidental symlink inside the metadata dir could cause temporary checkpoints to read and persist files outside the metadata directory (e.g. ~/.ssh/*). Please skip symlinks (and ideally enforce that the walked path stays within dirPathAbs), similar to the protections used when copying committed metadata.

Copilot uses AI. Check for mistakes.
Comment on lines +3073 to +3081
if err := os.WriteFile(fullPath, []byte(`{"content":"secret=`+highEntropySecret+`"}`+"\n"), 0644); err != nil {
t.Fatalf("failed to write full.jsonl: %v", err)
}

notePath := filepath.Join(metadataDir, "notes.txt")
if err := os.WriteFile(notePath, []byte("secret="+highEntropySecret), 0644); err != nil {
t.Fatalf("failed to write notes.txt: %v", err)
}

Copy link

Copilot AI Feb 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Temporary checkpoint metadata copying/redaction is security-sensitive (symlinks inside the metadata dir can point outside the repo). There is already a committed-path regression test for skipping symlinks (TestCopyMetadataDir_SkipsSymlinks), but this new temporary-path test suite doesn’t cover that case. Adding a similar symlink test for WriteTemporary (or addDirectoryToEntriesWithAbsPath) would help prevent regressions once symlink handling is added.

Suggested change
if err := os.WriteFile(fullPath, []byte(`{"content":"secret=`+highEntropySecret+`"}`+"\n"), 0644); err != nil {
t.Fatalf("failed to write full.jsonl: %v", err)
}
notePath := filepath.Join(metadataDir, "notes.txt")
if err := os.WriteFile(notePath, []byte("secret="+highEntropySecret), 0644); err != nil {
t.Fatalf("failed to write notes.txt: %v", err)
}
if err := os.WriteFile(fullPath, []byte(`{"content":"secret=`+highEntropySecret+`"}`+"\n"), 0o644); err != nil {
t.Fatalf("failed to write full.jsonl: %v", err)
}
notePath := filepath.Join(metadataDir, "notes.txt")
if err := os.WriteFile(notePath, []byte("secret="+highEntropySecret), 0o644); err != nil {
t.Fatalf("failed to write notes.txt: %v", err)
}
// Regression fixture: symlink inside metadata dir pointing outside the repo/metadata tree.
// This mirrors the committed-path symlink test (TestCopyMetadataDir_SkipsSymlinks) for the
// temporary-path flow, ensuring that temporary checkpoint metadata handling is exercised
// in the presence of symlinks that could otherwise escape the repository.
outsideTarget := filepath.Join(tempDir, "outside-metadata-target.txt")
if err := os.WriteFile(outsideTarget, []byte("outside metadata"), 0o644); err != nil {
t.Fatalf("failed to write outside-metadata-target.txt: %v", err)
}
symlinkPath := filepath.Join(metadataDir, "symlink-outside")
if err := os.Symlink(outsideTarget, symlinkPath); err != nil {
t.Fatalf("failed to create symlink in metadata dir: %v", err)
}

Copilot uses AI. Check for mistakes.
- redact committed session metadata JSON before persistence (WriteCommitted session payloads)
- redact summary updates before rewriting committed metadata
- redact temporary task transcripts and subagent transcripts before checkpoint write
- redact metadata directory files when writing temporary checkpoints
- add regression tests for committed summary and temporary metadata redaction paths
@andreidavid andreidavid force-pushed the feat/strict-redaction-tests branch from 07a772d to f09af6a Compare February 14, 2026 02:11
@andreidavid
Copy link
Author

andreidavid commented Feb 14, 2026

Implemented all Copilot review feedback and pushed commit f09af6a.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Comment on lines 354 to 360
metadataJSON, err := jsonutil.MarshalIndentWithNewline(sessionMetadata, "", " ")
if err != nil {
return filePaths, fmt.Errorf("failed to marshal session metadata: %w", err)
}
metadataJSON = redact.Bytes(metadataJSON)
metadataHash, err := CreateBlobFromContent(s.repo, metadataJSON)
if err != nil {
Copy link

Copilot AI Feb 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redacting the entire marshaled metadata.json with redact.Bytes can unintentionally redact high-entropy identifier fields (e.g., session_id, tool_use_id, checkpoint_uuid) because redact.Bytes is context-free and doesn’t apply the JSONL “skip *id fields” logic used elsewhere. That can make persisted metadata inconsistent or harder to correlate back to sessions/tasks. Consider redacting only the free-form fields (e.g., Summary and other user/LLM-generated text) before marshaling, or introduce a JSON-aware redaction helper that traverses the object and skips keys ending in id/ids (similar to redact.JSONLContent).

Copilot uses AI. Check for mistakes.
Comment on lines 976 to 981
metadataJSON, err := jsonutil.MarshalIndentWithNewline(existingMetadata, "", " ")
if err != nil {
return fmt.Errorf("failed to marshal metadata: %w", err)
}
metadataJSON = redact.Bytes(metadataJSON)
metadataHash, err := CreateBlobFromContent(s.repo, metadataJSON)
Copy link

Copilot AI Feb 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same concern as the initial write path: applying redact.Bytes to the full marshaled session metadata.json during UpdateSummary may over-redact identifier fields (not just the updated summary) because it doesn’t skip *id keys. Prefer redacting only the new summary value (or using a JSON-aware redaction that skips IDs) so the update doesn’t mutate unrelated metadata fields.

Copilot uses AI. Check for mistakes.
@andreidavid
Copy link
Author

Implemented all Copilot review feedback and pushed commit 4b8748f.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

if err != nil {
return fmt.Errorf("failed to get relative path for %s: %w", path, err)
}
if strings.HasPrefix(relWithinDir, "..") {
Copy link

Copilot AI Feb 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The path traversal check strings.HasPrefix(relWithinDir, "..") will also reject legitimate files within the directory whose relative path begins with .. (e.g. a file named ..notes.txt). This can cause temporary checkpoint writes to fail even when there is no traversal.

Prefer a stricter check like rel == ".." or strings.HasPrefix(rel, ".."+string(filepath.Separator)) (and/or using filepath.Clean) to only reject actual parent-directory escapes.

Suggested change
if strings.HasPrefix(relWithinDir, "..") {
relClean := filepath.Clean(relWithinDir)
if relClean == ".." || strings.HasPrefix(relClean, ".."+string(filepath.Separator)) {

Copilot uses AI. Check for mistakes.
redact/redact.go Outdated
Comment on lines 195 to 197
repls := collectJSONLReplacements(parsed)
if len(repls) == 0 {
return content, nil
Copy link

Copilot AI Feb 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JSONContent now reuses collectJSONLReplacements / shouldSkipJSONLField, but those names are JSONL-specific and become misleading when the same logic is used for non-JSONL content. Consider renaming these helpers (or adding JSON-generic wrappers) to reflect their broader use and reduce future confusion.

Copilot uses AI. Check for mistakes.
redact/redact.go Outdated
Comment on lines 187 to 212
// JSONContent parses a JSON string and redacts string values while skipping
// identifier fields (keys ending in "id") and non-user payload fields.
func JSONContent(content string) (string, error) {
var parsed any
if err := json.Unmarshal([]byte(content), &parsed); err != nil {
return "", err
}

repls := collectJSONLReplacements(parsed)
if len(repls) == 0 {
return content, nil
}

result := content
for _, r := range repls {
origJSON, err := jsonEncodeString(r[0])
if err != nil {
return "", err
}
replJSON, err := jsonEncodeString(r[1])
if err != nil {
return "", err
}
result = strings.ReplaceAll(result, origJSON, replJSON)
}
return result, nil
Copy link

Copilot AI Feb 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JSONContent/JSONBytes claim to preserve identifier fields (keys ending in id), but the current implementation collects (orig, redacted) string pairs and then uses strings.ReplaceAll on the entire JSON text. If the same string value appears in both a skipped ID field and a non-skipped field, the replacement will redact the ID field too, breaking the “preserve ID fields” guarantee.

Consider redacting structurally instead (walk the parsed JSON and replace string values in-place while honoring the skip rules, then re-marshal), or apply replacements only to the specific occurrences/offsets that correspond to non-skipped fields rather than global string replacement.

Copilot uses AI. Check for mistakes.
@andreidavid
Copy link
Author

Implemented additional Copilot follow-ups for #333 in commit 9633328:

  • Fixed to redact structurally (no global ReplaceAll collisions), which also addresses duplicate-value case for skipped ID fields.
  • Tightened traversal check to avoid false positives on names like while still blocking real escapes.
  • Added regression coverage for duplicate ID/payload values and leading-dot filename handling.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Comment on lines 354 to 361
@@ -355,6 +355,10 @@ func (s *GitStore) writeSessionToSubdirectory(opts WriteCommittedOptions, sessio
if err != nil {
return filePaths, fmt.Errorf("failed to marshal session metadata: %w", err)
}
metadataJSON, err = redact.JSONBytes(metadataJSON)
if err != nil {
return filePaths, fmt.Errorf("failed to redact metadata json: %w", err)
}
Copy link

Copilot AI Feb 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redacting the entire marshaled CommittedMetadata JSON can unintentionally modify non-secret identifier/path fields that are required for restore and transcript scoping (e.g. transcript_identifier_at_start, transcript_path). redact.JSONBytes currently only skips keys ending in id/ids, so these fields are still subject to high-entropy redaction. Consider redacting only the user/AI content fields (e.g. Summary) before marshaling, or expand the JSON skip rules to explicitly preserve these metadata fields (and add a regression test for them).

Copilot uses AI. Check for mistakes.
Comment on lines 983 to 986
metadataJSON, err = redact.JSONBytes(metadataJSON)
if err != nil {
return fmt.Errorf("failed to redact metadata json: %w", err)
}
Copy link

Copilot AI Feb 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UpdateSummary re-marshals and re-redacts the whole session metadata JSON. This can retroactively change unrelated metadata fields (including identifier/path fields used by restore) whenever a summary is updated. Prefer redacting only the summary payload (or explicitly skipping required metadata keys) so UpdateSummary cannot clobber other metadata values.

Suggested change
metadataJSON, err = redact.JSONBytes(metadataJSON)
if err != nil {
return fmt.Errorf("failed to redact metadata json: %w", err)
}

Copilot uses AI. Check for mistakes.
redact/redact.go Outdated
Comment on lines 188 to 205
// JSONContent parses a JSON string and redacts string values while skipping
// identifier fields (keys ending in "id") and non-user payload fields.
func JSONContent(content string) (string, error) {
var parsed any
if err := json.Unmarshal([]byte(content), &parsed); err != nil {
return "", err
}

redacted := redactJSONValue(parsed)
if reflect.DeepEqual(parsed, redacted) {
return content, nil
}

redactedJSON, err := json.Marshal(redacted)
if err != nil {
return "", err
}
return string(redactedJSON), nil
Copy link

Copilot AI Feb 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JSONContent re-marshals redacted JSON with json.Marshal, which strips indentation and the trailing newline from inputs that were originally formatted (e.g. via jsonutil.MarshalIndentWithNewline). That means files become compact only when redaction occurs, making metadata output inconsistent. Consider marshaling with an Encoder configured with the same indent and newline behavior to keep formatting stable.

Copilot uses AI. Check for mistakes.
Comment on lines 910 to 922
relWithinDir = filepath.Clean(relWithinDir)
if relWithinDir == ".." || strings.HasPrefix(relWithinDir, ".."+string(filepath.Separator)) {
return fmt.Errorf("path traversal detected: %s", relWithinDir)
}

blobHash, mode, err := createBlobFromFile(repo, path)
treePath := filepath.Join(dirPathRel, relWithinDir)
blobHash, mode, err := createRedactedBlobFromFile(repo, path, treePath)
if err != nil {
return fmt.Errorf("failed to create blob for %s: %w", path, err)
}

treePath := filepath.Join(dirPathRel, relWithinDir)
entries[treePath] = object.TreeEntry{
Name: treePath,
Mode: mode,
Copy link

Copilot AI Feb 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

treePath := filepath.Join(dirPathRel, relWithinDir) can introduce OS-specific path separators (e.g. \ on Windows). The checkpoint tree builder later splits paths on /, so this will produce incorrect tree structure / missing files on Windows. Use Git-style / paths here (e.g. path.Join(...) or filepath.ToSlash(...) for both parts) before inserting into entries.

Copilot uses AI. Check for mistakes.
@andreidavid
Copy link
Author

Implemented additional Copilot feedback in commit e77525b5.

  • Scoped committed metadata redaction to Summary fields only (instead of re-redacting full marshaled metadata JSON) in both write and update paths.
  • Preserved non-summary metadata fields used by restore/scoping (including transcript_identifier_at_start and transcript_path).
  • Updated redact.JSONContent to preserve formatting style when structural redaction re-marshals JSON (indentation + trailing newline behavior).
  • Normalized temporary metadata tree entry paths to Git-style / separators for cross-platform correctness.
  • Added regression coverage for metadata preservation, JSON formatting preservation, and path-separator normalization.

Validation:

  • mise run lint passes
  • go test ./redact ./cmd/entire/cli/checkpoint passes

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.


blobHash, mode, err := createBlobFromFile(repo, path)
treePath := filepath.ToSlash(filepath.Join(dirPathRel, relWithinDir))
treePath = strings.ReplaceAll(treePath, `\`, "/")
Copy link

Copilot AI Feb 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ReplaceAll call appears redundant since filepath.ToSlash already converts all backslashes to forward slashes on the previous line. Consider removing this redundant operation.

Suggested change
treePath = strings.ReplaceAll(treePath, `\`, "/")

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant