chronicle: gap analysis & follow-ups vs. OpenAI Codex Chronicle (binary archaeology) #39

@haasonsaas

Context

We inspected /Applications/Codex.app/Contents/Resources/codex_chronicle (4.2 MB arm64 Rust binary, signed Developer ID Application: OpenAI OpCo, LLC (2DC432GLL2), hardened runtime, notarization stapled). The findings are below, along with what they imply for our roadmap. This is an umbrella issue; line items below should be split into focused PRs/issues as they get picked up.

What Codex Chronicle actually is

  • Language: Rust, calling Apple frameworks directly (ScreenCaptureKit, Vision, CoreML, AVFoundation).
  • Process model: main daemon + per-display --list-displays-child + --capture-screenshot-child + an SCKit child. Single-instance lockfile (codex_chronicle.lock) and pidfile ($TMPDIR/codex_chronicle/chronicle-started.pid).
  • Sampling primary: OCR diff. "One JSON object per material text change" in append-only .ocr.jsonl sidecars. Not fixed-fps + pHash like ours.
  • Storage:
    • Ephemeral: $TMPDIR/chronicle/screen_recording/<segment_ts>-display-<id>-{latest.jpg,.capture,.capture.json,.ocr.jsonl} plus historical frame-<idx>-<min_bucket>Z.jpg.
    • Memories: ~/.codex/memories_extensions/chronicle/{instructions.md, YYYY-MM-DDTHH-MM-SS-XXXX-(10min|6h)-<slug>.md}.
  • Privacy filter: window-identity-based only. PrivacyFilter { signature, stable_observations } + BrowserWindowObservation { name, ... }. Hard-coded rules for Chrome incognito, Safari Private Browsing, Google Meet (meet.google.com). No content scrub. There is no secret regex anywhere in the binary.
  • Wire format includes a per-frame safe_to_persist flag on ScreenshotChildSuccess.
  • Where data goes: frames + OCR ship to OpenAI servers via an internal model_provider="openai-memgen" (requires_openai_auth=true, supports_websockets=true, header X-OpenAI-Memgen-Request: true). Summaries return as plaintext markdown stored locally.
  • Summarizer: recursive 10min → 6h with convergence check, run as an ephemeral Codex sub-process with --ephemeral --ignore-user-config --ignore-rules --sandbox read-only and almost every feature disabled (features.memories=false, features.apps=false, features.plugins=false, features.multi_agent=false, web_search="disabled", mcp_servers={}, analytics.enabled=false, otel.exporter="none", skills.config=[{name="chronicle", enabled=false}]).
  • Prompt-injection defense: purely prompt-engineering. Embedded BEGIN UNTRUSTED OBSERVED INPUT framing plus an explicit attack taxonomy (authority-boundary, role-claim, future-agent-instruction, destructive-cleanup-as-rule, expected-memory-in-fixtures, attorney-client euphemisms, ambiguous sensitive content).
  • Audio: wired (captureMicrophone, microphoneCaptureDeviceID, excludesCurrentProcessAudio). Info.plist declares NSMicrophoneUsageDescription + NSAudioCaptureUsageDescription. Not user-facing yet.
  • Entitlements: app-sandbox: false, allow-jit, allow-unsigned-executable-memory, device.audio-input, network.client, files.user-selected.read-write (Electron parent posture).
  • macOS minimum: 12.0 with version-gated paths (captureImageInRect ≥15.2, captureScreenshot ≥26.0).
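
To make the "one JSON object per material text change" sampling concrete, here is a minimal Python sketch of an OCR-diff gate. It illustrates the idea only and is not Codex's implementation: the class name, the `ts`/`text` record fields, the similarity metric (`difflib.SequenceMatcher`), and the 5% change threshold are all assumptions.

```python
import difflib
import json
from typing import Optional


class OcrDiffSampler:
    """Emit a JSONL record only when OCR text changes materially (hypothetical)."""

    def __init__(self, min_change: float = 0.05):
        # Assumed threshold; the value Codex uses is not known from the binary.
        self.min_change = min_change
        self.last_text: Optional[str] = None

    def observe(self, text: str, ts: float) -> Optional[str]:
        """Return one JSON line for a material change, or None to skip the frame."""
        if self.last_text is not None:
            similarity = difflib.SequenceMatcher(None, self.last_text, text).ratio()
            if 1.0 - similarity < self.min_change:
                return None  # text is (nearly) unchanged; nothing to record
        self.last_text = text
        return json.dumps({"ts": ts, "text": text})


def append_record(sidecar_path: str, line: str) -> None:
    """Append-only write, mirroring the .ocr.jsonl sidecar pattern."""
    with open(sidecar_path, "a", encoding="utf-8") as f:
        f.write(line + "\n")
```

A caller would feed each OCR result to `observe` and pass any non-`None` return value to `append_record`. Because the comparison is against the last kept text rather than the last seen text, slow drift still eventually crosses the threshold.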

Things to add or harden in agentd / Chronicle pipeline

Capture-side parity (worth taking from them)

  • Material-text-change OCR-diff gate as a secondary sampler after pHash. Catches "pixel-identical, text-shifted" runs (modal toggles, focus changes inside the same canvas) that pHash misses. Land it as an optional gate in CaptureService.swift writing an analogue of their .ocr.jsonl sidecar.
  • Multi-display concurrent capture. Builds on existing capture: add multi-display observability and adaptive OCR budgets #34. Match Codex's per-display segment files and combine-by-timestamp pattern, with a per-display displayId field on SubmitBatchRequest frames so the server can fan in correctly.
  • Multi-process crash isolation for ScreenCaptureKit. Spawn SCKit in a child process as Codex does — survives SCKit crashes/leaks without taking the menu-bar app down. Out-of-scope if it's a big lift; at minimum add an SCKit-watchdog that restarts the in-process pipeline.
  • safe_to_persist flag on the wire. Add an explicit per-frame boolean (and reason enum) to the chronicle.v1 Frame/SubmitBatchRequest proto so audit reviewers can see which fail-closed rail fired (secret-scrub-hit, app-deny, path-deny, window-title-pause, server-policy-pause). Belongs in evalops/platform; cross-link the proto change.
  • Pidfile / heartbeat-recency precondition for downstream consumers. Document that any consumer (pipeline, audit query, future MCP read-surface) MUST verify heartbeat freshness before treating frames as current. Mirrors Codex's pidfile-validity rule. Surface lives server-side; client should expose its RegisterDevice/Heartbeat last-seen.
  • Browser-window-aware default deny. Add the equivalent of Codex's BrowserWindowObservation + chrome_incognito_title / safari_private_browsing_title / meet.google.com / Meet - detection to default denyWindowTitlePatterns. This is a small, well-defined addition that complements (does not replace) our content scrub. Cross-references privacy: support scheduled and policy-driven auto-pause windows #36.
  • browser_window_unstable / browser_window_missing_title failure-mode handling. When AX/window observation is unreliable, fall through to fail-closed deny for that frame. Today we likely persist with degraded metadata.
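
The browser-window default deny and the fail-closed handling for unreliable window observation could combine into a single per-frame decision. This is a rough Python sketch under stated assumptions. The reason names come from the rails listed above; the title patterns for incognito and private browsing are guesses at what browsers show (they vary by version and locale). `PersistDecision`, `decide`, and mapping the unstable/missing-title case onto `window-title-pause` are hypothetical choices, not existing agentd API.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DenyReason(Enum):
    SECRET_SCRUB_HIT = "secret-scrub-hit"
    APP_DENY = "app-deny"
    PATH_DENY = "path-deny"
    WINDOW_TITLE_PAUSE = "window-title-pause"
    SERVER_POLICY_PAUSE = "server-policy-pause"


# Assumed default patterns; real browser titles should be verified before shipping.
DEFAULT_DENY_TITLE_PATTERNS = [
    re.compile(r"\(Incognito\)", re.IGNORECASE),
    re.compile(r"Private Browsing", re.IGNORECASE),
    re.compile(r"meet\.google\.com", re.IGNORECASE),
    re.compile(r"^Meet - "),
]


@dataclass(frozen=True)
class PersistDecision:
    safe_to_persist: bool
    reason: Optional[DenyReason] = None


def decide(window_title: Optional[str], title_stable: bool) -> PersistDecision:
    # Fail closed: a missing or unstable title is treated as a deny,
    # rather than persisting the frame with degraded metadata.
    if not window_title or not title_stable:
        return PersistDecision(False, DenyReason.WINDOW_TITLE_PAUSE)
    for pattern in DEFAULT_DENY_TITLE_PATTERNS:
        if pattern.search(window_title):
            return PersistDecision(False, DenyReason.WINDOW_TITLE_PAUSE)
    return PersistDecision(True)
```

Carrying the reason alongside the boolean is what lets an audit reviewer tell which rail fired. A dedicated reason for unreliable observation may be preferable to reusing `window-title-pause`.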

Pipeline-side adds (evalops/platform)

  • Recursive chronological summarizer for audit roll-ups. 10-min summaries → 6-hour roll-ups → daily, with convergence check and provenance back to the originating frames. This is the audit-reviewability story; today reviewers walk raw frames. Server-side, never the device.
  • URL stripping in any agent-readable view of the audit trail. Codex strips URLs from summaries entirely as a leak-vector + prompt-injection mitigation. Worth a default-redact-with-opt-in posture in audit query results that an internal copilot would consume.
  • Recursive-summarizer ephemeral-sandbox config posture. When/if any agent ever touches Chronicle evidence to summarize, run with the equivalent of Codex's --sandbox read-only --ephemeral --ignore-user-config --ignore-rules, telemetry/MCP/plugins/multi-agent/web-search all disabled. Document this as the required posture for evalops/maestro or any future internal consumer.
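
As a rough illustration of the roll-up shape, the Python sketch below groups 10-minute summaries into fixed windows, carries frame provenance forward, and redacts URLs from the output. The `Summary` type, the `summarize` callback (a stand-in for whatever server-side summarizer is used), and the redaction placeholder are all hypothetical. The convergence check is not modeled here.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List

URL_RE = re.compile(r"https?://\S+")
SIX_HOURS = 6 * 60 * 60


@dataclass
class Summary:
    start_ts: int         # window start, seconds since epoch
    text: str
    frame_ids: List[str]  # provenance back to the originating frames


def strip_urls(text: str) -> str:
    """Default-redact URLs before any agent-readable view sees the text."""
    return URL_RE.sub("[url redacted]", text)


def roll_up(parts: List[Summary],
            summarize: Callable[[List[str]], str],
            window: int = SIX_HOURS) -> List[Summary]:
    """Group summaries into fixed windows and summarize each group in time order."""
    buckets: Dict[int, List[Summary]] = {}
    for s in sorted(parts, key=lambda s: s.start_ts):
        buckets.setdefault(s.start_ts - s.start_ts % window, []).append(s)
    rolled = []
    for start, group in sorted(buckets.items()):
        text = strip_urls(summarize([g.text for g in group]))
        frames = [fid for g in group for fid in g.frame_ids]
        rolled.append(Summary(start, text, frames))
    return rolled
```

Calling `roll_up` again on its own output with `window=24 * 60 * 60` would give the daily tier, with `frame_ids` still pointing at the original frames.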

Things we are ahead on — harden + document

  • Fail-closed content-aware secret scrub is category-defining versus Codex Chronicle, the OSS clones, and Microsoft Recall. Move from "a paragraph in the README" to a first-class capability page (docs/secret-scrub.md?) with the regex family list, the fail-closed semantics (frame dropped, never partial-redacted), the OCR-text + window-title + document-path coverage, and a table comparing it to Codex's window-identity-only filter and Recall's app exclusions.
  • Fleet CapturePolicy + local hard-deny rails. Codex Chronicle is single-user-only by construction. Document the RegisterDevice / Heartbeat / server-pushed policy, and especially the local-hard-deny-wins-over-server-allow rail. This is the IT/security buyer story.
  • Encryption at rest by default in remote/broker mode (Keychain-backed AES-GCM, .agentdbatch extension). Codex stores plaintext JPEGs + plaintext OCR sidecars + plaintext markdown. Make the default explicit in the README diff.
  • Optional ASB Secret Broker artifact wrap. chronicle_frame_batch_json artifact, only the artifact ref leaves the device, meterable and revocable through ASB. Codex has nothing analogous.
  • Hardware-backed permission smoke (existing Smoke test Screen Recording and Accessibility permission flow #25) and notarized signed release (Run Developer ID notarization for packaged agentd app #24) — pull both into the README hero comparison; the OSS clones explicitly cannot ship a notarized binary today.
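
For the capability page, the fail-closed semantics and the local-hard-deny-wins rail could be shown with a short example like the Python sketch below. It is illustrative only: the three regexes are placeholder shapes (AWS access key ID, PEM private key header, GitHub token), not the actual agentd regex family, and `should_persist` with its parameters is a hypothetical name.

```python
import re
from typing import Iterable, Optional

# Placeholder patterns for illustration; not the real agentd regex family.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key ID shape
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
    re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36,}\b"),      # GitHub token shape
]


def contains_secret(fields: Iterable[Optional[str]]) -> bool:
    """True if any non-empty field matches any secret pattern."""
    return any(p.search(f) for f in fields if f for p in SECRET_PATTERNS)


def should_persist(ocr_text: Optional[str],
                   window_title: Optional[str],
                   document_path: Optional[str],
                   *,
                   local_deny: bool,
                   server_allow: bool) -> bool:
    if local_deny:
        return False  # local hard-deny wins over any server allow
    if not server_allow:
        return False
    # Fail closed: a hit in any covered field drops the whole frame;
    # there is no partial redaction path.
    return not contains_secret([ocr_text, window_title, document_path])
```

The scrub covers OCR text, window title, and document path together, so a secret visible only in a title bar still drops the frame.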

Devex / positioning

  • README hero diff vs. OpenAI Codex Chronicle. Anyone Googling "Chronicle macOS screen capture" lands on OpenAI first. The agentd README should frame the difference up front: subject of capture (humans-and-agents-at-work vs. "help my Codex agent remember me"), governance posture (fleet-policy + fail-closed vs. single-user opt-in), data plane (self-hosted Connect/proto vs. OpenAI-hosted summarizer round-trip), evidence model (frames + scrub vs. LLM-summarized markdown). One table, no marketing copy.
  • Comparison page covering Codex Chronicle, Einsia/OpenChronicle, Screenata/open-chronicle, Microsoft Recall. Same table, axis-by-axis.
  • Threat-model doc. Explicitly state that we do not feed observed content into an on-device LLM, so the entire prompt-injection class Codex is patching with words is architecturally absent. This is a real differentiator that's invisible until said out loud.

Things we deliberately do NOT want to copy

  • Shipping frames or OCR text to a third-party LLM provider for summarization. This is the load-bearing reason Codex's posture cannot meet enterprise audit. Any internal summarization must run inside the customer's control plane, with the ephemeral-sandbox config posture above.
  • Window-identity-only privacy filter without content scrub. Do not regress to the simpler model just because it's what the market sees as "good enough."
  • Plaintext markdown memories on disk. Even for "local-only" demo modes, default-on encrypt-at-rest stays on.
  • Audio capture. Stay screen-only until we have a stated audit reason. "Chronicle observes work product, not conversation" is a defensible scope claim worth keeping.
  • Sparkle-style auto-update from anywhere. Use a signed update channel only (release: add launch-at-login and signed update channel #33), and never an auto-update path that can land code on the device without re-running the notarization/policy gate.

Existing related issues

Source

Findings from local inspection of /Applications/Codex.app/Contents/Resources/codex_chronicle (com.openai.codex 26.422.30944, signed Apr 24 2026), cross-checked against Chronicle – Codex docs.
