[Q] Empirical validation: does Codex Cloud env setup-script + curled AGENTS.md affect App review? #295

@cbeaulieu-gt

Parent epic: #273 · Gating sub-issue under #294 · Hard gate on #A (#279)

Why this exists

Issue #294 surfaced a spec gap: with the Codex pivot, the central-prompt property of the Claude model is lost unless Codex offers an equivalent mechanism. Research narrowed the most promising candidate to Codex Cloud environment + setup script that fetches a canonical AGENTS.md from glitchwerks/github-actions. That path is currently rated "plausible, with third-party supporting evidence" (see #294 correction comment) — the chain "setup script writes file → Codex sees it during review" is inferred from openai/codex#20093, not empirically verified.

This issue is the empirical test that turns "plausible" into "verified" (or rules it out).

Hypothesis

If a Codex Cloud environment is configured for a repo with a setup script that writes a file (or curls a remote file) into the workspace as ./AGENTS.md before review fires, then @codex review on a PR in that repo will follow the guidance in that file.

Experiment design

Setup

  1. Throwaway test repo — create cbeaulieu-gt/codex-pivot-env-test (or reuse an existing throwaway). Plain repo, minimal scaffolding (one source file in whatever language is convenient).
  2. Install Codex GitHub App on the test repo.
  3. Configure a Codex Cloud environment for the test repo. In its setup-script field, paste a script that writes a sentinel AGENTS.md to the workspace:
    cat > ./AGENTS.md <<'EOF'
    # AGENTS.md
    
    ## Review guidelines
    
    Flag any function named exactly `sentinel_canary_xyz` as a **P0 issue** with the exact text:
    "CANARY DETECTED — env-script chain works".
    
    Apply this rule to every changed file in the diff.
    EOF
    The sentinel is chosen to be unmistakable in Codex's review output and unlikely to be flagged by Codex's defaults.
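
If the primary test fails, the follow-up needs to know whether the script ran and whether the file existed after setup. A minimal sketch of a helper that could be appended after the heredoc to leave that evidence behind is shown here. `report_agents_md` is a hypothetical name, and it is an assumption (not verified here) that setup-script output is visible in the Codex Cloud environment logs.

```shell
# Hypothetical helper: print evidence about the AGENTS.md the setup script wrote.
# Assumption: setup-script stdout/stderr can be inspected after the run.
report_agents_md() {
  local path="${1:-./AGENTS.md}"
  echo "setup-script: cwd=$(pwd)"
  if [ -s "$path" ]; then
    # File exists and is non-empty: report its size and how many lines mention the sentinel.
    echo "setup-script: $path present, $(wc -c < "$path" | tr -d ' ') bytes"
    grep -c 'sentinel_canary_xyz' "$path" | sed 's/^/setup-script: sentinel mentions=/'
  else
    echo "setup-script: $path missing or empty" >&2
    return 1
  fi
}

# In the setup-script field, after the heredoc, this would be called as:
#   report_agents_md ./AGENTS.md
```

Printing the working directory helps separate "the script did not run" from "the script ran but wrote to a different workspace".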

Test PR

Open a PR on the test repo that adds a single function literally named sentinel_canary_xyz. The function body is irrelevant.
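
One way to prepare that PR from a local clone is sketched below. The file name `canary.py`, the branch name `canary-test`, the choice of Python for the source file, and the helper name `write_canary_file` are all placeholders, not part of the experiment design. The commented commands assume `git` and the GitHub CLI (`gh`) are installed and authenticated.

```shell
# Hypothetical helper: write a one-function source file containing the sentinel.
write_canary_file() {
  local path="${1:-canary.py}"
  cat > "$path" <<'EOF'
def sentinel_canary_xyz():
    """Body is irrelevant; only the function name matters for the test."""
    return None
EOF
}

# Typical use inside a clone of the test repo (names are placeholders):
#   git checkout -b canary-test
#   write_canary_file canary.py
#   git add canary.py && git commit -m "Add sentinel function"
#   git push -u origin canary-test
#   gh pr create --fill
```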

Trigger and observation

  • Wait for auto-review (if enabled) or invoke @codex review explicitly.
  • Inspect Codex's review post.

Outcome decision matrix

| Codex output | Conclusion | Next action |
| --- | --- | --- |
| Review flags sentinel_canary_xyz as P0 with the canary text | Env-script chain is verified. Setup script ran, wrote AGENTS.md to a workspace the reviewer reads, and reviewer followed the guidance. | Commit to the env-script path as the primary centralization mechanism. Update #294 status to "✅ Verified." Reshape #A (#279) to author canonical AGENTS.md + document consumer-side env setup-script template. |
| Review flags it with different wording but cites a P0 rule | Chain works; Codex paraphrases. Still a verified positive. | Same as above; note the paraphrase behavior. |
| Review does NOT flag sentinel_canary_xyz at all | Chain does not work. Either setup script doesn't run on the App-review path, or it runs but writes to a different workspace, or Codex doesn't re-resolve AGENTS.md after the setup phase. | Fall back to sync-workflow hybrid (option 1 in #294). File a follow-up to investigate which sub-step failed (env-side debug: did the script run? did the file exist post-setup?). |
| Review flags sentinel_canary_xyz but cites Codex's default rules, not the canary text | Ambiguous. Codex may have flagged it for an unrelated reason. | Re-run with a more distinctive sentinel (e.g., flag-text contains a UUID). |
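
As a first pass before reading the review by hand, the four outcomes can be approximated with plain string checks on the review text. This is a rough sketch, not a substitute for inspection. The function name `classify_review` and the outcome labels are hypothetical, and treating any mention of "P0" as the paraphrase case is an assumption that may not hold if Codex's default reviews also use that label.

```shell
# Hypothetical helper: map review text to one of the four outcomes in the matrix.
# Heuristic only: a mention of the name is treated as "flagged".
classify_review() {
  local review="$1"
  if ! printf '%s\n' "$review" | grep -qF 'sentinel_canary_xyz'; then
    echo "not-flagged"            # sentinel never mentioned
  elif printf '%s\n' "$review" | grep -qF 'CANARY DETECTED'; then
    echo "verified"               # canary text reproduced
  elif printf '%s\n' "$review" | grep -qw 'P0'; then
    echo "verified-paraphrased"   # flagged as P0, different wording (needs a human check)
  else
    echo "ambiguous"              # flagged, but not traceable to the canary rule
  fi
}
```

With the GitHub CLI, the review text could be passed in with something like `classify_review "$(gh pr view <number> --comments)"`, where `<number>` is the test PR.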

Variants to run if the primary test passes

Only run these if the primary test verifies the chain — otherwise skip:

  • Variant 1 — remote fetch. Replace the heredoc with curl -fsSL https://raw.githubusercontent.com/glitchwerks/github-actions/main/AGENTS.md > ./AGENTS.md (or a public test gist). Confirms the curl-from-canonical-source pattern works, not just a static heredoc.
  • Variant 2 — .codex/REVIEW.md resolution. Try writing the file to .codex/REVIEW.md instead of ./AGENTS.md. Confirms which paths Codex resolves.
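
One caveat for Variant 1: with `curl ... > ./AGENTS.md`, the shell truncates the target before curl runs, so a failed fetch leaves an empty AGENTS.md and a "not flagged" result could be misread as a chain failure. A minimal sketch that downloads to a temporary file first and only replaces the target on success is shown below. `fetch_agents_md` is a hypothetical helper name, and whether the Codex Cloud setup phase has network access to the URL is an assumption this variant is meant to test.

```shell
# Hypothetical helper: fetch AGENTS.md without clobbering the target on failure.
fetch_agents_md() {
  local url="$1" dest="${2:-./AGENTS.md}" tmp
  tmp="$(mktemp)" || return 1
  # -f: fail on HTTP errors, -sS: quiet but show errors, -L: follow redirects.
  if curl -fsSL "$url" -o "$tmp" && [ -s "$tmp" ]; then
    mv "$tmp" "$dest"
  else
    rm -f "$tmp"
    echo "fetch_agents_md: could not fetch $url" >&2
    return 1
  fi
}

# In the setup script this would be called as:
#   fetch_agents_md https://raw.githubusercontent.com/glitchwerks/github-actions/main/AGENTS.md ./AGENTS.md
```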

Acceptance

Gating

  • Hard gate on #A (#279). Do not start AGENTS.md authoring (#A) until this experiment lands with a recommendation.
  • Soft gate on #J (#288). Consumer onboarding docs depend on whether the env-script pattern or the sync-workflow pattern is the documented setup, so #J's content can't finalize until #Q resolves.

Execution note

This experiment requires a human in the Codex Cloud UI (creating the environment, pasting the setup script), so it is not currently scriptable from this repo's CI. Recommendation: cbeaulieu-gt drives the UI portion, with agent assistance for PR setup and result interpretation.

🤖 Generated by Claude Code on behalf of @cbeaulieu-gt
