[Q] Empirical validation: does Codex Cloud env setup-script + curled AGENTS.md affect App review? #295

Parent epic: #273 · Gating sub-issue under #294 · Hard gate on #A (#279)

## Why this exists

Issue #294 surfaced a spec gap: with the Codex pivot, the central-prompt property of the Claude model is lost unless Codex offers an equivalent mechanism. Research narrowed the most promising candidate to **Codex Cloud environment + setup script that fetches a canonical `AGENTS.md`** from `glitchwerks/github-actions`. That path is currently rated **"plausible, with third-party supporting evidence"** (see [#294 correction comment](https://github.com/glitchwerks/github-actions/issues/294#issuecomment-4503549472)) — the chain "setup script writes file → Codex sees it during review" is inferred from [openai/codex#20093](https://github.com/openai/codex/issues/20093), not empirically verified.

This issue is the empirical test that turns "plausible" into "verified" (or rules it out).

## Hypothesis

If a Codex Cloud environment is configured for a repo with a setup script that writes a file (or `curl`s a remote file) into the workspace as `./AGENTS.md` before review fires, then `@codex review` on a PR in that repo will follow the guidance in that file.

## Experiment design

### Setup

1. **Throwaway test repo** — create `cbeaulieu-gt/codex-pivot-env-test` (or reuse an existing throwaway). Plain repo, minimal scaffolding (one source file in whatever language is convenient).
2. **Install Codex GitHub App** on the test repo.
3. **Configure a Codex Cloud environment** for the test repo. In its setup-script field, paste a script that writes a sentinel `AGENTS.md` to the workspace:
   ```bash
   cat > ./AGENTS.md <<'EOF'
   # AGENTS.md

   ## Review guidelines

   Flag any function named exactly `sentinel_canary_xyz` as a **P0 issue** with the exact text:
   "CANARY DETECTED — env-script chain works".

   Apply this rule to every changed file in the diff.
   EOF
   ```
   Sentinel chosen so it's unmistakable in Codex's review output and unlikely to be flagged by Codex's defaults.
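
If the negative outcome later needs env-side debugging, it may help for the setup script to report on its own work. The following is a minimal sketch. It assumes the setup script's stdout is visible somewhere in the Codex Cloud environment logs, which this issue has not confirmed. `check_agents_md` is a hypothetical helper name, not part of any Codex tooling.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether the sentinel AGENTS.md landed where expected.
# Assumption (unverified): setup-script stdout is surfaced in the environment logs.
check_agents_md() {
  local path="${1:-./AGENTS.md}"
  # -s is true only if the file exists and is non-empty.
  if [ ! -s "$path" ]; then
    echo "AGENTS.md check: MISSING or empty at $path"
    return 1
  fi
  if grep -q 'sentinel_canary_xyz' "$path"; then
    echo "AGENTS.md check: OK, sentinel rule present at $path"
  else
    echo "AGENTS.md check: file exists at $path but sentinel rule not found"
    return 1
  fi
}
```

Calling `check_agents_md || true` as the last line of the setup script would log the result without aborting setup when the check fails.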

### Test PR

Open a PR on the test repo that adds a single function literally named `sentinel_canary_xyz`. The function body is irrelevant.
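
One way to prepare that PR is sketched below. The file name `canary.sh` and branch name `canary-test` are illustrative assumptions, as is the choice of a shell function as the "source file". The commented commands assume `git` and the GitHub CLI (`gh`) are installed and authenticated for the test repo.

```bash
#!/usr/bin/env bash
# Sketch: write the canary source file for the test PR.
# write_canary_file is a hypothetical helper; file and branch names are illustrative.
write_canary_file() {
  local target="${1:-canary.sh}"
  cat > "$target" <<'EOF'
#!/usr/bin/env bash
# Body is irrelevant; only the function name matters for the experiment.
sentinel_canary_xyz() {
  :
}
EOF
}

# Then, from a clone of the test repo:
#   git switch -c canary-test
#   write_canary_file canary.sh
#   git add canary.sh
#   git commit -m "Add sentinel canary function"
#   git push -u origin canary-test
#   gh pr create --fill
```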

### Trigger and observation

- Wait for auto-review (if enabled) or invoke `@codex review` explicitly.
- Inspect Codex's review post.

### Outcome decision matrix

| Codex output | Conclusion | Next action |
|---|---|---|
| Review flags `sentinel_canary_xyz` as P0 with the canary text | **Env-script chain is verified.** Setup script ran, wrote AGENTS.md to a workspace the reviewer reads, and reviewer followed the guidance. | Commit to the env-script path as the primary centralization mechanism. Update #294 status to "✅ Verified." Reshape #A (#279) to author canonical AGENTS.md + document consumer-side env setup-script template. |
| Review flags it with different wording but cites a P0 rule | **Chain works; Codex paraphrases.** Still a verified positive. | Same as above; note the paraphrase behavior. |
| Review does NOT flag `sentinel_canary_xyz` at all | **Chain does not work.** Either setup script doesn't run on the App-review path, or it runs but writes to a different workspace, or Codex doesn't re-resolve AGENTS.md after the setup phase. | Fall back to sync-workflow hybrid (option 1 in #294). File a follow-up to investigate which sub-step failed (env-side debug: did the script run? did the file exist post-setup?). |
| Review flags `sentinel_canary_xyz` but cites Codex's default rules, not the canary text | **Ambiguous.** Codex may have flagged it for an unrelated reason. | Re-run with a more distinctive sentinel (e.g., flag-text contains a UUID). |
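
As a rough first pass over the pasted review text, the matrix above could be approximated with a small classifier. This is a sketch, not a substitute for reading the review: it treats any mention of the function name as "flagged", and it matches only the `CANARY DETECTED` prefix so punctuation differences in the rest of the canary text do not matter. `classify_review` and its output labels are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: map saved review text (on stdin) to a row of the outcome matrix.
# Heuristic only; a human should still confirm the row.
classify_review() {
  local text
  text="$(cat)"
  case "$text" in
    *sentinel_canary_xyz*) ;;                      # mentioned: keep classifying
    *) echo "not-flagged"; return 0 ;;             # row 3: chain does not work
  esac
  case "$text" in
    *"CANARY DETECTED"*) echo "verified" ;;        # row 1: canary text present
    *P0*) echo "verified-paraphrased" ;;           # row 2: P0 cited, other wording
    *) echo "ambiguous" ;;                         # row 4: flagged for another reason
  esac
}
```

With the review saved to a file, usage would be `classify_review < review.txt`.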

### Variants to run if the primary test passes

Only run these if the primary test verifies the chain — otherwise skip:

- **Variant 1 — remote fetch.** Replace the heredoc with `curl -fsSL https://raw.githubusercontent.com/glitchwerks/github-actions/main/AGENTS.md > ./AGENTS.md` (or a public test gist). Confirms the `curl`-from-canonical-source pattern works, not just a static heredoc.
- **Variant 2 — `.codex/REVIEW.md` resolution.** Try writing the file to `.codex/REVIEW.md` instead of `./AGENTS.md`. Confirms which paths Codex resolves.
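
For Variant 1, note that a plain `> ./AGENTS.md` redirect truncates the target before `curl` runs, so a failed download would leave an empty file behind. A slightly more defensive sketch downloads to a temporary file first and only moves it into place on success. `fetch_agents_md` is a hypothetical helper name, and whether `curl` and `mktemp` are available inside a Codex Cloud environment is an assumption to check during the experiment.

```bash
#!/usr/bin/env bash
# Sketch for Variant 1: fetch to a temp file so a failed download never
# leaves an empty or partial AGENTS.md in the workspace.
fetch_agents_md() {
  local url="$1" dest="${2:-./AGENTS.md}" tmp
  tmp="$(mktemp)" || return 1
  # -f: fail on HTTP errors; -sS: quiet but show errors; -L: follow redirects.
  if curl -fsSL "$url" -o "$tmp" && [ -s "$tmp" ]; then
    mv "$tmp" "$dest"
  else
    rm -f "$tmp"
    echo "fetch_agents_md: download failed for $url" >&2
    return 1
  fi
}
```

In the setup script this would be invoked as `fetch_agents_md https://raw.githubusercontent.com/glitchwerks/github-actions/main/AGENTS.md`.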

## Acceptance

- Test result documented in this issue verbatim (Codex's review output pasted into a comment)
- Outcome decision row identified
- If positive: #294 status updated, #A (#279) body reshaped accordingly, decision attached to #N (decision-gate observation log)
- If negative: fall-back path identified, follow-up issue filed if further env-side debugging is warranted

## Gating

**Hard gate on #A (#279).** Do not start AGENTS.md authoring (#A) until this experiment lands with a recommendation. **Soft gate on #J (#288).** Consumer onboarding docs depend on whether the env-script pattern or the sync-workflow pattern is the documented setup, so #J's content can't be finalized until #Q resolves.

## Execution note

This experiment requires a human in the Codex Cloud UI (creating the environment, pasting the setup script); it is not currently scriptable from this repo's CI. Recommendation: cbeaulieu-gt drives the UI portion, with agent assistance for PR setup and result interpretation.

🤖 _Generated by Claude Code on behalf of @cbeaulieu-gt_
