[aw-failures] 6h failure cluster (2026-05-24 19:10): Codex CLI 0.133.0 `stream_options.include_usage` regression broke 7 workflo
[Content truncated due to length]

### Executive summary

In the 6-hour window ending **2026-05-24T19:10 UTC**, **9 agentic workflow runs failed**. Distribution by cluster:

| Cluster | Engine | Runs | Symptom | Severity |
|---|---|---:|---|---|
| A — Codex `stream_options.include_usage` 400 | Codex 0.133.0 | **7** | Every retry rejected by chat completions API: `Unknown parameter: 'stream_options.include_usage'` (`invalid_request_error`) | **P0** |
| B — Smoke Copilot dispatch_workflow ref miss | GitHub Copilot CLI | 1 | Agent succeeded; `safe_outputs` job failed because `dispatch_workflow` targeted `refs/heads/codex/review-codex-configuration`, which doesn't carry the `haiku-printer` workflow | P2 |
| C — Avenger max_turns(25) | Claude Code (Opus 4.7) | 1 | Investigation of `codex_engine` test diff exhausted the 25-turn budget; agent never reached a conclusion | P2 |

**Root cause of Cluster A (P0):** PR #34390, merged at 2026-05-24T12:59:44 UTC to fix the prior Copilot `anthropic-beta` regression tracked in #34394, also bumped `DefaultCodexVersion` from `0.130.0` → `0.133.0`. Codex CLI 0.133.0 sends `stream_options.include_usage` to the OpenAI chat completions API, which rejects the parameter as unknown for the configured model `gpt-5.5`. All Codex-engine workflows now fail deterministically: the harness makes 4 attempts (1 initial + 3 retries), each returns the same HTTP 400, and the run exits with code 1.

### Failure cluster table

<details>
<summary>All 9 failed runs (last 6h)</summary>

| Workflow | Run | Engine | Cluster | Started |
|---|---|---|---|---|
| Daily Cache Strategy Analyzer | [§26369636900](https://github.com/github/gh-aw/actions/runs/26369636900) | Codex 0.133.0 | A | 18:43 |
| Avenger | [§26369695031](https://github.com/github/gh-aw/actions/runs/26369695031) | Claude Code | C | 18:46 |
| Changeset Generator | [§26369303262](https://github.com/github/gh-aw/actions/runs/26369303262) | Codex 0.133.0 | A | 18:28 |
| Smoke Codex | [§26369303293](https://github.com/github/gh-aw/actions/runs/26369303293) | Codex 0.133.0 | A | 18:28 |
| Smoke Copilot | [§26369303291](https://github.com/github/gh-aw/actions/runs/26369303291) | Copilot CLI | B | 18:28 |
| Changeset Generator | [§26368932369](https://github.com/github/gh-aw/actions/runs/26368932369) | Codex 0.133.0 | A | 18:11 |
| Smoke Codex | [§26368932442](https://github.com/github/gh-aw/actions/runs/26368932442) | Codex 0.133.0 | A | 18:11 |
| Changeset Generator | [§26368599354](https://github.com/github/gh-aw/actions/runs/26368599354) | Codex 0.133.0 | A | 17:56 |
| Smoke Codex | [§26368599382](https://github.com/github/gh-aw/actions/runs/26368599382) | Codex 0.133.0 | A | 17:56 |

</details>

### Evidence — Cluster A (P0)

Identical engine config across all 7 Codex runs:

```
engine_id: codex
model: gpt-5.5
codex_app_server.client_version: 0.133.0
```

The API rejection is deterministic on every retry (4 attempts × 7 runs):

<details>
<summary>Sample stdio (run 26369636900, Daily Cache Strategy Analyzer)</summary>

```
{"type":"error","message":"{
  \"error\": {
    \"message\": \"Unknown parameter: 'stream_options.include_usage'.\",
    \"type\": \"invalid_request_error\",
    \"param\": \"stream_options.include_usage\",
    \"code\": \"unknown_parameter\"
  }
}"}
{"type":"turn.failed","error":{"message":"{ ... same 400 ... }"}}
[codex-harness] attempt 4 failed: exitCode=1 isRateLimitError=false isAuthenticationFailedError=false isMissingApiKeyError=false isServerError=false hasOutput=true retriesRemaining=0
[codex-harness] all 3 retries exhausted — giving up (exitCode=1)
```

</details>

<details>
<summary>Confirmation: same error in all 7 Codex runs</summary>

| Run | Workflow | Attempts |
|---|---|---|
| 26369636900 | Daily Cache Strategy Analyzer | 4× same 400 |
| 26369303262 | Changeset Generator | 4× same 400 |
| 26369303293 | Smoke Codex | 4× same 400 |
| 26368932369 | Changeset Generator | 4× same 400 |
| 26368932442 | Smoke Codex | 4× same 400 |
| 26368599354 | Changeset Generator | 4× same 400 |
| 26368599382 | Smoke Codex | 4× same 400 |

All runs exit with code 1 after exhausting the 3-retry budget.

</details>
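
Because the rejection is about the request shape rather than load or credentials, retrying cannot succeed. The following is a minimal sketch, not the harness's actual logic, of how an error event like the one in the sample log could be classified as non-retryable. The function names `parse_api_error` and `should_retry`, and the `NON_RETRYABLE_CODES` set, are hypothetical; the event is rebuilt here as well-formed JSON from the fields shown in the log.

```python
import json

# Error codes treated as deterministic request-shape failures.
# This set is an assumption of the sketch, not the harness's real list.
NON_RETRYABLE_CODES = {"unknown_parameter"}


def parse_api_error(event_line: str) -> dict:
    """Return the inner API error object from a JSONL error event."""
    event = json.loads(event_line)
    # The event's "message" field is itself a JSON document stored as a string.
    inner = json.loads(event["message"])
    return inner["error"]


def should_retry(error: dict) -> bool:
    """Retrying is pointless when the API rejects the request shape itself."""
    is_request_error = error.get("type") == "invalid_request_error"
    if is_request_error and error.get("code") in NON_RETRYABLE_CODES:
        return False
    return True


# Well-formed reconstruction of the event from the sample stdio.
api_error = {
    "error": {
        "message": "Unknown parameter: 'stream_options.include_usage'.",
        "type": "invalid_request_error",
        "param": "stream_options.include_usage",
        "code": "unknown_parameter",
    }
}
sample_line = json.dumps({"type": "error", "message": json.dumps(api_error)})

err = parse_api_error(sample_line)
print(err["param"], should_retry(err))  # stream_options.include_usage False
```

Under this classification, the first 400 would end the run instead of consuming the remaining three attempts.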

### Audit-diff vs. successful baseline (Cluster A)

Run 26369636900 (failed, Codex 0.133.0) vs 26184060675 (last successful Daily Cache Strategy Analyzer, Codex 0.130.0):

| Metric | Successful baseline | Failed run | Delta |
|---|---:|---:|---|
| `api.openai.com:443` allowed requests | 33 | 6 | -82% (agent never completed any turn) |
| github core API consumed | 142 | 1012 | +613% (retry storm) |
| firewall anomalies | 0 | 0 | unchanged |
| MCP failures | 0 | 0 | unchanged |

The 6-vs-33 OpenAI request count is the smoking gun: the API returns the HTTP 400 before any agent reasoning turn can complete, so the failed run never gets past its initial requests.
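
The Delta column can be reproduced from the two raw counts. This short sketch shows the arithmetic; the helper name `pct_delta` is illustrative only.

```python
def pct_delta(baseline: int, observed: int) -> int:
    """Percentage change from baseline to observed, rounded to a whole percent."""
    return round((observed - baseline) / baseline * 100)


# api.openai.com:443 allowed requests: 33 (baseline) vs 6 (failed run)
print(pct_delta(33, 6))      # -82

# github core API consumed: 142 (baseline) vs 1012 (failed run)
print(pct_delta(142, 1012))  # 613
```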

### Evidence — Cluster B (Smoke Copilot, P2)

Run [§26369303291](https://github.com/github/gh-aw/actions/runs/26369303291). The Copilot agent itself succeeded; the `safe_outputs` job then failed:

```
##[error]Failed to dispatch workflow "haiku-printer": No ref found for: refs/heads/codex/review-codex-configuration - https://docs.github.com/rest/actions/workflows#create-a-workflow-dispatch-event
##[error]✗ Message 8 (dispatch_workflow) failed: Failed to dispatch workflow "haiku-printer": No ref found for: refs/heads/codex/review-codex-configuration
##[error]1 safe output(s) failed:
  - dispatch_workflow: Failed to dispatch workflow "haiku-printer": No ref found for: refs/heads/codex/review-codex-configuration
```

The smoke test runs on a PR branch (`codex/review-codex-configuration`) that does not carry `haiku-printer.md`, so the `workflows/haiku-printer.yml` lockfile isn't present on that ref. This is a smoke-test design gap, not a framework regression. Other safe outputs (`create_issue` #34516, `add_comment_to_discussion`, `upload_artifact`) succeeded in the same job.

### Evidence — Cluster C (Avenger, P2)

Run [§26369695031](https://github.com/github/gh-aw/actions/runs/26369695031). Claude Opus 4.7, `terminal_reason: max_turns`, `errors: ["Reached maximum number of turns (25)"]`, `num_turns: 26`, `total_cost_usd: 2.65`.

The agent spent its budget investigating `pkg/workflow/codex_engine.go` and `pkg/workflow/codex_engine_test.go` history (repeated `git log` queries returning the same single-commit output, then `WebFetchToolConfig` symbol lookups) without converging. `max-turns: 25` is plausibly under-provisioned for Avenger's CI-fixing remit, especially while real upstream regressions (Cluster A) generate confusing test signals.

### Existing tracking correlation

| Issue | Status | Notes |
|---|---|---|
| #34394 | Open (prior parent report) | Documents the now-resolved Copilot 1.0.51 `anthropic-beta` regression. PR #34390 — the fix — is the root cause of the **new** Cluster A. No action needed on #34394; it correctly notes the prior fix shipped. |
| #34390 | Merged 12:59 UTC | The version bump that introduced the Codex 0.133.0 regression. **A revert or partial revert (Codex pin only) is the proposed fix.** |
| #34517 | Open (auto-issue) | Daily Cache Strategy Analyzer failure (run 26369636900) — Cluster A. Body truncates to trace logs and never surfaces the root error. |
| #34520 | Open (auto-issue) | Avenger failure (run 26369695031) — Cluster C. Body shows the `max_turns` exit but not why the agent looped. |
| #34418 | Open (auto-issue, 10 comments) | Smoke Codex — was previously tracking `Missing OPENAI_API_KEY`. Now the failure mode has shifted to `stream_options.include_usage`; subsequent Smoke Codex failures (26369303293, 26368932442, 26368599382) are filed against this same workflow_id. |

No existing open issue tracks the `stream_options.include_usage` regression specifically; this report is the first.

### Proposed fix roadmap

**P0 — immediate (Cluster A)**
1. **Revert the Codex bump in PR #34390**, or apply a follow-up PR that pins `DefaultCodexVersion` back to `0.130.0` while keeping the Copilot 1.0.52 / GitHub MCP v1.0.5 bumps. Validate by re-triggering Smoke Codex and one Codex-engine production workflow.
2. File upstream report to `openai/codex` CLI repo: 0.133.0 emits `stream_options.include_usage` against `gpt-5.5`, which the model does not accept. Wait for upstream 0.134+ before re-bumping.
3. Sub-issue created: see *Sub-issues created* section below.
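
As a rough illustration of the pin, a version guard could refuse a known-bad Codex release while allowing both the rollback target and later releases. This is a sketch under stated assumptions: `KNOWN_BAD_CODEX_VERSIONS` and `is_allowed_codex_version` are hypothetical names, and only `0.133.0` is listed because it is the only version this report has evidence against.

```python
# Versions this report has evidence against. Nothing is assumed about
# releases that were never observed (for example 0.131.x or 0.132.x).
KNOWN_BAD_CODEX_VERSIONS = {(0, 133, 0)}


def parse_version(version: str) -> tuple:
    """Turn '0.133.0' into (0, 133, 0) for exact comparison."""
    return tuple(int(part) for part in version.split("."))


def is_allowed_codex_version(version: str) -> bool:
    """Reject only versions explicitly recorded as broken."""
    return parse_version(version) not in KNOWN_BAD_CODEX_VERSIONS


print(is_allowed_codex_version("0.130.0"))  # True
print(is_allowed_codex_version("0.133.0"))  # False
```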

**P1 — short-term (Cluster B)**
4. Make `dispatch_workflow` safe outputs tolerant of the `No ref found` error when the target workflow isn't present on the source ref (smoke-test path): either skip with a warning or fall back to `main`. Candidate files: `pkg/workflow/safeoutputs.go` (handler) and `pkg/safeoutputs/dispatch_workflow.go`, if applicable.
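
The fallback option might look roughly like the sketch below. It is not the framework's handler: `RefNotFoundError`, `dispatch_with_fallback`, and `fake_dispatch` are hypothetical names, and the real dispatch call is replaced by a stand-in so the control flow can be shown on its own.

```python
from typing import Callable


class RefNotFoundError(Exception):
    """Hypothetical error standing in for the 'No ref found' API response."""


def dispatch_with_fallback(
    dispatch: Callable[[str, str], None],
    workflow: str,
    ref: str,
    fallback_ref: str = "main",
) -> str:
    """Dispatch on `ref`; on a missing ref, warn and retry once on the fallback.

    Returns the ref that was actually used.
    """
    try:
        dispatch(workflow, ref)
        return ref
    except RefNotFoundError:
        print(f"warning: {workflow!r} not dispatchable on {ref}; using {fallback_ref}")
        dispatch(workflow, fallback_ref)
        return fallback_ref


def fake_dispatch(workflow: str, ref: str) -> None:
    """Stand-in dispatcher: only 'main' carries the workflow."""
    if ref != "main":
        raise RefNotFoundError(ref)


used = dispatch_with_fallback(
    fake_dispatch, "haiku-printer", "refs/heads/codex/review-codex-configuration"
)
print(used)  # main
```

The skip-with-warning variant would return without the second `dispatch` call. Which behaviour is preferable is the open question noted under medium confidence.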

**P2 — backlog (Cluster C)**
5. Raise `max-turns` for `avenger.md` to 50 or add an explicit "give up cleanly after 20 turns and call `noop`" guardrail; consider exposing a budget warning the agent can see at turn 20/25.
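
The guardrail idea can be sketched as a small decision function: warn the agent once it reaches a threshold, and stop cleanly at the limit. The name `budget_action` and the returned labels are assumptions for illustration; the defaults mirror the 20/25 figures above.

```python
def budget_action(turn: int, max_turns: int = 25, warn_at: int = 20) -> str:
    """Decide what to do at the start of a given turn.

    'continue' below the warning threshold, 'warn' from the threshold up to
    the limit, and 'stop' once the limit is reached.
    """
    if turn >= max_turns:
        return "stop"
    if turn >= warn_at:
        return "warn"
    return "continue"


for turn in (5, 20, 24, 25):
    print(turn, budget_action(turn))
```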

### Sub-issues created

- **#34522** — *P0 fix: revert/pin Codex CLI 0.133.0 — `stream_options.include_usage` rejected by gpt-5.5*

No sub-issues created for Clusters B and C: existing auto-issues #34520 (Avenger) and the pending smoke-test design work can carry the P2/P1 items without duplication.

### Confidence & unknowns

- **High confidence:** Cluster A root cause and fix. Verified via 7 stdio logs (all show identical `unknown_parameter` 400), `codex_app_server.client_version: 0.133.0` in every log, and audit-diff against a Codex 0.130.0 baseline showing 6 vs 33 OpenAI requests.
- **Medium confidence:** Cluster B is a test-design issue, not a framework bug. Need a maintainer to confirm whether `dispatch_workflow` should soft-fail or whether the smoke test should target `main`.
- **Low confidence:** Cluster C — single occurrence. Could be a one-off complex investigation rather than a recurring `max_turns` issue. Watch for repeats before tuning.
- **Unknowns:** Why Codex 0.133.0 ships with the `stream_options.include_usage` request payload — likely an SDK update inside Codex itself; upstream investigation needed.

### References

- [§26369636900 — Daily Cache Strategy Analyzer (Cluster A representative)](https://github.com/github/gh-aw/actions/runs/26369636900)
- [§26369303291 — Smoke Copilot (Cluster B)](https://github.com/github/gh-aw/actions/runs/26369303291)
- [§26369695031 — Avenger (Cluster C)](https://github.com/github/gh-aw/actions/runs/26369695031)
