
[aw-failures] 6h failure cluster (2026-05-24 19:10): Codex CLI 0.133.0 stream_options.include_usage regression broke 7 workflo… #34521

@github-actions


Executive summary

In the 6-hour window ending 2026-05-24T19:10 UTC, 9 agentic workflow runs failed. Distribution by cluster:

| Cluster | Engine | Runs | Symptom | Severity |
|---|---|---|---|---|
| A: Codex stream_options.include_usage 400 | Codex 0.133.0 | 7 | Every retry rejected by chat completions API: Unknown parameter: 'stream_options.include_usage' (invalid_request_error) | P0 |
| B: Smoke Copilot dispatch_workflow ref miss | GitHub Copilot CLI | 1 | Agent succeeded; safe_outputs job failed because dispatch_workflow targeted refs/heads/codex/review-codex-configuration, which doesn't carry the haiku-printer workflow | P2 |
| C: Avenger max_turns(25) | Claude Code (Opus 4.7) | 1 | Investigation of codex_engine test diff exhausted the 25-turn budget; agent never reached a conclusion | P2 |

Root cause of Cluster A (P0): PR #34390, merged at 2026-05-24T12:59:44 UTC to fix the prior Copilot anthropic-beta regression tracked in #34394, also bumped DefaultCodexVersion from 0.130.0 to 0.133.0. Codex CLI 0.133.0 sends stream_options.include_usage to the OpenAI chat completions API, but requests for the configured model, gpt-5.5, are rejected with that parameter flagged as unknown. All Codex-engine workflows now fail deterministically: the harness makes 4 attempts (1 initial attempt plus 3 retries), every attempt returns the same HTTP 400, and the run exits with code 1.

Failure cluster table

All 9 failed runs (last 6h)
| Workflow | Run | Engine | Cluster | Started (UTC) |
|---|---|---|---|---|
| Daily Cache Strategy Analyzer | §26369636900 | Codex 0.133.0 | A | 18:43 |
| Avenger | §26369695031 | Claude Code | C | 18:46 |
| Changeset Generator | §26369303262 | Codex 0.133.0 | A | 18:28 |
| Smoke Codex | §26369303293 | Codex 0.133.0 | A | 18:28 |
| Smoke Copilot | §26369303291 | Copilot CLI | B | 18:28 |
| Changeset Generator | §26368932369 | Codex 0.133.0 | A | 18:11 |
| Smoke Codex | §26368932442 | Codex 0.133.0 | A | 18:11 |
| Changeset Generator | §26368599354 | Codex 0.133.0 | A | 17:56 |
| Smoke Codex | §26368599382 | Codex 0.133.0 | A | 17:56 |
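As a quick consistency check, the run table can be tallied to confirm the 7/1/1 cluster split from the executive summary. This is a minimal Python sketch; the rows are transcribed by hand from the table, and nothing here queries the GitHub Actions API.

```python
from collections import Counter

# (workflow, run_id, cluster), transcribed from the failed-runs table.
RUNS = [
    ("Daily Cache Strategy Analyzer", 26369636900, "A"),
    ("Avenger", 26369695031, "C"),
    ("Changeset Generator", 26369303262, "A"),
    ("Smoke Codex", 26369303293, "A"),
    ("Smoke Copilot", 26369303291, "B"),
    ("Changeset Generator", 26368932369, "A"),
    ("Smoke Codex", 26368932442, "A"),
    ("Changeset Generator", 26368599354, "A"),
    ("Smoke Codex", 26368599382, "A"),
]


def tally(runs):
    """Count failed runs per cluster, and per workflow within cluster A."""
    per_cluster = Counter(cluster for _, _, cluster in runs)
    cluster_a = Counter(workflow for workflow, _, cluster in runs if cluster == "A")
    return per_cluster, cluster_a


per_cluster, cluster_a = tally(RUNS)
for cluster in sorted(per_cluster):
    print(cluster, per_cluster[cluster])
print(dict(cluster_a))
```

Within cluster A, Changeset Generator and Smoke Codex each account for 3 runs and Daily Cache Strategy Analyzer for 1.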

Evidence — Cluster A (P0)

Identical engine config across all 7 Codex runs:

engine_id: codex
model: gpt-5.5
codex_app_server.client_version: 0.133.0

The API rejection is deterministic on every attempt (4 attempts × 7 runs = 28 identical rejections):

Sample stdio (run 26369636900, Daily Cache Strategy Analyzer)
{"type":"error","message":"{
  \"error\": {
    \"message\": \"Unknown parameter: 'stream_options.include_usage'.\",
    \"type\": \"invalid_request_error\",
    \"param\": \"stream_options.include_usage\",
    \"code\": \"unknown_parameter\"
  }
}"}
{"type":"turn.failed","error":{"message":"{ ... same 400 ... }"}}
[codex-harness] attempt 4 failed: exitCode=1 isRateLimitError=false isAuthenticationFailedError=false isMissingApiKeyError=false isServerError=false hasOutput=true retriesRemaining=0
[codex-harness] all 3 retries exhausted — giving up (exitCode=1)
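The per-run confirmation can be reproduced mechanically from the stdio logs. A minimal Python sketch, assuming each stdio line is a single JSON object and that the message field of a type "error" event holds the JSON-encoded OpenAI error body (the sample shows it wrapped across lines). The function names are hypothetical and are not part of the Codex harness.

```python
import json


def extract_api_errors(stdio_lines):
    """Yield (code, param) for each Codex 'error' event carrying an API error body.

    Lines that are not JSON objects (such as '[codex-harness] ...' lines) are skipped,
    as are error events whose message is not itself JSON.
    """
    for line in stdio_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # plain-text harness output
        if not isinstance(event, dict) or event.get("type") != "error":
            continue
        message = event.get("message")
        if not isinstance(message, str):
            continue
        try:
            body = json.loads(message)
        except json.JSONDecodeError:
            continue  # free-text error message, not an API error body
        error = body.get("error") if isinstance(body, dict) else None
        if isinstance(error, dict):
            yield error.get("code"), error.get("param")


def is_include_usage_regression(stdio_lines):
    """True if the log has API errors and every one is the Cluster A rejection."""
    errors = set(extract_api_errors(stdio_lines))
    return errors == {("unknown_parameter", "stream_options.include_usage")}


# Usage: a one-line rendering of the sample error event plus a harness line.
api_body = {"error": {
    "message": "Unknown parameter: 'stream_options.include_usage'.",
    "type": "invalid_request_error",
    "param": "stream_options.include_usage",
    "code": "unknown_parameter",
}}
sample = [
    json.dumps({"type": "error", "message": json.dumps(api_body)}),
    "[codex-harness] attempt 4 failed: exitCode=1",
]
print(is_include_usage_regression(sample))
```

Requiring every extracted error to match, rather than any one, keeps a log that mixes this regression with a different API failure from being silently lumped into Cluster A.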
Confirmation: same error in all 7 Codex runs
| Run | Workflow | Attempts |
|---|---|---|
| 26369636900 | Daily Cache Strategy Analyzer | 4× same 400 |
| 26369303262 | Changeset Generator | 4× same 400 |
| 26369303293 | Smoke Codex | 4× same 400 |
| 26368932369 | Changeset Generator | 4× same 400 |
| 26368932442 | Smoke Codex | 4× same 400 |
| 26368599354 | Changeset Generator | 4× same 400 |
| 26368599382 | Smoke Codex | 4× same 400 |

All runs exit with code 1 after exhausting the 3-retry budget (4 attempts in total).
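The attempt arithmetic is 1 initial attempt plus 3 retries, so 7 runs produced 28 identical rejections. Because all four attempts in each run hit the same 400 while the attempt-4 log line shows every classification flag as false, it appears (though the logs do not prove it) that unclassified failures are retried. The Python sketch below models that budget; AttemptResult only loosely mirrors the flags printed in the [codex-harness] line, and should_retry is a hypothetical policy shown for contrast, not the harness's code.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class AttemptResult:
    # Loosely mirrors the flags in the [codex-harness] log line; names are illustrative.
    exit_code: int
    is_rate_limit_error: bool = False
    is_server_error: bool = False
    is_auth_failed_error: bool = False
    is_missing_api_key_error: bool = False
    api_error_code: Optional[str] = None  # e.g. parsed from the stdio error body


def retry_everything(result: AttemptResult) -> bool:
    """Retry any failure until the budget runs out."""
    return True


def should_retry(result: AttemptResult) -> bool:
    """Hypothetical policy: skip retries that cannot change the outcome."""
    if result.is_auth_failed_error or result.is_missing_api_key_error:
        return False  # configuration problem
    if result.api_error_code == "unknown_parameter":
        return False  # deterministic request-shape rejection, as in Cluster A
    return True  # rate limits, server errors and unclassified failures


def attempts_consumed(results, policy: Callable[[AttemptResult], bool], max_retries: int = 3) -> int:
    """Count attempts used: stop on success, on a non-retryable failure, or at 1 + max_retries."""
    attempts = 0
    for result in results[: max_retries + 1]:
        attempts += 1
        if result.exit_code == 0 or not policy(result):
            break
    return attempts


cluster_a_attempts = [AttemptResult(exit_code=1, api_error_code="unknown_parameter")] * 4
print(attempts_consumed(cluster_a_attempts, retry_everything))  # 4, as in the logs
print(attempts_consumed(cluster_a_attempts, should_retry))      # 1
```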

Audit-diff vs. successful baseline (Cluster A)

Run 26369636900 (failed, Codex 0.133.0) vs 26184060675 (last successful Daily Cache Strategy Analyzer, Codex 0.130.0):

| Metric | Successful baseline | Failed run | Delta |
|---|---|---|---|
| api.openai.com:443 allowed requests | 33 | 6 | -82% (agent never completed any turn) |
| github core API consumed | 142 | 1012 | +613% (retry storm) |
| firewall anomalies | 0 | 0 | unchanged |
| MCP failures | 0 | 0 | unchanged |
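The Delta column is the relative change from the successful baseline, (failed - baseline) / baseline. A small Python check of the two non-zero rows:

```python
def pct_delta(baseline: float, observed: float) -> float:
    """Relative change from baseline to observed, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (observed - baseline) / baseline * 100.0


openai_requests = pct_delta(33, 6)   # allowed api.openai.com:443 requests
github_core = pct_delta(142, 1012)   # GitHub core API consumption
print(f"{openai_requests:+.0f}% {github_core:+.0f}%")  # -82% +613%
```

The unrounded values are about -81.8% and +612.7%, consistent with the rounded figures in the table.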

The 6-vs-33 OpenAI request count is the smoking gun: each attempt's request is rejected with HTTP 400 before any agent reasoning turn can complete.

Evidence — Cluster B (Smoke Copilot, P2)

Run §26369303291. The Copilot agent itself succeeded; the safe_outputs job then failed:

##[error]Failed to dispatch workflow "haiku-printer": No ref found for: refs/heads/codex/review-codex-configuration - https://docs.github.com/rest/actions/workflows#create-a-workflow-dispatch-event
##[error]✗ Message 8 (dispatch_workflow) failed: Failed to dispatch workflow "haiku-printer": No ref found for: refs/heads/codex/review-codex-configuration
##[error]1 safe output(s) failed:
  - dispatch_workflow: Failed to dispatch workflow "haiku-printer": No ref found for: refs/heads/codex/review-codex-configuration

The smoke test runs on a PR branch (codex/review-codex-configuration) that does not carry haiku-printer.md, so the workflows/haiku-printer.yml lockfile isn't present on that ref. This is a smoke-test design gap, not a framework regression. Other safe outputs (create_issue #34516, add_comment_to_discussion, upload_artifact) succeeded in the same job.

Evidence — Cluster C (Avenger, P2)

Run §26369695031. Claude Opus 4.7, terminal_reason: max_turns, errors: ["Reached maximum number of turns (25)"], num_turns: 26, total_cost_usd: 2.65.

The agent spent its budget investigating pkg/workflow/codex_engine.go and pkg/workflow/codex_engine_test.go history (repeated git log queries returning the same single-commit output, then WebFetchToolConfig symbol lookups) without converging. max-turns: 25 is plausibly under-provisioned for Avenger's CI-fixing remit, especially while real upstream regressions (Cluster A) generate confusing test signals.

Existing tracking correlation

| Issue | Status | Notes |
|---|---|---|
| #34394 | Open (prior parent report) | Documents the now-resolved Copilot 1.0.51 anthropic-beta regression. PR #34390 (the fix) is the root cause of the new Cluster A. No action needed on #34394; it correctly notes the prior fix shipped. |
| #34390 | Merged 12:59 UTC | The version bump that introduced the Codex 0.133.0 regression. A revert or partial revert (Codex pin only) is the proposed fix. |
| #34517 | Open (auto-issue) | Daily Cache Strategy Analyzer failure (run 26369636900), Cluster A. Body truncates to trace logs and never surfaces the root error. |
| #34520 | Open (auto-issue) | Avenger failure (run 26369695031), Cluster C. Body shows the max_turns exit but not why the agent looped. |
| #34418 | Open (auto-issue, 10 comments) | Smoke Codex: was previously tracking Missing OPENAI_API_KEY. Now the failure mode has shifted to stream_options.include_usage; subsequent Smoke Codex failures (26369303293, 26368932442, 26368599382) are filed against this same workflow_id. |

No existing open issue tracks the stream_options.include_usage regression specifically; this report is the first.

Proposed fix roadmap

P0 — immediate (Cluster A)

  1. Revert the Codex bump in PR #34390 ("Bump pinned Copilot/Codex/GitHub MCP versions and regenerate workflow artifacts"), or apply a follow-up PR that pins DefaultCodexVersion back to 0.130.0 while keeping the Copilot 1.0.52 / GitHub MCP v1.0.5 bumps. Validate by re-triggering Smoke Codex and one Codex-engine production workflow.
  2. File an upstream report in the openai/codex repository: CLI 0.133.0 sends stream_options.include_usage to gpt-5.5, which rejects it as an unknown parameter. Wait for upstream 0.134+ before re-bumping.
  3. Sub-issue created: see the Sub-issues created section below.
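Items 1 and 2 imply a simple pin policy: 0.130.0 is the last version with a verified good baseline, 0.133.0 is known bad, and nothing newer should be pinned until upstream ships 0.134 or later. A minimal Python sketch of that gate, assuming plain dotted versions without pre-release suffixes; the constant and function names are hypothetical, and the real pin is the DefaultCodexVersion value in the Go code. Passing the gate is necessary but not sufficient: a Smoke Codex rerun still has to confirm the fix.

```python
from __future__ import annotations

KNOWN_GOOD = "0.130.0"   # last Codex version with a successful baseline in this report
KNOWN_BAD = {"0.133.0"}  # sends stream_options.include_usage -> HTTP 400
MIN_REBUMP = "0.134.0"   # "wait for upstream 0.134+" from the roadmap


def parse_version(version: str) -> tuple[int, ...]:
    """Parse '0.133.0' into (0, 133, 0); pre-release suffixes are not handled."""
    return tuple(int(part) for part in version.split("."))


def pin_acceptable(version: str) -> bool:
    """Accept the known-good pin, or any version at or above the re-bump floor."""
    if version in KNOWN_BAD:
        return False
    if version == KNOWN_GOOD:
        return True
    return parse_version(version) >= parse_version(MIN_REBUMP)


for candidate in ["0.130.0", "0.133.0", "0.134.0"]:
    print(candidate, pin_acceptable(candidate))
```

Versions are compared as integer tuples rather than strings because a string comparison would rank "0.99.0" above "0.134.0".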

P1 — short-term (Cluster B)
  4. Make dispatch_workflow safe outputs tolerant of a 404 "No ref found" error when the target workflow isn't present on the source ref (the smoke-test path): either skip with a warning, or fall back to main automatically. Likely files: pkg/workflow/safeoutputs.go (handler) and pkg/safeoutputs/dispatch_workflow.go, if applicable.
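Both options in item 4 reduce to one ref-resolution step before the workflow-dispatch call. A minimal Python sketch, assuming an injected predicate that reports whether a workflow exists on a given ref (in practice this might be a repository contents lookup); resolve_dispatch_ref and the predicate are hypothetical names, and the real handler is Go code in pkg/workflow/safeoutputs.go.

```python
from __future__ import annotations

from typing import Callable, Optional, Tuple


def resolve_dispatch_ref(
    workflow: str,
    source_ref: str,
    workflow_on_ref: Callable[[str, str], bool],
    fallback_ref: Optional[str] = "refs/heads/main",
) -> Tuple[Optional[str], Optional[str]]:
    """Return (ref_to_dispatch, warning); a ref of None means skip the dispatch.

    fallback_ref=None gives the "skip with a warning" behaviour; the default
    gives "auto-fallback to main".
    """
    if workflow_on_ref(workflow, source_ref):
        return source_ref, None
    if fallback_ref is not None and workflow_on_ref(workflow, fallback_ref):
        return fallback_ref, f"{workflow} not found on {source_ref}; dispatching on {fallback_ref}"
    return None, f"{workflow} not found on {source_ref}; skipping dispatch"


# Usage mirroring Cluster B: haiku-printer exists only on main.
present = {("haiku-printer", "refs/heads/main")}


def on_ref(workflow: str, ref: str) -> bool:
    return (workflow, ref) in present


pr_ref = "refs/heads/codex/review-codex-configuration"
print(resolve_dispatch_ref("haiku-printer", pr_ref, on_ref))
print(resolve_dispatch_ref("haiku-printer", pr_ref, on_ref, fallback_ref=None))
```

Keeping the existence check injectable leaves the policy testable without network access; which default to ship is the maintainer decision noted under Confidence & unknowns.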

P2 — backlog (Cluster C)
  5. Raise max-turns for avenger.md to 50, or add an explicit "give up cleanly after 20 turns and call noop" guardrail; consider exposing a budget warning the agent can see at turn 20 of 25.
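Item 5's guardrail and budget warning could be combined into one per-turn check. A minimal Python sketch; budget_notice and its thresholds are hypothetical (the warning point defaults to 4/5 of the cap, which is turn 20 for max-turns: 25), and how such a notice would actually be surfaced to the Claude Code agent is left open.

```python
from __future__ import annotations

from typing import Optional


def budget_notice(turn: int, max_turns: int = 25, warn_at: Optional[int] = None) -> Optional[str]:
    """Return a notice to show the agent on `turn` (1-based), or None if none is needed.

    From warn_at onward the agent is told how many turns remain; on the final
    turn it is told to stop and call noop instead of being cut off mid-investigation.
    """
    if warn_at is None:
        warn_at = max_turns * 4 // 5  # integer math: 25 -> 20, 50 -> 40
    if turn >= max_turns:
        return "Final turn: stop investigating and call noop with a summary of findings."
    if turn >= warn_at:
        return f"Budget warning: turn {turn} of {max_turns}, {max_turns - turn} left. Converge or call noop."
    return None


for turn in (10, 20, 24, 25):
    print(turn, budget_notice(turn))
```

Deriving the warning point from the cap means that raising max-turns to 50 moves the warning to turn 40 without a second setting.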

Sub-issues created

No sub-issues were created for Clusters B and C: the existing auto-issue #34520 (Avenger) and the pending smoke-test design work can carry the P2 and P1 items, respectively, without duplication.

Confidence & unknowns

  • High confidence: Cluster A root cause and fix. Verified via 7 stdio logs (all show identical unknown_parameter 400), codex_app_server.client_version: 0.133.0 in every log, and audit-diff against a Codex 0.130.0 baseline showing 6 vs 33 OpenAI requests.
  • Medium confidence: Cluster B is a test-design issue, not a framework bug. Need a maintainer to confirm whether dispatch_workflow should soft-fail or whether the smoke test should target main.
  • Low confidence: Cluster C — single occurrence. Could be a one-off complex investigation rather than a recurring max_turns issue. Watch for repeats before tuning.
  • Unknowns: why Codex 0.133.0 now sends stream_options.include_usage in its request payload. An SDK update inside Codex itself is the likely cause; upstream investigation is needed.
