Executive summary
In the 6-hour window ending 2026-05-24T19:10 UTC, 9 agentic workflow runs failed. Distribution by cluster:

| Cluster | Engine | Runs | Symptom | Severity |
| --- | --- | --- | --- | --- |
| A — Codex stream_options.include_usage 400 | Codex 0.133.0 | 7 | Every retry rejected by chat completions API: Unknown parameter: 'stream_options.include_usage' (invalid_request_error) | P0 |
| B — Smoke Copilot dispatch_workflow ref miss | GitHub Copilot CLI | 1 | Agent succeeded; safe_outputs job failed because dispatch_workflow targeted refs/heads/codex/review-codex-configuration, which doesn't carry the haiku-printer workflow | P2 |
| C — Avenger max_turns(25) | Claude Code (Opus 4.7) | 1 | Investigation of codex_engine test diff exhausted the 25-turn budget; agent never reached a conclusion | P2 |
Root cause of Cluster A (P0): PR #34390, merged at 2026-05-24T12:59:44 UTC, bumped DefaultCodexVersion from 0.130.0 → 0.133.0 to fix the prior Copilot anthropic-beta regression tracked in #34394. Codex CLI 0.133.0 sends stream_options.include_usage to the OpenAI chat completions API, but the configured model gpt-5.5 rejects that parameter as unknown. All Codex-engine workflows now fail deterministically: the harness makes 4 attempts (consistent with an initial request plus the 3-retry budget), every attempt returns the same HTTP 400, and the run exits with code 1.
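Evidence — Cluster A (P0)
All 7 Codex runs used an identical engine config, and the API rejection is deterministic on every attempt (4 attempts × 7 runs). The sample stdio from run 26369636900 (Daily Cache Strategy Analyzer) shows the same error as the other 6 Codex runs. All runs exit with code 1 after exhausting the 3-retry budget.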
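Because every attempt returns the same HTTP 400, retrying cannot change the outcome. The following is a minimal sketch of a retry loop that stops immediately on a deterministic client error. It is not the harness's actual retry logic: `ApiResponse`, `is_retryable`, and `call_with_retries` are hypothetical names introduced for illustration, and the choice of which status codes count as transient is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class ApiResponse:
    """Hypothetical, simplified view of an API reply."""
    status: int
    error_type: str = ""


# Assumption: error types that describe a malformed request and so
# cannot succeed when the identical request is sent again.
NON_RETRYABLE_ERROR_TYPES = {"invalid_request_error"}


def is_retryable(resp: ApiResponse) -> bool:
    """Return True only for failures that might succeed on a later attempt."""
    if resp.status < 400:
        return False  # success, nothing to retry
    if resp.error_type in NON_RETRYABLE_ERROR_TYPES:
        return False  # deterministic rejection, e.g. an unknown parameter
    # Assumption: only rate limiting and server-side errors are transient.
    return resp.status == 429 or resp.status >= 500


def call_with_retries(
    send: Callable[[], ApiResponse], max_attempts: int = 4
) -> Tuple[ApiResponse, int]:
    """Call send() until it succeeds, fails deterministically, or the budget runs out."""
    attempts = 0
    while True:
        attempts += 1
        resp = send()
        if attempts >= max_attempts or not is_retryable(resp):
            return resp, attempts
```

Under these assumptions, a reply such as `ApiResponse(400, "invalid_request_error")` ends the loop after a single attempt instead of four, which would also avoid the extra GitHub API traffic seen in the failed runs.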
Audit-diff vs. successful baseline (Cluster A)
Run 26369636900 (failed, Codex 0.133.0) vs 26184060675 (last successful Daily Cache Strategy Analyzer, Codex 0.130.0):
| Metric | Successful baseline | Failed run | Delta |
| --- | --- | --- | --- |
| api.openai.com:443 allowed requests | 33 | 6 | -82% (agent never completed any turn) |
| github core API consumed | 142 | 1012 | +613% (retry storm) |
| firewall anomalies | 0 | 0 | unchanged |
| MCP failures | 0 | 0 | unchanged |
The 6-vs-33 OpenAI request count is the smoking gun: the Codex CLI emits the HTTP 400 before any agent reasoning turn occurs.
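The Delta figures in the audit-diff are plain relative changes against the baseline run. A small sketch of that arithmetic follows; the function names are illustrative and not part of any audit tooling.

```python
from typing import Optional


def percent_delta(baseline: float, observed: float) -> Optional[float]:
    """Relative change of observed vs. baseline, in percent.

    Returns None when the baseline is zero and the observed value is not,
    because the relative change is undefined in that case.
    """
    if baseline == 0:
        return 0.0 if observed == 0 else None
    return (observed - baseline) / baseline * 100.0


def format_delta(baseline: float, observed: float) -> str:
    """Render a delta the way the audit-diff table does."""
    delta = percent_delta(baseline, observed)
    if delta is None:
        return "n/a"
    if delta == 0:
        return "unchanged"
    return f"{delta:+.0f}%"


print(format_delta(33, 6))      # OpenAI requests
print(format_delta(142, 1012))  # GitHub core API calls
print(format_delta(0, 0))       # firewall anomalies, MCP failures
```

With the values reported above, (6 - 33) / 33 is about -81.8% and (1012 - 142) / 142 is about +612.7%, which round to the -82% and +613% shown in the table.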
Evidence — Cluster B (Smoke Copilot, P2)
Run §26369303291. The Copilot agent itself succeeded; the safe_outputs job then failed:
##[error]Failed to dispatch workflow "haiku-printer": No ref found for: refs/heads/codex/review-codex-configuration - https://docs.github.com/rest/actions/workflows#create-a-workflow-dispatch-event
##[error]✗ Message 8 (dispatch_workflow) failed: Failed to dispatch workflow "haiku-printer": No ref found for: refs/heads/codex/review-codex-configuration
##[error]1 safe output(s) failed:
- dispatch_workflow: Failed to dispatch workflow "haiku-printer": No ref found for: refs/heads/codex/review-codex-configuration
The smoke test runs on a PR branch (codex/review-codex-configuration) that does not carry haiku-printer.md, so the workflows/haiku-printer.yml lockfile isn't present on that ref. This is a smoke-test design gap, not a framework regression. Other safe outputs (create_issue #34516, add_comment_to_discussion, upload_artifact) succeeded in the same job.
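One possible way to handle a source ref that lacks the target workflow is to decide on the ref before dispatching. The sketch below shows that decision as a pure function. It is a design option under stated assumptions, not the framework's current behavior: `choose_dispatch_ref`, `DispatchDecision`, and the `workflow_exists` callback are hypothetical, and the fallback to `refs/heads/main` is an assumed default.

```python
from typing import Callable, NamedTuple, Optional


class DispatchDecision(NamedTuple):
    ref: Optional[str]  # ref to dispatch on, or None to skip the dispatch
    note: str           # message for the job log


def choose_dispatch_ref(
    workflow_path: str,
    source_ref: str,
    workflow_exists: Callable[[str, str], bool],
    fallback_ref: Optional[str] = "refs/heads/main",
) -> DispatchDecision:
    """Pick a ref that actually carries the workflow file.

    workflow_exists(ref, path) is a caller-supplied lookup (hypothetical);
    it should return True when `path` is present on `ref`.
    """
    if workflow_exists(source_ref, workflow_path):
        return DispatchDecision(source_ref, "dispatching on source ref")
    if fallback_ref and workflow_exists(fallback_ref, workflow_path):
        return DispatchDecision(
            fallback_ref,
            f"warning: {workflow_path} not found on {source_ref}; "
            f"falling back to {fallback_ref}",
        )
    return DispatchDecision(
        None, f"warning: {workflow_path} not found on any candidate ref; skipped"
    )


# Usage with an in-memory stand-in for the repository contents.
present = {("refs/heads/main", "workflows/haiku-printer.yml")}
decision = choose_dispatch_ref(
    "workflows/haiku-printer.yml",
    "refs/heads/codex/review-codex-configuration",
    lambda ref, path: (ref, path) in present,
)
print(decision.ref, "|", decision.note)
```

Passing `fallback_ref=None` turns the same function into a skip-with-warning policy, so either behavior can be chosen without changing the caller.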
Evidence — Cluster C (Avenger, P2)
Run §26369695031. Claude Opus 4.7, terminal_reason: max_turns, errors: ["Reached maximum number of turns (25)"], num_turns: 26, total_cost_usd: 2.65.
The agent spent its budget investigating pkg/workflow/codex_engine.go and pkg/workflow/codex_engine_test.go history (repeated git log queries returning the same single-commit output, then WebFetchToolConfig symbol lookups) without converging. max-turns: 25 is plausibly under-provisioned for Avenger's CI-fixing remit, especially while real upstream regressions (Cluster A) generate confusing test signals.
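A turn budget that signals before it runs out would let an agent wrap up instead of stopping mid-investigation. The sketch below illustrates the idea with a hypothetical `TurnBudget` class. The thresholds mirror the numbers in this report (a 25-turn limit, a warning at turn 20), but the class and its return values are assumptions, not an existing engine feature.

```python
from dataclasses import dataclass


@dataclass
class TurnBudget:
    """Hypothetical tracker for an agent's turn budget."""
    max_turns: int = 25
    warn_at: int = 20
    used: int = 0

    def record_turn(self) -> str:
        """Count one turn and report the budget state.

        Returns "ok" while the budget is comfortable, "warn" once the
        warning threshold is reached, and "stop" when the budget is spent.
        """
        self.used += 1
        if self.used >= self.max_turns:
            return "stop"
        if self.used >= self.warn_at:
            return "warn"
        return "ok"

    @property
    def remaining(self) -> int:
        return max(self.max_turns - self.used, 0)


budget = TurnBudget()
states = [budget.record_turn() for _ in range(25)]
print(states[18], states[19], states[24], budget.remaining)
```

On "warn" a caller could surface the remaining turn count to the agent, and on "stop" it could end the run with an explicit no-op result rather than an error.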
Existing tracking correlation
Documents the now-resolved Copilot 1.0.51 anthropic-beta regression. PR #34390 (the fix) is the root cause of the new Cluster A. No action needed on #34394; it correctly notes the prior fix shipped.
Avenger auto-issue #34520 records the max_turns exit but not why the agent looped.
Smoke Codex: previously tracked Missing OPENAI_API_KEY. The failure mode has now shifted to stream_options.include_usage; subsequent Smoke Codex failures (26369303293, 26368932442, 26368599382) are filed against this same workflow_id.
No existing open issue tracks the stream_options.include_usage regression specifically; this report is the first.
Proposed fix roadmap
P0 — immediate (Cluster A)
1. Revert DefaultCodexVersion back to 0.130.0 while keeping the Copilot 1.0.52 / GitHub MCP v1.0.5 bumps. Validate by re-triggering Smoke Codex and one Codex-engine production workflow.
2. File an upstream report to the openai/codex CLI repo: 0.133.0 emits stream_options.include_usage against gpt-5.5, which the model does not accept. Wait for upstream 0.134+ before re-bumping.
3. Sub-issue created: see the Sub-issues created section below.
P1 — short-term (Cluster B)
4. Make dispatch_workflow safe outputs tolerant of 404 No ref found when the target workflow isn't present on the source ref (smoke-test path). Either skip with a warning, or auto-fallback to main. Workflow file: pkg/workflow/safeoutputs.go (handler) + pkg/safeoutputs/dispatch_workflow.go if applicable.
P2 — backlog (Cluster C)
5. Raise max-turns for avenger.md to 50 or add an explicit "give up cleanly after 20 turns and call noop" guardrail; consider exposing a budget warning the agent can see at turn 20/25.
Sub-issues created
#34522: P0 fix: revert/pin Codex CLI 0.133.0 — stream_options.include_usage rejected by gpt-5.5
No sub-issues created for Clusters B and C: existing auto-issues #34520 (Avenger) and the pending smoke-test design work can carry the P2/P1 items without duplication.
Confidence & unknowns
High confidence: Cluster A root cause and fix. Verified via 7 stdio logs (all show identical unknown_parameter 400), codex_app_server.client_version: 0.133.0 in every log, and audit-diff against a Codex 0.130.0 baseline showing 6 vs 33 OpenAI requests.
Medium confidence: Cluster B is a test-design issue, not a framework bug. Need a maintainer to confirm whether dispatch_workflow should soft-fail or whether the smoke test should target main.
Low confidence: Cluster C is a single occurrence. It could be a one-off complex investigation rather than a recurring max_turns issue. Watch for repeats before tuning.
Unknowns: Why Codex 0.133.0 ships with stream_options.include_usage in the request payload. An SDK update inside Codex itself is the likely cause; upstream investigation needed.