chore: sync public mirror from internal#390
PR Summary
Medium Risk
Overview
Adds a new
Implements deterministic scripted replay as a hidden
Extends the headless protocol to include
Reviewed by Cursor Bugbot for commit 692e147.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 692e147044
    if (statement.kind === "error") {
      partial.stopReason = "error";
      partial.errorMessage = statement.message;
      yield { type: "error", reason: "error", error: partial };
      return;
Honor transient replay errors instead of fatal-stopping turns
When a scripted frame uses {"kind":"error","type":"transient"}, the stream currently emits a terminal error event and returns exactly like a fatal error. In this path, no retryable signal is produced, so replay scenarios cannot model transient provider failures and recovery behavior even though the schema explicitly distinguishes transient vs fatal. This makes any fixture that depends on transient fault semantics behave as an unrecoverable failure.
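One way to address this comment, sketched under assumptions: carry the frame's `type` through to the emitted event so a consumer can tell a retryable failure from a terminal one. Only `kind`, `type`, and `message` come from the review text. The `retryable` field, the `toErrorEvent` helper, and the exact type shapes are hypothetical and are not taken from the reviewed code.

```typescript
// Hypothetical frame shape; the review only mentions `kind`, `type`, and `message`.
type ErrorStatement = {
  kind: "error";
  type: "transient" | "fatal";
  message: string;
};

// Assumed shape of the partial assistant message mutated by the stream.
type PartialMessage = { stopReason?: string; errorMessage?: string };

// `retryable` is an assumed field name used here to model a retry signal.
type ErrorEvent = {
  type: "error";
  reason: "error";
  error: PartialMessage;
  retryable: boolean;
};

// Builds the error event while preserving the transient/fatal distinction.
function toErrorEvent(statement: ErrorStatement, partial: PartialMessage): ErrorEvent {
  partial.stopReason = "error";
  partial.errorMessage = statement.message;
  return {
    type: "error",
    reason: "error",
    error: partial,
    // Only transient frames are marked as retryable.
    retryable: statement.type === "transient",
  };
}

const transientEvent = toErrorEvent(
  { kind: "error", type: "transient", message: "rate limited" },
  {},
);
const fatalEvent = toErrorEvent(
  { kind: "error", type: "fatal", message: "bad request" },
  {},
);
```

With a signal like this, a replay harness could choose to re-enter the turn on `transientEvent` and stop on `fatalEvent`. How the real stream should surface retries is a design decision for the PR authors.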
    return (
      statement.id ??
Validate scripted tool_call IDs before using them as strings
toolCallId() returns statement.id directly, assuming it is a string. Because the parser only checks tool_call.tool/expectedResult and then casts the JSON to typed objects, a malformed fixture with a non-string id can pass validation and propagate a non-string call ID into emitted tool-call events and result matching. That can break downstream consumers that expect string call IDs and produce hard-to-diagnose replay mismatches.
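A minimal sketch of the validation this comment asks for, assuming the fixture is parsed from untyped JSON before being cast. The `parseToolCall` helper and the `ToolCallStatement` shape are hypothetical; the review only names `tool_call`, `tool`, `expectedResult`, and `id`.

```typescript
// Hypothetical statement shape for illustration; not the parser's real type.
type ToolCallStatement = { kind: "tool_call"; tool: string; id?: string };

// Rejects malformed frames instead of casting them to the typed shape.
function parseToolCall(raw: unknown): ToolCallStatement {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("tool_call frame must be an object");
  }
  const frame = raw as Record<string, unknown>;
  const kind = frame["kind"];
  const tool = frame["tool"];
  const id = frame["id"];
  if (kind !== "tool_call") {
    throw new Error("frame kind must be tool_call");
  }
  if (typeof tool !== "string") {
    throw new Error("tool_call.tool must be a string");
  }
  // The id is optional, but when present it must be a string.
  if (id !== undefined && typeof id !== "string") {
    throw new Error("tool_call.id must be a string when present");
  }
  return id === undefined
    ? { kind: "tool_call", tool }
    : { kind: "tool_call", tool, id };
}

// Small helper for checking whether parsing a fixture throws.
function rejects(raw: unknown): boolean {
  try {
    parseToolCall(raw);
    return false;
  } catch {
    return true;
  }
}
```

Failing at parse time keeps a bad fixture from surfacing later as a call ID mismatch during result matching, which is the hard-to-diagnose case the comment describes.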
Summary
- evalops/maestro-internal
- evalops/maestro as a generated public mirror of the private source of truth
- 6b10bfe020cd6a25a49092598947319887a8f487
- 76191eda270b297accd226024bc02da1f4fcd762
- 64 file(s) to copy/update and 0 stale file(s) to delete
- 0
Source-of-truth status
Public Mirror Drift Audit
- @evalops/maestro
- https://github.com/evalops/maestro-internal@main (6b10bfe020cd)
- https://github.com/evalops/maestro@main (76191eda270b)
- 64
- 0
- public_projection_has_drift
Sample Changed Paths
Guidance
Let internal main generate and merge the public sync PR before relying on public main.
Drift sample
Public-only commits since last generated sync
Validation
- sync-public-release-mirror workflow in public-tree mode
Test Plan
- sync-public-release-mirror workflow in public-tree mode
- require-internal-pr check confirms internal source PR lineage
Staged Rollout
evalops/maestro-internal@6b10bfe020cd6a25a49092598947319887a8f487, including existing hidden/evaluation surfaces, and keeps public package parity behind the established public-source-provenance gate.