chore: sync public mirror from internal#390

Open
haasonsaas wants to merge 1 commit into main from sync/public-release-mirror

Conversation

@haasonsaas
Contributor

Summary

  • sync the sanitized public tree from evalops/maestro-internal
  • keep evalops/maestro as a generated public mirror of the private source of truth
  • preserve public-owned CI and trusted-publishing workflows from the public checkout
  • internal source SHA: 6b10bfe020cd6a25a49092598947319887a8f487
  • last generated public sync base: 76191eda270b297accd226024bc02da1f4fcd762
  • previewed public-tree drift: 64 file(s) to copy/update and 0 stale file(s) to delete
  • public-only commits since last generated sync: 0

Source-of-truth status

Public Mirror Drift Audit

  • package: @evalops/maestro
  • private source: https://github.com/evalops/maestro-internal@main (6b10bfe020cd)
  • public projection: https://github.com/evalops/maestro@main (76191eda270b)
  • files to copy or update: 64
  • stale files to delete: 0
  • result: drift detected
  • invariant: public_projection_has_drift
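
The two counts in the audit can be read as a set comparison between the sanitized internal tree and the public checkout. A minimal sketch, assuming each tree is summarized as a map from path to content hash; `computeDrift`, `TreeDigest`, and the `publicOwned` exemption are hypothetical names for illustration and not the workflow's actual implementation:

```typescript
// Hypothetical: a tree summarized as path -> content hash.
type TreeDigest = Record<string, string>;

interface DriftReport {
  toCopyOrUpdate: string[];
  staleToDelete: string[];
  hasDrift: boolean;
}

const hasPath = (tree: TreeDigest, path: string): boolean =>
  Object.prototype.hasOwnProperty.call(tree, path);

// Compare the sanitized source tree against the public projection.
// `publicOwned` lists paths the public repo keeps for itself (an assumption
// standing in for the preserved CI and publishing workflows), so they are
// never reported as stale.
function computeDrift(
  source: TreeDigest,
  projection: TreeDigest,
  publicOwned: readonly string[] = [],
): DriftReport {
  // Missing in the projection, or present with a different hash.
  const toCopyOrUpdate = Object.keys(source)
    .filter((path) => !hasPath(projection, path) || projection[path] !== source[path])
    .sort();
  // Present in the projection only, and not owned by the public repo.
  const staleToDelete = Object.keys(projection)
    .filter((path) => !hasPath(source, path) && publicOwned.indexOf(path) === -1)
    .sort();
  return {
    toCopyOrUpdate,
    staleToDelete,
    hasDrift: toCopyOrUpdate.length + staleToDelete.length > 0,
  };
}
```

Under this reading, the report above corresponds to 64 entries in `toCopyOrUpdate`, an empty `staleToDelete`, and `hasDrift` set to true.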

Sample Changed Paths

  • copy/update docs/protocols/agent-trajectory-scenarios.md
  • copy/update docs/protocols/agent-trajectory.md
  • copy/update docs/protocols/headless.md
  • copy/update package.json
  • copy/update packages/contracts/schema/headless/payload-schemas.json
  • copy/update packages/contracts/schema/headless/protocol.json
  • copy/update packages/contracts/src/headless-protocol-generated.ts
  • copy/update packages/contracts/src/headless-protocol-payloads.manifest.json
  • copy/update packages/contracts/src/headless-protocol-schemas.generated.ts
  • copy/update packages/contracts/src/index.ts
  • copy/update packages/contracts/src/proto/maestro/v1/headless_pb.ts
  • copy/update packages/contracts/src/scenario.ts
  • copy/update packages/tui-rs/src/runtime_badges.rs
  • copy/update proto/maestro/v1/headless.proto
  • copy/update scripts/check-agent-trajectory-scenario-fixtures.ts
  • copy/update scripts/check-public-surface-boundary.mjs
  • copy/update scripts/check-scripted-scenario-fixtures.ts
  • copy/update scripts/headless-protocol-codegen.mjs
  • copy/update scripts/verify-headless-proto-sync.mjs
  • copy/update src/agent/modes.ts
  • copy/update src/agent/providers/scripted.ts
  • copy/update src/agent/transport.ts
  • copy/update src/agent/transport/create-provider-stream.ts
  • copy/update src/agent/types.ts
  • copy/update src/bootstrap/event-subscriptions-setup.ts
  • ... 39 more

Guidance

Let internal main generate and merge the public sync PR before relying on public main.

Drift sample

  • copy/update docs/protocols/agent-trajectory-scenarios.md
  • copy/update docs/protocols/agent-trajectory.md
  • copy/update docs/protocols/headless.md
  • copy/update package.json
  • copy/update packages/contracts/schema/headless/payload-schemas.json
  • copy/update packages/contracts/schema/headless/protocol.json
  • copy/update packages/contracts/src/headless-protocol-generated.ts
  • copy/update packages/contracts/src/headless-protocol-payloads.manifest.json
  • copy/update packages/contracts/src/headless-protocol-schemas.generated.ts
  • copy/update packages/contracts/src/index.ts
  • copy/update packages/contracts/src/proto/maestro/v1/headless_pb.ts
  • copy/update packages/contracts/src/scenario.ts
  • copy/update packages/tui-rs/src/runtime_badges.rs
  • copy/update proto/maestro/v1/headless.proto
  • copy/update scripts/check-agent-trajectory-scenario-fixtures.ts
  • copy/update scripts/check-public-surface-boundary.mjs
  • copy/update scripts/check-scripted-scenario-fixtures.ts
  • copy/update scripts/headless-protocol-codegen.mjs
  • copy/update scripts/verify-headless-proto-sync.mjs
  • copy/update src/agent/modes.ts

Public-only commits since last generated sync

  • none detected since last generated sync

Validation

  • generated by the sync-public-release-mirror workflow in public-tree mode

Test Plan

  • generated by the sync-public-release-mirror workflow in public-tree mode
  • public-source-provenance require-internal-pr check confirms internal source PR lineage
  • CI, integration, rust-hosted-conformance, coverage, Socket, and Cursor checks must pass before merge

Staged Rollout

  • Staging is unnecessary for this generated mirror PR: it does not independently promote user-visible behavior. It mirrors already-reviewed internal source from evalops/maestro-internal@6b10bfe020cd6a25a49092598947319887a8f487, including existing hidden/evaluation surfaces, and keeps public package parity behind the established public-source-provenance gate.

@cursor

cursor Bot commented May 10, 2026

PR Summary

Medium Risk
Adds new scenario evaluation and deterministic replay/recording paths plus extends the headless wire protocol (executor_type) and session metadata; integration touches CLI, runtime transport, and generated contract code, so regressions would mainly show up in tooling/protocol consumers.

Overview
Introduces a scenario acceptance harness for agent-trajectory artifacts and a parallel scripted replay scenario format, including new contracts (packages/contracts/src/scenario.ts) and docs describing required fields, labels, assertions, and CI promotion.

Adds a new maestro scenario validate|run CLI (with optional --junit/--json) to validate/run both offline trajectory scenarios and scripted scenarios, plus CI fixture checkers that lock result/JUnit outputs (check:agent-trajectory-scenario-fixtures, check:scripted-scenario-fixtures).

Implements deterministic scripted replay as a hidden replay agent mode and scripted-replay/maestro-replay-v1 model/provider, including runtime routing, credential bypass, session tagging (scenario_replay), optional recording to a scripted scenario file (--record-scenario), and UI/runtime badges indicating replay.

Extends the headless protocol to include executor_type (live vs replay) on ready (and connection state), updating the proto, generated schemas, and docs so clients can badge replay sessions without inferring from model names.
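
A client-side sketch of what badging from the explicit field could look like. Only the field name `executor_type` and its two values (`live`, `replay`) come from the summary above; the `ReadyMessage` shape, the optionality of the field, and the `sessionBadge` helper are assumptions for illustration, not the generated contract types:

```typescript
type ExecutorType = "live" | "replay";

// Hypothetical subset of a headless `ready` message.
interface ReadyMessage {
  type: "ready";
  model?: string;
  // Assumed optional so that servers predating the field still parse.
  executor_type?: ExecutorType;
}

// Decide the badge from the explicit field, never from the model name.
function sessionBadge(ready: ReadyMessage): "REPLAY" | null {
  return ready.executor_type === "replay" ? "REPLAY" : null;
}
```

In this sketch a missing field is treated the same as `live`, so a session whose model name merely looks like a replay model is not badged.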

Reviewed by Cursor Bugbot for commit 692e147. Bugbot is set up for automated code reviews on this repo.

@socket-security

Review the following changes in direct dependencies.

Diff   Package                     Supply Chain Security  Vulnerability  Quality  Maintenance  License
Added  google-auth-library@9.15.1  96                     100            100      88           100
Added  @slack/socket-mode@2.0.7    99                     100            100      92           100
Added  @daytonaio/sdk@0.139.0      93                     100            100      99           100
Added  @slack/web-api@7.15.2       99                     100            100      98           100


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 692e147044

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +496 to +500
    if (statement.kind === "error") {
      partial.stopReason = "error";
      partial.errorMessage = statement.message;
      yield { type: "error", reason: "error", error: partial };
      return;

P2 Badge Honor transient replay errors instead of fatal-stopping turns

When a scripted frame uses {"kind":"error","type":"transient"}, the stream currently emits a terminal error event and returns exactly like a fatal error. In this path, no retryable signal is produced, so replay scenarios cannot model transient provider failures and recovery behavior even though the schema explicitly distinguishes transient vs fatal. This makes any fixture that depends on transient fault semantics behave as an unrecoverable failure.
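
One way the distinction could be surfaced, sketched under stated assumptions: the `retryable` field, the `toErrorEvent` and `shouldRetry` helpers, and the retry budget are hypothetical and not part of the PR; only the statement's `kind`, `type`, and `message` fields mirror the frame described above.

```typescript
// Scripted error frame as described in the review comment.
interface ErrorStatement {
  kind: "error";
  type: "transient" | "fatal";
  message: string;
}

// Hypothetical event shape: the existing error event plus a retry hint.
interface ReplayErrorEvent {
  type: "error";
  reason: "error";
  errorMessage: string;
  retryable: boolean; // assumed field, not in the PR
}

// Map a scripted error frame to an event that keeps the fault class visible.
function toErrorEvent(statement: ErrorStatement): ReplayErrorEvent {
  return {
    type: "error",
    reason: "error",
    errorMessage: statement.message,
    retryable: statement.type === "transient",
  };
}

// A caller could then retry only transient faults, within a bounded budget.
function shouldRetry(event: ReplayErrorEvent, attempt: number, maxAttempts: number): boolean {
  return event.retryable && attempt < maxAttempts;
}
```

Whether the retry signal belongs on the event itself or in a separate stop reason is a design choice this sketch leaves open.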


Comment on lines +338 to +339
    return (
      statement.id ??

P2 Badge Validate scripted tool_call IDs before using them as strings

toolCallId() returns statement.id directly, assuming it is a string. Because the parser only checks tool_call.tool/expectedResult and then casts the JSON to typed objects, a malformed fixture with a non-string id can pass validation and propagate a non-string call ID into emitted tool-call events and result matching. That can break downstream consumers that expect string call IDs and produce hard-to-diagnose replay mismatches.
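
A minimal sketch of the kind of check this comment asks for, assuming the fixture is parsed from untrusted JSON. The `ToolCallStatement` shape and the `parseToolCall` helper are hypothetical; only `kind: "tool_call"`, `tool`, and `id` echo names from the review, and the real parser may validate more fields:

```typescript
// Hypothetical fixture shape for a scripted tool call.
interface ToolCallStatement {
  kind: "tool_call";
  tool: string;
  id?: string;
}

// Narrow untrusted JSON to a ToolCallStatement, rejecting non-string ids
// instead of casting and letting them reach emitted events.
function parseToolCall(raw: unknown): ToolCallStatement {
  if (typeof raw !== "object" || raw === null) {
    throw new TypeError("tool_call statement must be an object");
  }
  const record = raw as Record<string, unknown>;
  if (record.kind !== "tool_call") {
    throw new TypeError('expected kind "tool_call"');
  }
  const tool = record.tool;
  if (typeof tool !== "string" || tool.length === 0) {
    throw new TypeError("tool_call.tool must be a non-empty string");
  }
  const rawId = record.id;
  if (rawId === undefined) {
    return { kind: "tool_call", tool };
  }
  if (typeof rawId !== "string") {
    throw new TypeError("tool_call.id must be a string when present");
  }
  return { kind: "tool_call", tool, id: rawId };
}
```

Failing at parse time keeps the error next to the malformed fixture, rather than surfacing later as a replay mismatch.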

