Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
Summary
https://www.braintrust.dev/app/Browserbase/p/stagehand-dev/experiments/observe%2Fobserve_simple_google_search-2a092abe?c=observe/observe_simple_google_search-cd76e3b1&r=4f104beb-fee2-42c0-9f9c-17fd2bc240f3&s=5d806aa7-7e45-4cc2-ad91-0965e396efdb
Validation
Summary by cubic
Adds a `stagehand_v4` eval harness that runs the v3 agent loop against the v4 SDK via a native tool bridge, with a v4-backed page facade and assertions. Also renames `understudy_code` to `understudy_v3_code`, sets `stagehand_v3` as the default harness, and enables harness-native bench implementations.

New Features
- `StagehandAgentV4Harness` with `UnderstudyV4Tools`: derives the tool catalog from the v4 SDK, lazy-loads the SDK, blocks `--api`, exposes `ctx.v4`, prints v4 `bus.logTree()` on verbose cleanup, installs a v4-backed page facade (goto/evaluate/waitForLoadState/locator) with fixed load-state and eval-target tracking, adapts flattened action params, returns page text when `extract` has no schema, and uses v4 element info for locator assertions.
- Harness-native bench implementations for `StagehandAgentV3Harness`, `ClaudeAgentHarness`, and `CodexAgentHarness`; the runner now selects harness-native implementations via `defineBenchTask(...benchFns)`.
- Default harness is `stagehand_v3`; default core tool is `understudy_v3_code`; help and tests are updated; the CLI enforces that only `stagehand_v3` supports `--api`.

Migration
- Use `--harness stagehand_v3` (default) to keep current behavior, or `--harness stagehand_v4` (no `--api`) to use the v4 SDK.
- Replace `understudy_code` with `understudy_v3_code`. CLI defaults, help, and tests reflect the new name.
- `stagehand_v4` requires a local v4 SDK; set `STAGEHAND_V4_SDK_PATH` or rely on the default path. The CLI validates presence before running.

Written for commit 8b9cf5b. Summary will update on new commits.
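
The CLI rules described in the summary (default harness, `--api` limited to `stagehand_v3`, and the v4 SDK presence check) can be illustrated with a rough sketch. This is not the PR's code: `validateHarnessOptions`, `HarnessOptions`, the `pathExists` callback, and the `defaultV4SdkPath` parameter are all hypothetical names introduced here, and only the two harness names that appear in the summary are modeled.

```typescript
// Hypothetical sketch of the validation rules; none of these names come from the PR.
type Harness = "stagehand_v3" | "stagehand_v4";

interface HarnessOptions {
  harness?: Harness;  // omitted means the default harness, stagehand_v3
  api?: boolean;      // stands in for the --api flag
  v4SdkPath?: string; // stands in for the STAGEHAND_V4_SDK_PATH value, if set
}

function validateHarnessOptions(
  opts: HarnessOptions,
  pathExists: (p: string) => boolean, // injected so the check is easy to test
  defaultV4SdkPath: string,           // assumed fallback; the real default is not stated
): { harness: Harness; errors: string[] } {
  const harness: Harness = opts.harness ?? "stagehand_v3";
  const errors: string[] = [];

  // Only stagehand_v3 supports --api.
  if (opts.api && harness !== "stagehand_v3") {
    errors.push(`--api is only supported by stagehand_v3 (got ${harness})`);
  }

  // stagehand_v4 needs a local v4 SDK; check presence before running.
  if (harness === "stagehand_v4") {
    const sdkPath = opts.v4SdkPath ?? defaultV4SdkPath;
    if (!pathExists(sdkPath)) {
      errors.push(`v4 SDK not found at ${sdkPath}`);
    }
  }

  return { harness, errors };
}
```

Passing the existence check as a callback is a choice made for this sketch so it can be exercised without touching the filesystem; an actual CLI would more likely call a filesystem API directly.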