[evals] Onboarding flow by miguelg719 · Pull Request #2103 · browserbase/stagehand

miguelg719 · 2026-05-11T05:20:06Z

why

what changed

test plan

Summary by cubic

Adds a first-run onboarding flow to the evals REPL and a new evals doctor/health command for environment and config health. Introduces quiet mode and persists a _meta first-run marker in evals.config.json so the welcome shows once.

New Features
- REPL onboarding: one-time extended welcome; suppress with --quiet/-q or EVALS_NO_WELCOME=1. Only prints an inline warning when zero provider keys are found; otherwise shows a compact tip line. Banner shows art only.
- First-run state: writes _meta.firstRunCompletedAt to evals.config.json; preserved across CLI rebuilds. REPL marks completion pre-prompt unless --quiet; argv marks only after real commands (not help or doctor, including nested help under config/experiments).
- evals doctor/health: human output and --json; verdicts ok|warn|fail with exit codes 0|0|1; hidden --probe to sanity-check an OpenAI key. Report covers runtime (Node, Stagehand, mode), config path/defaults, task discovery, and a key matrix (OpenAI/Anthropic/Google/Browserbase/Braintrust) with source provenance and BB alias hints. Listed in help and available in the REPL.

Migration
- Run evals doctor to verify setup. Set OPENAI_API_KEY/ANTHROPIC_API_KEY/GOOGLE_GENERATIVE_AI_API_KEY (or GEMINI_API_KEY) and BRAINTRUST_API_KEY; for env=browserbase, also set BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID (or BB_*).
- To skip onboarding on launch, use --quiet or set EVALS_NO_WELCOME=1.
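
The first-run behavior summarized above can be sketched as a few pure functions. This is a minimal illustration rather than the PR's code: the `EvalsConfig` shape, the in-memory signatures, and `isQuiet` are assumptions made for the example. Only `_meta.firstRunCompletedAt`, `--quiet`/`-q`, and `EVALS_NO_WELCOME=1` come from the summary, and the real helpers presumably read and write `evals.config.json` on disk.

```typescript
// Hypothetical config shape; only `_meta.firstRunCompletedAt` is named in the PR summary.
interface EvalsConfig {
  _meta?: { firstRunCompletedAt?: string };
  [key: string]: unknown;
}

// First run means the marker has not been written yet.
function isFirstRun(config: EvalsConfig): boolean {
  return !config._meta?.firstRunCompletedAt;
}

// Returns a config with the marker set; an existing timestamp is left untouched.
function markFirstRunComplete(config: EvalsConfig, now: Date = new Date()): EvalsConfig {
  if (!isFirstRun(config)) return config;
  return {
    ...config,
    _meta: { ...config._meta, firstRunCompletedAt: now.toISOString() },
  };
}

// The welcome is suppressed by --quiet/-q or by EVALS_NO_WELCOME=1.
function isQuiet(argv: string[], env: Record<string, string | undefined>): boolean {
  return argv.includes("--quiet") || argv.includes("-q") || env.EVALS_NO_WELCOME === "1";
}
```

Keeping the marker under a dedicated `_meta` key keeps it separate from user-facing defaults, which is presumably what lets the build step carry it across rebuilds.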

Written for commit 2402b4a. Summary will update on new commits.

changeset-bot · 2026-05-11T05:20:10Z

⚠️ No Changeset found

Latest commit: 2402b4a

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

cubic-dev-ai

4 issues found across 14 files

Confidence score: 3/5

- There is moderate merge risk because packages/evals/tui/commands/doctor.ts surfaces raw exception messages to users/JSON output (severity 7/10, high confidence), which can expose unsanitized internal error details.
- packages/evals/tui/welcomeStatus.ts has a logic/comment mismatch (|| vs the stated “all present values are alias-only” invariant), creating a concrete behavior regression risk in status reporting.
- packages/evals/tui/repl.ts and packages/evals/tui/commands/doctor.ts each have user-facing consistency issues (first-run onboarding being permanently suppressed in quiet mode, and env snapshot vs probe disagreement for OPENAI_API_KEY).
- Pay close attention to packages/evals/tui/commands/doctor.ts, packages/evals/tui/welcomeStatus.ts, and packages/evals/tui/repl.ts: sanitize surfaced errors and align boolean/env/onboarding logic with intended behavior.

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/evals/tui/commands/doctor.ts">

<violation number="1" location="packages/evals/tui/commands/doctor.ts:163">
P1: Custom agent: **Exception and error message sanitization**

Raw exception messages are surfaced to users/JSON output without sanitization.</violation>

<violation number="2" location="packages/evals/tui/commands/doctor.ts:395">
P2: The probe reads `process.env.OPENAI_API_KEY` directly, but `snapshotEnv()` also checks the package `.env` file. If the key exists only in `packages/evals/.env`, the doctor will show it as "✓ set" while the probe fails with an auth error because it never sees the actual value.</violation>
</file>
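
One way to address the snapshot/probe disagreement described in violation 2 is to route both through a single lookup. The sketch below assumes the status report and the probe can share a helper; `parseDotEnv`, `resolveKey`, and `KeySource` are hypothetical names, and the actual `snapshotEnv()` implementation is not shown in this PR page.

```typescript
type KeySource = "process.env" | ".env file" | "missing";

interface ResolvedKey {
  value: string | undefined;
  source: KeySource;
}

// Minimal KEY=VALUE parser for illustration; it ignores quoting and escapes.
function parseDotEnv(text: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === "" || trimmed.startsWith("#")) continue;
    const eq = trimmed.indexOf("=");
    if (eq <= 0) continue; // skip lines without a key
    out[trimmed.slice(0, eq).trim()] = trimmed.slice(eq + 1).trim();
  }
  return out;
}

// Single lookup shared by the report and the probe, so the two cannot disagree.
function resolveKey(
  name: string,
  processEnv: Record<string, string | undefined>,
  dotEnv: Record<string, string>,
): ResolvedKey {
  const fromProcess = processEnv[name];
  if (fromProcess) return { value: fromProcess, source: "process.env" };
  const fromFile = dotEnv[name];
  if (fromFile) return { value: fromFile, source: ".env file" };
  return { value: undefined, source: "missing" };
}
```

With this shape, the probe would call `resolveKey("OPENAI_API_KEY", process.env, dotEnv).value` instead of reading `process.env.OPENAI_API_KEY` directly, and the report could print `source` as the provenance column.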

<file name="packages/evals/tui/repl.ts">

<violation number="1" location="packages/evals/tui/repl.ts:103">
P2: `markFirstRunComplete(entryDir)` is called even in `quiet` mode, permanently suppressing the onboarding welcome for future interactive sessions despite the user never having seen it. Consider moving this call inside the `if (!quiet)` block so the marker is only set when the user actually had a chance to see (or dismiss) the welcome.</violation>
</file>
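
The reviewer's suggestion for `repl.ts` amounts to deciding the marker write together with what is shown. A minimal sketch, assuming the startup logic can be expressed as a plan object (`StartupPlan` and `planReplStartup` are hypothetical names, not taken from the PR):

```typescript
interface StartupPlan {
  showExtendedWelcome: boolean;
  showTipLine: boolean;
  markFirstRun: boolean;
}

function planReplStartup(opts: { quiet: boolean; firstRun: boolean }): StartupPlan {
  if (opts.quiet) {
    // Nothing was shown, so leave the marker unset for the next interactive launch.
    return { showExtendedWelcome: false, showTipLine: false, markFirstRun: false };
  }
  return {
    showExtendedWelcome: opts.firstRun,
    showTipLine: !opts.firstRun,
    // Writing the marker is only needed while it is still missing.
    markFirstRun: opts.firstRun,
  };
}
```

Under this plan, a user whose first launch used `--quiet` still gets the extended welcome on their first non-quiet launch.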

<file name="packages/evals/tui/welcomeStatus.ts">

<violation number="1" location="packages/evals/tui/welcomeStatus.ts:130">
P2: Logic does not match its stated invariant. Using `||` makes `viaAlias` true when *any* present value is alias-only, not when *all* present values are alias-only as the comment specifies. If the intent is "all present BB values come from aliases", replace `||` with a conjunction that checks each present value independently.</violation>
</file>
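
The difference between the "any" and "all present" readings of `viaAlias` can be made concrete with a small sketch. The `AliasedValue` shape and both function names are hypothetical; the real code in `welcomeStatus.ts` is not shown on this page, so this only illustrates the two semantics the reviewer contrasts.

```typescript
// Hypothetical per-value snapshot: the canonical variable and its BB_* alias.
interface AliasedValue {
  canonical?: string;
  alias?: string;
}

const isPresent = (v: AliasedValue): boolean => Boolean(v.canonical || v.alias);
const isAliasOnly = (v: AliasedValue): boolean => !v.canonical && Boolean(v.alias);

// "Any" semantics: what an `||` across the values computes.
function anyViaAlias(values: AliasedValue[]): boolean {
  return values.some(isAliasOnly);
}

// "All present" semantics: the invariant the code comment describes.
function allPresentViaAlias(values: AliasedValue[]): boolean {
  const present = values.filter(isPresent);
  return present.length > 0 && present.every(isAliasOnly);
}
```

The two functions disagree on the mixed case, for example an API key set through its canonical name and a project ID set only through its alias: `anyViaAlias` returns true while `allPresentViaAlias` returns false.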

Architecture diagram

sequenceDiagram
    participant CLI as CLI entry
    participant Doctor as Doctor Command
    participant Welcome as Welcome State
    participant REPL as REPL
    participant Config as Config RW
    participant Env as Env Snapshot
    participant Discovery as Task Discovery
    participant Build as Build Script

    Note over CLI,Build: NEW: REPL first-run onboarding + doctor health command

    CLI->>CLI: parse args (--quiet/-q, EVALS_NO_WELCOME)

    alt REPL launch (no args or only --quiet/-q flags)
        CLI->>REPL: startRepl(entryDir, { quiet })
        alt --quiet or EVALS_NO_WELCOME=1
            REPL->>REPL: skip all chrome, show prompt directly
        else first run
            REPL->>Welcome: isFirstRun(entryDir)
            Welcome->>Config: readConfig(entryDir) → read _meta
            Config-->>Welcome: _meta (or empty)
            Welcome-->>REPL: true (no firstRunCompletedAt)
            REPL->>Env: snapshotEnv() → check provider+BB+braintrust keys
            REPL->>Discovery: discoverTasks(tasksRoot)
            Discovery-->>REPL: registry with task count
            REPL->>REPL: printExtendedWelcome(health, registry)
        else returning user (not first run)
            REPL->>Env: snapshotEnv()
            alt zero provider keys
                REPL->>REPL: renderInlineWarning() → yellow warning line
            end
            REPL->>REPL: printTipLine()
        end
        REPL->>Welcome: markFirstRunComplete(entryDir) → write _meta
        Welcome->>Config: readConfig + set _meta.firstRunCompletedAt
        Config-->>Welcome: config saved
        REPL-->>CLI: REPL loop started
    else doctor/health command
        CLI->>Doctor: handleDoctor(subArgs, entryDir)
        Doctor->>Config: readConfig → resolveConfigPath
        Doctor->>Env: snapshotEnv() → full key matrix
        Doctor->>Discovery: discoverTasks(tasksRoot)
        Doctor->>Doctor: readStagehandVersion()
        Doctor->>Doctor: computeVerdict(keys, config, discovery)
        alt verdict = fail
            Doctor->>Doctor: reasons = zero providers or missing BB for env=browserbase
        else verdict = warn
            Doctor->>Doctor: reasons = partial BB or no braintrust
        else verdict = ok
            Doctor->>Doctor: reasons = []
        end
        alt --json flag
            Doctor->>Doctor: print JSON report (always exit 0)
        else human output
            Doctor->>Doctor: print formatted report
            alt verdict = fail
                Doctor-->CLI: exit 1
            else
                Doctor-->CLI: exit 0
            end
        end
    end

    Note over CLI: First-run marker only written after real commands (not help/doctor)

    alt command was 'run', 'list', 'config' (non-help), 'new', or unknown target
        CLI->>CLI: shouldMarkFirstRun = true
        CLI->>CLI: execute command
        CLI->>Welcome: markFirstRunComplete(entryDir) [in finally block]
    else command was help invocation or doctor
        CLI->>CLI: shouldMarkFirstRun = false → skip marker
    end

    Note over Build: Build script preserves _meta across rebuilds

    Build->>Config: read dist evals.config.json
    alt existing._meta present
        Build->>Build: merge dist._meta into source config
        Build->>Config: write merged config (preserves first-run state)
    end
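
The verdict branch of the diagram can be sketched as a pure function. This is one possible reading, not the PR's `computeVerdict`: the `DoctorInputs` shape and the names `sketchVerdict` and `exitCodeFor` are hypothetical, and the split between "missing BB" (treated here as fail when `env=browserbase` lacks either value) and "partial BB" (treated here as warn when exactly one value is set otherwise) is an assumption. The exit codes follow the summary's 0|0|1 mapping; the diagram's `--json` branch, which always exits 0, is not modeled.

```typescript
type Verdict = "ok" | "warn" | "fail";

// Hypothetical, flattened inputs; the real command takes keys, config, and discovery.
interface DoctorInputs {
  providerKeyCount: number; // OpenAI/Anthropic/Google keys found
  env: string;              // e.g. "browserbase"
  bbApiKey: boolean;
  bbProjectId: boolean;
  braintrust: boolean;
}

function sketchVerdict(i: DoctorInputs): { verdict: Verdict; reasons: string[] } {
  const fail: string[] = [];
  const warn: string[] = [];
  const bbCount = Number(i.bbApiKey) + Number(i.bbProjectId);

  if (i.providerKeyCount === 0) fail.push("no provider keys found");
  if (i.env === "browserbase" && bbCount < 2) {
    // Assumption: either missing value is fatal when Browserbase is the target env.
    fail.push("Browserbase credentials missing for env=browserbase");
  } else if (bbCount === 1) {
    warn.push("Browserbase credentials only partially set");
  }
  if (!i.braintrust) warn.push("BRAINTRUST_API_KEY not set");

  if (fail.length > 0) return { verdict: "fail", reasons: fail };
  if (warn.length > 0) return { verdict: "warn", reasons: warn };
  return { verdict: "ok", reasons: [] };
}

// Summary mapping: ok and warn exit 0, fail exits 1.
function exitCodeFor(verdict: Verdict): 0 | 1 {
  return verdict === "fail" ? 1 : 0;
}
```

Collecting reasons separately from the verdict keeps the human and JSON renderers free of policy logic; they only format what this function returns.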

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.

cubic-dev-ai · 2026-05-11T05:23:49Z

@@ -0,0 +1,443 @@
+/**


P1: Custom agent: Exception and error message sanitization

Raw exception messages are surfaced to users/JSON output without sanitization.

Prompt for AI agents

Check if this issue is valid; if so, understand the root cause and fix it. At packages/evals/tui/commands/doctor.ts, line 163:

<comment>Raw exception messages are surfaced to users/JSON output without sanitization.</comment>

<file context>
@@ -0,0 +1,443 @@
+      total: 0,
+      core: 0,
+      bench: 0,
+      error: (err as Error).message,
+      root,
+    };
</file context>
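
A fix along the lines the reviewer asks for could pass the caught value through a small sanitizer before it reaches the report. The helper below is a hypothetical sketch: `sanitizeErrorMessage` is not part of the PR, and the two redaction patterns (key-like tokens and absolute POSIX paths) are illustrative assumptions about what counts as sensitive here.

```typescript
// Hypothetical helper: turn an unknown thrown value into a short, redacted message.
function sanitizeErrorMessage(err: unknown, maxLength = 200): string {
  const raw = err instanceof Error ? err.message : String(err);
  // Keep only the first line so stack-like continuations are dropped.
  const firstLine = raw.split(/\r?\n/, 1).join("");
  const redacted = firstLine
    // Mask anything that looks like an `sk-` style secret key.
    .replace(/sk-[A-Za-z0-9_-]{8,}/g, "[redacted-key]")
    // Collapse absolute POSIX paths to their final segment.
    .replace(/(?:\/[\w.@-]+)+\/([\w.@-]+)/g, "[path]/$1");
  return redacted.length > maxLength ? `${redacted.slice(0, maxLength)}...` : redacted;
}
```

In the flagged object literal, this would replace `error: (err as Error).message` with `error: sanitizeErrorMessage(err)`, which also removes the unchecked cast.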

[evals] Onboarding flow

94bd71e

miguelg719 marked this pull request as ready for review May 11, 2026 05:20

cubic-dev-ai Bot reviewed May 11, 2026

address comments

2402b4a

miguelg719 wants to merge 2 commits into main from miguelgonzalez/stg-1901-evals-onboarding-and-readme
