[evals] Onboarding flow#2103
Conversation
|
There was a problem hiding this comment.
4 issues found across 14 files
Confidence score: 3/5
- There is moderate merge risk because
packages/evals/tui/commands/doctor.tssurfaces raw exception messages to users/JSON output (severity 7/10, high confidence), which can expose unsanitized internal error details. packages/evals/tui/welcomeStatus.tshas a logic/comment mismatch (||vs the stated “all present values are alias-only” invariant), creating a concrete behavior regression risk in status reporting.packages/evals/tui/repl.tsandpackages/evals/tui/commands/doctor.tseach have user-facing consistency issues (first-run onboarding being permanently suppressed in quiet mode, and env snapshot vs probe disagreement forOPENAI_API_KEY).- Pay close attention to
packages/evals/tui/commands/doctor.ts,packages/evals/tui/welcomeStatus.ts, andpackages/evals/tui/repl.ts- sanitize surfaced errors and align boolean/env/onboarding logic with intended behavior.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/evals/tui/commands/doctor.ts">
<violation number="1" location="packages/evals/tui/commands/doctor.ts:163">
P1: Custom agent: **Exception and error message sanitization**
Raw exception messages are surfaced to users/JSON output without sanitization.</violation>
<violation number="2" location="packages/evals/tui/commands/doctor.ts:395">
P2: The probe reads `process.env.OPENAI_API_KEY` directly, but `snapshotEnv()` also checks the package `.env` file. If the key exists only in `packages/evals/.env`, the doctor will show it as "✓ set" while the probe fails with an auth error because it never sees the actual value.</violation>
</file>
<file name="packages/evals/tui/repl.ts">
<violation number="1" location="packages/evals/tui/repl.ts:103">
P2: `markFirstRunComplete(entryDir)` is called even in `quiet` mode, permanently suppressing the onboarding welcome for future interactive sessions despite the user never having seen it. Consider moving this call inside the `if (!quiet)` block so the marker is only set when the user actually had a chance to see (or dismiss) the welcome.</violation>
</file>
<file name="packages/evals/tui/welcomeStatus.ts">
<violation number="1" location="packages/evals/tui/welcomeStatus.ts:130">
P2: Logic does not match its stated invariant. Using `||` makes `viaAlias` true when *any* present value is alias-only, not when *all* present values are alias-only as the comment specifies. If the intent is "all present BB values come from aliases", replace `||` with a conjunction that checks each present value independently.</violation>
</file>
Architecture diagram
sequenceDiagram
participant CLI as CLI entry
participant Doctor as Doctor Command
participant Welcome as Welcome State
participant REPL as REPL
participant Config as Config RW
participant Env as Env Snapshot
participant Discovery as Task Discovery
participant Build as Build Script
Note over CLI,Build: NEW: REPL first-run onboarding + doctor health command
CLI->>CLI: parse args (--quiet/-q, EVALS_NO_WELCOME)
alt REPL launch (no args or only --quiet/-q flags)
CLI->>REPL: startRepl(entryDir, { quiet })
alt --quiet or EVALS_NO_WELCOME=1
REPL->>REPL: skip all chrome, show prompt directly
else first run
REPL->>Welcome: isFirstRun(entryDir)
Welcome->>Config: readConfig(entryDir) → read _meta
Config-->>Welcome: _meta (or empty)
Welcome-->>REPL: true (no firstRunCompletedAt)
REPL->>Env: snapshotEnv() → check provider+BB+braintrust keys
REPL->>Discovery: discoverTasks(tasksRoot)
Discovery-->>REPL: registry with task count
REPL->>REPL: printExtendedWelcome(health, registry)
else returning user (not first run)
REPL->>Env: snapshotEnv()
alt zero provider keys
REPL->>REPL: renderInlineWarning() → yellow warning line
end
REPL->>REPL: printTipLine()
end
REPL->>Welcome: markFirstRunComplete(entryDir) → write _meta
Welcome->>Config: readConfig + set _meta.firstRunCompletedAt
Config-->>Welcome: config saved
REPL-->>CLI: REPL loop started
else doctor/health command
CLI->>Doctor: handleDoctor(subArgs, entryDir)
Doctor->>Config: readConfig → resolveConfigPath
Doctor->>Env: snapshotEnv() → full key matrix
Doctor->>Discovery: discoverTasks(tasksRoot)
Doctor->>Doctor: readStagehandVersion()
Doctor->>Doctor: computeVerdict(keys, config, discovery)
alt verdict = fail
Doctor->>Doctor: reasons = zero providers or missing BB for env=browserbase
else verdict = warn
Doctor->>Doctor: reasons = partial BB or no braintrust
else verdict = ok
Doctor->>Doctor: reasons = []
end
alt --json flag
Doctor->>Doctor: print JSON report (always exit 0)
else human output
Doctor->>Doctor: print formatted report
alt verdict = fail
Doctor-->CLI: exit 1
else
Doctor-->CLI: exit 0
end
end
end
Note over CLI: First-run marker only written after real commands (not help/doctor)
alt command was 'run', 'list', 'config' (non-help), 'new', or unknown target
CLI->>CLI: shouldMarkFirstRun = true
CLI->>CLI: execute command
CLI->>Welcome: markFirstRunComplete(entryDir) [in finally block]
else command was help invocation or doctor
CLI->>CLI: shouldMarkFirstRun = false → skip marker
end
Note over Build: Build script preserves _meta across rebuilds
Build->>Config: read dist evals.config.json
alt existing._meta present
Build->>Build: merge dist._meta into source config
Build->>Config: write merged config (preserves first-run state)
end
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.
| @@ -0,0 +1,443 @@ | |||
| /** | |||
There was a problem hiding this comment.
P1: Custom agent: Exception and error message sanitization
Raw exception messages are surfaced to users/JSON output without sanitization.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/evals/tui/commands/doctor.ts, line 163:
<comment>Raw exception messages are surfaced to users/JSON output without sanitization.</comment>
<file context>
@@ -0,0 +1,443 @@
+ total: 0,
+ core: 0,
+ bench: 0,
+ error: (err as Error).message,
+ root,
+ };
</file context>
why
what changed
test plan
Summary by cubic
Adds a first-run onboarding flow to the
evalsREPL and a newevals doctor/healthcommand for environment and config health. Introduces quiet mode and persists a_metafirst-run marker inevals.config.jsonso the welcome shows once.New Features
--quiet/-qorEVALS_NO_WELCOME=1. Only prints an inline warning when zero provider keys are found; otherwise shows a compact tip line. Banner shows art only._meta.firstRunCompletedAttoevals.config.json; preserved across CLI rebuilds. REPL marks completion pre-prompt unless--quiet; argv marks only after real commands (not help ordoctor, including nested help underconfig/experiments).evals doctor/health: human output and--json; verdictsok|warn|failwith exit codes0|0|1; hidden--probeto sanity-check an OpenAI key. Report covers runtime (Node, Stagehand, mode), config path/defaults, task discovery, and a key matrix (OpenAI/Anthropic/Google/Browserbase/Braintrust) with source provenance and BB alias hints. Listed in help and available in the REPL.Migration
evals doctorto verify setup. SetOPENAI_API_KEY/ANTHROPIC_API_KEY/GOOGLE_GENERATIVE_AI_API_KEY(orGEMINI_API_KEY) andBRAINTRUST_API_KEY; forenv=browserbase, also setBROWSERBASE_API_KEYandBROWSERBASE_PROJECT_ID(orBB_*).--quietor setEVALS_NO_WELCOME=1.Written for commit 2402b4a. Summary will update on new commits.