feat(studio): expose eval resumability — API + Resume action on run detail

## Objective

Surface the existing CLI resume mechanics (`--resume`, `--rerun-failed`, `--output <dir>`) in Studio so a user staring at an interrupted or partially-errored run can finish it from the web UI instead of dropping to a terminal.

Today Studio can launch a fresh eval (`POST /api/eval/run`) and renders `execution_error` per test on the run detail page, but the launch request shape doesn't carry the resume parameters and no UI affordance exists.

Follow-up to #1216 / [PR #1217](https://github.com/EntityProcess/agentv/pull/1217), which scoped TUI + flag-level UX + docs + auto-detect but explicitly deferred Studio.

## Background — current state in code

- **Launch endpoint:** `apps/cli/src/commands/results/eval-runner.ts:240` (unscoped) and `apps/cli/src/commands/results/eval-runner.ts:407` (benchmark-scoped). Both call `buildCliArgs` and `spawn` the CLI.
- **Request shape:** `RunEvalRequest` at `apps/cli/src/commands/results/eval-runner.ts:101` — has `suite_filter`, `test_ids`, `target`, `threshold`, `workers`, `dry_run`. **Missing `resume`, `rerun_failed`, `retry_errors`, `output`.**
- **CLI arg builder:** `buildCliArgs` at `apps/cli/src/commands/results/eval-runner.ts:110`.
- **UI client:** `apps/studio/src/lib/api.ts:529` (`runEval` function).
- **Run detail route:** `apps/studio/src/routes/runs/$runId.tsx` and benchmark variant `apps/studio/src/routes/benchmarks/$benchmarkId_/runs/$runId.tsx`.
- **Run detail component:** `apps/studio/src/components/RunDetail.tsx:174` already renders `executionStatus === 'execution_error'` per row, so the data needed to decide "is there anything to resume" is already on the page.
- **Job polling page:** `apps/studio/src/routes/jobs/$runId.tsx` (existing — reuse for the post-resume status view).
- **Read-only guard:** the launch endpoint already rejects in read-only mode (`eval-runner.ts:241`); the new behaviour must respect this.

## Proposed changes

### 1. Extend the launch API (server)

Add to `RunEvalRequest`:

```ts
interface RunEvalRequest {
  // ...existing fields...
  resume?: boolean;
  rerun_failed?: boolean;
  retry_errors?: string;  // path to a prior run dir or index.jsonl
  output?: string;        // explicit run dir; required when resume/rerun_failed are set
                          // and the server isn't auto-detecting from cache
}
```

Wire format is **snake_case** per `AGENTS.md` ("Wire Format Convention"). Validation:

- `resume` and `rerun_failed` are mutually exclusive.
- `retry_errors` is mutually exclusive with `resume` / `rerun_failed`.
- When `resume` or `rerun_failed` is set without `output`, accept it — the CLI will auto-detect from `.agentv/cache.json` (landed in PR #1217).
- Return `400` with a clear error message on invalid combinations.
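
The validation rules above could be sketched roughly as follows. This is a minimal illustration, not the existing handler code: `ResumeFields`, `validateResumeFields`, and the error wording are hypothetical names chosen for the example.

```ts
// Hypothetical subset of RunEvalRequest: only the fields proposed in this issue.
interface ResumeFields {
  resume?: boolean;
  rerun_failed?: boolean;
  retry_errors?: string;
  output?: string;
}

// Returns an error message for an invalid combination, or null when the
// request is acceptable. Name and messages are assumptions for illustration.
function validateResumeFields(req: ResumeFields): string | null {
  if (req.resume && req.rerun_failed) {
    return "resume and rerun_failed are mutually exclusive";
  }
  if (req.retry_errors !== undefined && (req.resume || req.rerun_failed)) {
    return "retry_errors cannot be combined with resume or rerun_failed";
  }
  // resume / rerun_failed without output is accepted on purpose:
  // the CLI auto-detects the run dir in that case.
  return null;
}
```

The launch handler would presumably respond with `400` and the returned message whenever the result is non-null, and proceed to spawn the CLI otherwise.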

Extend `buildCliArgs` to translate these into `--resume`, `--rerun-failed`, `--retry-errors <path>`, `--output <dir>`.
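
As a rough sketch of that translation, assuming the new fields are appended to whatever `buildCliArgs` already produces (`ResumeArgsInput` and `resumeCliArgs` are hypothetical names; the real function also handles the existing fields, which are omitted here):

```ts
// Hypothetical input type: just the new request fields from this issue.
interface ResumeArgsInput {
  resume?: boolean;
  rerun_failed?: boolean;
  retry_errors?: string;
  output?: string;
}

// Maps the snake_case request fields to the CLI flags named in this issue.
// Flag values are pushed as separate array elements rather than joined
// into a single string.
function resumeCliArgs(req: ResumeArgsInput): string[] {
  const args: string[] = [];
  if (req.resume) args.push("--resume");
  if (req.rerun_failed) args.push("--rerun-failed");
  if (req.retry_errors) args.push("--retry-errors", req.retry_errors);
  if (req.output) args.push("--output", req.output);
  return args;
}
```

Keeping each value as its own array element means a run dir containing spaces should reach the CLI intact when the array is handed to `spawn` without a shell.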

### 2. Add UI action on the run detail page

On `/runs/:runId` (and the benchmark-scoped equivalent), when the loaded run contains at least one result with `executionStatus === 'execution_error'`:

- Render a primary button labelled **"Resume run"** that calls `POST /api/eval/run` with `{ suite_filter: <run's suite filter>, target: <run's target>, output: <run dir>, resume: true }`.
- Render a secondary button **"Rerun failed cases"** that does the same with `rerun_failed: true` instead of `resume: true`. (Same in-place semantics as the CLI flag: re-runs everything that wasn't `executionStatus === 'ok'`.)
- After POST, redirect to `/jobs/:runId` (existing route) to show progress.
- Disable both buttons in read-only mode.

UI placement: top-right of the RunDetail header is fine — keep it visible without scrolling.
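
The button logic described in this section might reduce to two small helpers like the ones below. This is a sketch under assumptions: the `RunSummary` shape and its field names (`suiteFilter`, `target`, `runDir`, `results`) are hypothetical stand-ins for whatever the run detail page actually loads, and the helper names are invented for the example.

```ts
// Hypothetical view of the data already loaded on the run detail page.
interface RunResultRow {
  executionStatus: string;
}
interface RunSummary {
  suiteFilter?: string;
  target?: string;
  runDir: string;
  results: RunResultRow[];
}

// Request body for POST /api/eval/run, using snake_case wire keys.
interface ResumeRequestBody {
  suite_filter?: string;
  target?: string;
  output: string;
  resume?: boolean;
  rerun_failed?: boolean;
}

type ResumeMode = "resume" | "rerun_failed";

// True when at least one row errored, i.e. there is something to resume.
function hasExecutionErrors(run: RunSummary): boolean {
  return run.results.some((r) => r.executionStatus === "execution_error");
}

// Show the buttons only when there are errored rows; enable them only
// outside read-only mode.
function resumeButtonState(run: RunSummary, readOnly: boolean) {
  return { visible: hasExecutionErrors(run), disabled: readOnly };
}

// Builds the launch body for either button; exactly one mode flag is set.
function buildResumeBody(run: RunSummary, mode: ResumeMode): ResumeRequestBody {
  const base = { suite_filter: run.suiteFilter, target: run.target, output: run.runDir };
  return mode === "resume" ? { ...base, resume: true } : { ...base, rerun_failed: true };
}
```

Keeping these as pure functions, separate from the component, would let the UI tests assert visibility and the request body shape without rendering anything.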

### 3. Tests

- **Server tests** in `apps/cli/test/commands/results/serve.test.ts` (existing file): add cases for valid resume/rerun_failed/retry_errors requests, mutual-exclusivity rejections, and the read-only guard.
- **UI tests:** assert the buttons render only when the run has at least one `execution_error` row; assert the request body shape on click; assert read-only hides/disables the buttons.

## Acceptance signals

- [ ] `RunEvalRequest` accepts `resume`, `rerun_failed`, `retry_errors`, `output` (snake_case keys).
- [ ] CLI is spawned with the corresponding flags; verified by inspecting the `command` field returned in the launch response.
- [ ] Mutual-exclusivity validation returns 400 with a usable error message.
- [ ] `/runs/:runId` shows a "Resume run" button when any row has `executionStatus === 'execution_error'`; clicking it triggers a launch with `resume: true` + `output: <runDir>` and redirects to `/jobs/:runId`.
- [ ] "Rerun failed cases" button works analogously with `rerun_failed: true`.
- [ ] Read-only mode hides or disables both buttons (button-level UX, not just the 403 from the server).
- [ ] Manual red/green UAT documented in the PR: red = launch a deliberately failing eval, observe `execution_error` rows, no resume button on `main`; green = same scenario, click Resume, observe that the resumed run reuses the same run dir and that previously passing tests are skipped.

## Non-goals

- **No `/runs` list filter** for incomplete runs. Add the action where users already are (the detail page); broader filters can be a separate, smaller issue if usage warrants.
- **No new resume verbs.** Surface the three existing CLI flags; don't invent a fourth.
- **No `--retry-errors <path>` UI picker.** The path-based variant is for cross-run cases; in-Studio resume targets the run currently being viewed, so `output: <currentRunDir>` is sufficient.
- **No scheduled / auto-resume.** Manual button click only.
- **No changes to the run-launch wizard / form** for *new* runs — this issue is about resuming existing runs.

## Related

- Parent issue: #1216
- Implementing PR: #1217 (merged) — CLI/wizard/docs/auto-detect
- Wire format conventions: `AGENTS.md` → "Wire Format Convention"
- Issue workflow: `AGENTS.md` → "Issue Workflow" (claim on the project board before starting)

## Estimate

~1 day. Server change is mechanical (one interface, one arg builder, validation, tests). UI change is one button + one route handler + tests. No design work needed — peers (promptfoo cloud) put resume actions on run detail pages too.
