feat(studio): add Stop run button + graceful CLI interrupt — pairs with eval resume

## Objective

Add a "Stop run" affordance — both a UI button on `/jobs/:runId` and graceful CLI signal handling — so users can interrupt a long-running eval without orphaning the subprocess or losing partial results. Today there is no programmatic stop:

- **Studio**: the launch endpoint stores `process: ChildProcess` per run but exposes no DELETE/stop route. Closing the browser tab leaves the CLI subprocess running until it completes naturally.
- **CLI**: no top-level SIGINT/SIGTERM handler. `Ctrl+C` hard-kills the eval mid-test. The only `child.kill()` calls in the codebase live inside agent providers (claude-cli, codex-cli, pi-cli) terminating their own per-test subprocess on timeout — not the orchestrator handling user interrupt.

This pairs naturally with the resume feature shipped in #1220: today the workflow for "I want to bail on this run" is *kill the terminal → resume in Studio*. With a Stop button it becomes *click Stop → click Resume*, all without leaving the browser.

## Current state — what already works

- Per-test results are flushed row-by-row into `index.jsonl` as tests complete, so any partial state is durable on disk and resumable. The "stop" feature does not need to invent persistence — only graceful termination.
- `eval-runner.ts` already retains a `process: ChildProcess` reference per Studio-launched run, so once an endpoint is added the server can call `run.process.kill('SIGTERM')`.

## Gap discovered: incomplete partial runs are not surfaced as resumable

The current Studio Resume affordance from #1220 is keyed to runs that contain at least one `execution_status: execution_error` row.

That misses an important resumability case: a run can be interrupted after writing some successful rows and before executing the remaining planned tests, leaving a partial run that is still resumable in principle but has no `execution_error` rows.

Real repro:

1. Start a multi-test eval.
2. Let a few tests complete successfully.
3. Kill the run before the remaining tests execute.
4. Open the run detail page in Studio.

Observed:

- The run contains only the completed `ok` rows.
- There are no `execution_error` rows.
- Studio shows only "Re-run with Filters".
- No "Resume run" button is rendered.

Expected:

- A partial run should be resumable when it is incomplete relative to the originally planned suite/test set, even if all recorded rows are currently `ok`.

This matters because resumability should support "continue later" workflows, not only "recover from execution errors."

## Proposed changes

### 1. CLI signal handler

Register `SIGINT` / `SIGTERM` handlers at the top of `apps/cli/src/commands/eval/run-eval.ts` (or wherever the orchestrator entry point lives):

- On first signal: set a `stopRequested` flag, stop dispatching new tests, let in-flight tests finish (they're already isolated), then exit cleanly with a non-zero code distinguishable from "crashed."
- On second signal: hard exit (so users can still escape if a test is hung).
- Print a concise message: `Stop requested — waiting for N in-flight test(s) to finish (Ctrl+C again to force-quit).`
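A minimal TypeScript sketch of the two-stage handler, and of how the `stopRequested` flag could gate dispatch in the worker pool. `createStopController`, `runPool` and `EXIT_STOPPED` are hypothetical names. 130 (128 + SIGINT) is assumed as the "stopped" exit code only because it is the common shell convention, and the real orchestrator's pool will have its own shape.

```typescript
// Hypothetical sketch: createStopController, runPool and EXIT_STOPPED are
// illustrative names, not existing code in apps/cli.

/** Assumed "stopped by user" exit code: 128 + SIGINT (2), the usual shell convention. */
const EXIT_STOPPED = 130;

interface StopController {
  /** True once the first SIGINT/SIGTERM has arrived. */
  readonly stopRequested: boolean;
  /** Invoked on every SIGINT/SIGTERM; the second call force-quits. */
  handleSignal(): void;
}

interface StopControllerOptions {
  inFlightCount: () => number; // tests currently executing
  log: (message: string) => void;
  forceQuit: () => void; // e.g. () => process.exit(1)
}

function createStopController(opts: StopControllerOptions): StopController {
  let signalsSeen = 0;
  return {
    get stopRequested() {
      return signalsSeen > 0;
    },
    handleSignal() {
      signalsSeen += 1;
      if (signalsSeen === 1) {
        // First signal: stop dispatching, let in-flight tests finish and flush their rows.
        opts.log(
          `Stop requested — waiting for ${opts.inFlightCount()} in-flight test(s) to finish (Ctrl+C again to force-quit).`,
        );
      } else {
        // Second signal: escape hatch for a hung test.
        opts.forceQuit();
      }
    },
  };
}

/**
 * Runs tasks with bounded concurrency and stops dispatching new ones once
 * shouldStop() returns true. Tasks already started are awaited, never cancelled.
 */
async function runPool<T>(
  tasks: ReadonlyArray<() => Promise<T>>,
  concurrency: number,
  shouldStop: () => boolean,
): Promise<{ results: T[]; stopped: boolean }> {
  const results: T[] = []; // completion order, like rows appended to index.jsonl
  let next = 0;
  const worker = async (): Promise<void> => {
    while (next < tasks.length && !shouldStop()) {
      const task = tasks[next++];
      results.push(await task());
    }
  };
  const workerCount = Math.max(1, Math.min(concurrency, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  // Any task never dispatched means the run was cut short by a stop request.
  return { results, stopped: next < tasks.length };
}
```

Wiring it up would mean calling `process.on('SIGINT', () => controller.handleSignal())` (and the same for `SIGTERM`), passing `() => controller.stopRequested` as `shouldStop`, and setting `process.exitCode = EXIT_STOPPED` when `runPool` returns `stopped: true`. Setting `exitCode` instead of calling `process.exit()` lets pending writes to `index.jsonl` drain before the process ends.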

### 2. Studio API: `DELETE /api/eval/run/:id`

Add a route that:

- 404s if the run id is unknown.
- 403s in read-only mode (matches the existing guard on POST).
- 409s (or 200 with `{stopped: false}`) if the run is already terminal.
- Otherwise calls `run.process?.kill('SIGTERM')`, sets `run.status = 'stopping'`, returns `202`.

The existing `child.on('close')` handler will flip the status to `failed`/`finished` when the CLI exits.

Add a benchmark-scoped variant, `DELETE /api/benchmarks/:benchmarkId/eval/run/:id`, matching the existing pattern.
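One way to keep the two routes consistent is a framework-agnostic handler that both delegate to. This is a sketch: `RunRecord`, `handleStopRun` and the status union are assumptions about `eval-runner.ts`, and it picks 409 (rather than 200 with `{stopped: false}`) for terminal runs.

```typescript
// Hypothetical sketch: RunRecord and handleStopRun are illustrative; the real
// eval-runner.ts types may differ. Kept framework-agnostic so the plain and the
// benchmark-scoped DELETE routes can both delegate to it.

type RunStatus = "starting" | "running" | "stopping" | "finished" | "failed";

/** The subset of ChildProcess this handler needs. */
interface KillableProcess {
  kill(signal?: "SIGTERM"): boolean;
}

interface RunRecord {
  status: RunStatus;
  process?: KillableProcess;
}

interface StopResult {
  httpStatus: 202 | 403 | 404 | 409;
  body: { stopped: boolean; error?: string };
}

const TERMINAL_STATUSES: ReadonlySet<RunStatus> = new Set<RunStatus>(["finished", "failed"]);

function handleStopRun(runs: Map<string, RunRecord>, runId: string, readOnly: boolean): StopResult {
  const run = runs.get(runId);
  if (!run) return { httpStatus: 404, body: { stopped: false, error: "unknown run id" } };
  if (readOnly) return { httpStatus: 403, body: { stopped: false, error: "read-only mode" } };
  if (TERMINAL_STATUSES.has(run.status)) {
    return { httpStatus: 409, body: { stopped: false, error: `run already ${run.status}` } };
  }
  // Repeated clicks while stopping are idempotent: no second SIGTERM is sent.
  if (run.status !== "stopping") {
    run.process?.kill("SIGTERM");
    run.status = "stopping";
  }
  // The existing child.on('close') handler later flips status to finished/failed.
  return { httpStatus: 202, body: { stopped: true } };
}
```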

### 3. UI: "Stop run" button on `/jobs/:runId`

In `apps/studio/src/routes/jobs/$runId.tsx`:

- Render a destructive-style button (red outline) when `status === 'starting'` or `'running'` and not in read-only mode.
- On click: send `DELETE /api/eval/run/:id` and optimistically flip the status indicator to "Stopping…".
- After the run hits a terminal state, the existing UI already updates correctly.
- Read-only mode is guarded at both layers: the UI does not render the button, and the API also returns 403.
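The render predicate and the click handler can be sketched as follows. `shouldShowStopButton` is the helper named in the test plan below; `requestStop`, `FetchLike` and the rollback-on-failure behavior are illustrative assumptions, with the fetch function injected so the logic can be tested without a server.

```typescript
// Hypothetical sketch for apps/studio/src/routes/jobs/$runId.tsx. Only the
// helper name shouldShowStopButton comes from the proposal; the rest is illustrative.

type RunStatus = "starting" | "running" | "stopping" | "finished" | "failed";

/** Pure predicate: show Stop only for live runs, never in read-only mode. */
function shouldShowStopButton(status: RunStatus, isReadOnly: boolean): boolean {
  return !isReadOnly && (status === "starting" || status === "running");
}

type FetchLike = (url: string, init: { method: string }) => Promise<{ ok: boolean; status: number }>;

/**
 * Optimistically shows "Stopping…", then calls the DELETE route. On failure it
 * restores the previous status so the button reappears. Returns whether the
 * stop request was accepted.
 */
async function requestStop(
  runId: string,
  currentStatus: RunStatus,
  setStatus: (status: RunStatus) => void,
  fetchImpl: FetchLike,
): Promise<boolean> {
  setStatus("stopping");
  try {
    const res = await fetchImpl(`/api/eval/run/${encodeURIComponent(runId)}`, { method: "DELETE" });
    if (!res.ok) setStatus(currentStatus); // e.g. 403/404/409: undo the optimistic flip
    return res.ok;
  } catch {
    setStatus(currentStatus); // network error: undo the optimistic flip
    return false;
  }
}
```

In the component, `fetchImpl` would simply be `fetch` and `setStatus` the local state setter.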

### 4. Resume metadata for incomplete partial runs

Tighten the run-detail resumability contract so the UI does not infer resumability solely from `execution_error` rows.

Possible shape:

- compute resumability from run completeness relative to the planned suite/test set recorded in `benchmark.json` / launch metadata
- surface explicit fields like `is_resumable` and `resume_reason` from the run-detail API
- continue to support the existing execution-error case, but also treat truncated partial runs as resumable
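A sketch of the run-detail computation, assuming the planned test ids can be recovered from `benchmark.json` or launch metadata and that each `index.jsonl` row carries a `test_id`. Both the field names and the fallback for runs without a recorded plan are assumptions. Execution errors take precedence as the reported reason, so the existing #1220 case keeps its current behavior.

```typescript
// Hypothetical sketch: the row shape (test_id, execution_status) and the
// plannedTestIds input are assumptions; is_resumable and resume_reason follow
// the field names proposed above.

interface ResultRow {
  test_id: string;
  execution_status: string; // e.g. "ok" or "execution_error"
}

interface Resumability {
  is_resumable: boolean;
  resume_reason: "execution_error" | "incomplete" | null;
  missing_test_ids: string[];
}

/**
 * plannedTestIds is undefined for runs launched before the planned set was
 * recorded; those fall back to the #1220 behavior (execution errors only).
 */
function computeResumability(
  rows: readonly ResultRow[],
  plannedTestIds: readonly string[] | undefined,
): Resumability {
  const recorded = new Set(rows.map((r) => r.test_id));
  const missing = (plannedTestIds ?? []).filter((id) => !recorded.has(id));
  if (rows.some((r) => r.execution_status === "execution_error")) {
    return { is_resumable: true, resume_reason: "execution_error", missing_test_ids: missing };
  }
  if (missing.length > 0) {
    // Truncated partial run: every recorded row is ok, but planned tests never ran.
    return { is_resumable: true, resume_reason: "incomplete", missing_test_ids: missing };
  }
  return { is_resumable: false, resume_reason: null, missing_test_ids: [] };
}
```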

### 5. Tests

- **Server**: in `apps/cli/test/commands/results/serve.test.ts`, add cases for unknown id (404), read-only (403), and a happy-path stop using a fake long-running child.
- **CLI**: a small test that sends SIGINT to a multi-test eval run and asserts (a) exit code is the "stopped" sentinel and (b) `index.jsonl` contains the rows for tests completed before the signal.
- **UI**: pure helper for "should the stop button render?" — `shouldShowStopButton(status, isReadOnly)`.
- **Resume UI**: tests covering both resumable states: `execution_error` rows and incomplete partial runs with only `ok` rows.

## Acceptance signals

- [ ] CLI: SIGINT during a multi-test eval produces a clean exit and a partial `index.jsonl` containing all tests completed before the signal.
- [ ] CLI: a second SIGINT within 1s force-quits.
- [ ] Server: `DELETE /api/eval/run/:id` exists and is 403-guarded in read-only mode; benchmark-scoped variant works the same.
- [ ] UI: a "Stop run" button renders on `/jobs/:runId` while running, hidden when terminal, hidden in read-only.
- [ ] UI: clicking Stop, then navigating to the originating `/runs/:runId`, shows the partial run with the **Resume run** button from #1220 visible when either condition is true:
  - the run contains at least one `execution_status: execution_error` row, or
  - the run is incomplete relative to the originally planned suite/test set even though all recorded rows are `ok`.
- [ ] UI: a fully completed successful run does **not** show Resume.
- [ ] Manual red/green: red = on `main`, killing terminal mid-eval is the only way to stop; green = on this branch, the Stop button on `/jobs/<id>` terminates cleanly and the partial run is resumable in one click.

## Non-goals

- **No "Pause" semantics.** Stop fully terminates; resume is the way to continue.
- **No queue management.** This is for one running job at a time — multi-job orchestration is out of scope.
- **No SIGINT-to-grader translation.** If a grader is mid-flight when the signal arrives, let it finish or time out per existing rules.

## Related

- Parent feature (resume): #1219
- Implementing PR (resume): #1220
- Existing per-provider kill patterns to mirror style: `packages/core/src/evaluation/providers/{claude-cli,codex-cli,pi-cli}.ts`

## Estimate

~half a day. CLI signal handling is the biggest unknown (need to thread the flag through the worker pool); the UI + API changes are small.

