Merged
79 changes: 46 additions & 33 deletions .cursor/skills/proof/SKILL.md
@@ -71,7 +71,7 @@ The runner executes tasks within a rank **concurrently** via `Promise.all`. A li

Quality bar: when you sketch the rank structure (rank 1 → rank 2 → …), at least one rank should contain more than one task in any non-trivial problem. If your DAG is a single chain of 1-task ranks, you almost certainly missed parallelism — go back and look again.

The example shipped with the runner (`examples/example_dag.json`) demonstrates the pattern: rank 1 fans out to two read-only research tasks, rank 2 merges them into a design, rank 3 implements, and rank 4 fans out again to tests + docs.
The example shipped with the skill (`.cursor/skills/proof/examples/example_dag.json`) demonstrates the pattern: rank 1 fans out to two read-only research tasks, rank 2 merges them into a design, rank 3 implements, and rank 4 fans out again to tests + docs.
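
To make the rank structure concrete, here is a minimal sketch that assigns ranks by dependency depth. The `Task` shape (`id`, `dependsOn`), the task ids, and the `sketchRanks` helper are assumptions made for illustration; they are not the runner's actual DAG schema or its `computeRanks` export.

```typescript
// Hypothetical task shape for illustration; the real DAG schema may differ.
interface Task {
  id: string;
  dependsOn: string[];
}

// Assign each task a rank: 1 if it has no dependencies, otherwise
// 1 + the highest rank among its dependencies. Tasks that share a rank
// have no dependency path between them, so they can run concurrently.
function sketchRanks(tasks: Task[]): Map<string, number> {
  const byId = new Map<string, Task>();
  for (const task of tasks) byId.set(task.id, task);

  const ranks = new Map<string, number>();
  const visiting = new Set<string>();

  const rankOf = (id: string): number => {
    const known = ranks.get(id);
    if (known !== undefined) return known;
    const task = byId.get(id);
    if (task === undefined) throw new Error(`Unknown task: ${id}`);
    if (visiting.has(id)) throw new Error(`Cycle detected at: ${id}`);
    visiting.add(id);
    let rank = 1;
    for (const dep of task.dependsOn) rank = Math.max(rank, rankOf(dep) + 1);
    visiting.delete(id);
    ranks.set(id, rank);
    return rank;
  };

  for (const task of tasks) rankOf(task.id);
  return ranks;
}

// Mirrors the fan-out, merge, implement, fan-out pattern described above.
const exampleTasks: Task[] = [
  { id: "research-a", dependsOn: [] },
  { id: "research-b", dependsOn: [] },
  { id: "design", dependsOn: ["research-a", "research-b"] },
  { id: "implement", dependsOn: ["design"] },
  { id: "tests", dependsOn: ["implement"] },
  { id: "docs", dependsOn: ["implement"] },
];
const exampleRanks = sketchRanks(exampleTasks);
```

In this sketch, ranks 1 and 4 each hold two tasks, which is the shape the quality bar asks for. A DAG where every rank holds exactly one task would show up here as a strictly increasing sequence with no repeated rank.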

Write the JSON to a temp file **and immediately generate the initial canvas** so the user can open it while subagents spin up. Run all of the following in a single shell block:

@@ -105,7 +105,7 @@ The canvas path is:
~/.cursor/projects/<workspace-slug>/canvases/dag-<slug>.canvas.tsx
```

`<workspace-slug>` is derived from the cwd's absolute path with `/` and other special chars replaced by `-`. To compute it, take `pwd`, strip the leading `/`, and replace each remaining `/` with `-`. Example: cwd `/Users/me/Code/myapp` → slug `Users-me-Code-myapp`. Use the same `<slug>` you used for the DAG JSON filename so they're easy to correlate.
`<workspace-slug>` is derived from the cwd's absolute path by stripping the leading `/`, replacing path separators with `-`, and sanitizing other non-alphanumeric characters within each path segment to `-`. Example: cwd `/Users/me/Code/myapp` → slug `Users-me-Code-myapp`. Use the same `<slug>` you used for the DAG JSON filename so they're easy to correlate.
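
The derivation above can be sketched as a small pure function. The sanitization regex and both helper names (`workspaceSlug`, `canvasPath`) are assumptions for illustration; the runner's exact rules for special characters may differ.

```typescript
// Sketch of the slug derivation described above. The per-segment
// sanitization regex is an assumption, not the runner's confirmed rule.
function workspaceSlug(cwd: string): string {
  return cwd
    .split("/")
    // Dropping empty segments removes the leading "/" (and any doubled slashes).
    .filter((segment) => segment.length > 0)
    // Replace every non-alphanumeric character inside a segment with "-".
    .map((segment) => segment.replace(/[^A-Za-z0-9]/g, "-"))
    .join("-");
}

// Hypothetical helper that composes the canvas path from its parts.
// `homeDir` stands in for the expanded "~".
function canvasPath(homeDir: string, cwd: string, dagSlug: string): string {
  return `${homeDir}/.cursor/projects/${workspaceSlug(cwd)}/canvases/dag-${dagSlug}.canvas.tsx`;
}

const exampleSlug = workspaceSlug("/Users/me/Code/myapp");
// exampleSlug === "Users-me-Code-myapp"
```

Passing the same `dagSlug` that names the DAG JSON file keeps the two artifacts easy to correlate, as recommended above.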

### Step 2 — Surface the canvas link in chat

@@ -146,20 +146,26 @@ Same `--canvas-path` as Step 1. The runner:
3. Automatically skips tasks whose upstream dependencies failed (marks them `ERROR` with a "Skipped: upstream task(s) … failed" message).
4. Captures each subagent's final assistant text, status, token usage, and duration.
5. Writes a final canvas with summary stats.
6. On SIGINT/SIGTERM/SIGHUP, cancels all in-flight subagents before finalizing the canvas.
6. Writes artifacts by default (suppress with `--no-artifacts` or override the path with `--full-output-dir`; skipped entirely for `--init-only` and `--dry-check-cmds`):
- **At run start:** writes `_dag.json` (the original DAG definition) to the artifacts directory.
- **As each task finishes:** writes `${taskId}.md` (full transcript for `kind: task`, `oracle`, and `pause`).
- **At run end:** best-effort `_index.md` (run summary table with timestamps, outcome, and per-task links for transcripts that exist); write failures are logged as `[proof]` warnings rather than crashing the runner.
7. On SIGINT/SIGTERM/SIGHUP, cancels all in-flight subagents before finalizing the canvas.

#### CLI knobs

| Flag | Default | Purpose |
| ------------------------------- | ------------------ | ------------------------------------------------------------------------- |
| `--models-file <path>` | — | JSON file containing a partial complexity → model override map. |
| `--state-path <path>` | — | Persist resumable state after rank boundaries. |
| `--resume-state <path>` | — | Resume from a persisted state file. |
| `--restart-on-runner-change` | `false` | Exit `75` after runner runtime files change so a supervisor can relaunch. |
| `--task-timeout-ms <ms>` | `1200000` (20 min) | Marks a task `ERROR` if it runs too long. |
| `--stream-publish-ms <ms>` | `500` | Throttles live canvas streaming writes. |
| `--stream-idle-timeout-ms <ms>` | `300000` (5 min) | Marks a task `ERROR` if no stream events arrive. |
| `--debounce <ms>` | `200` | Canvas write debounce interval. |
| Flag | Default | Purpose |
| ------------------------------- | ------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `--models-file <path>` | — | JSON file containing a partial complexity → model override map. |
| `--state-path <path>` | — | Persist resumable state after rank boundaries. |
| `--resume-state <path>` | — | Resume from a persisted state file. |
| `--restart-on-runner-change` | `false` | Exit `75` after runner runtime files change so a supervisor can relaunch. |
| `--task-timeout-ms <ms>` | `1200000` (20 min) | Marks a task `ERROR` if it runs too long. |
| `--stream-publish-ms <ms>` | `500` | Throttles live canvas streaming writes. |
| `--stream-idle-timeout-ms <ms>` | `300000` (5 min) | Marks a task `ERROR` if no stream events arrive. |
| `--debounce <ms>` | `200` | Canvas write debounce interval. |
| `--full-output-dir <path>` | computed default | Per-task transcripts + `_index.md` + `_dag.json`. Default: `<cwd>/.flatbread/artifacts/dag-<title-slug>-<ts>/`. Override path or suppress with `--no-artifacts`. |
| `--no-artifacts` | `false` | Suppresses per-task transcripts, `_index.md`, and `_dag.json`; does **not** suppress `--findings-dir` JSON sidecars (separate code path). Canvas is still written. |

### Step 4 — Summarize

@@ -173,7 +179,7 @@ After the runner exits, briefly summarize what completed/failed and re-link the
| MED | `composer-2` |
| LOW | `gpt-5.4-nano` |

Override any subset inline with top-level DAG `models`, or pass a reusable profile with `--models-file <path>`. Values can be plain SDK model id strings or SDK model selections with `params`. At run time, Proof calls `Cursor.models.list()`, validates ids and param values, and expands partial selections to the closest valid preset variant using the model's default variant for omitted params. Precedence is defaults < DAG `models` < `--models-file`. The Cursor model catalog can vary by account.
Override any subset inline with top-level DAG `models`, or pass a reusable profile with `--models-file <path>`. Values can be plain SDK model id strings or SDK model selections with `params`. At run time, Proof calls `Cursor.models.list()`, validates ids and param values, and expands partial selections by requiring requested params to match a catalog variant, then choosing the valid variant whose omitted params best match the model's default variant. Precedence is defaults < DAG `models` < `--models-file`. The Cursor model catalog can vary by account.
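
The precedence rule can be illustrated with a minimal sketch. It covers only plain string model ids; selections with `params` and the catalog validation step are out of scope here. The `ModelMap` type, the `resolveModels` helper, and the two `example-*` model ids are hypothetical.

```typescript
// Hypothetical shape: complexity label -> plain model id string.
type ModelMap = Record<string, string>;

// Later sources win: defaults < DAG `models` < `--models-file`.
// Each override map may name any subset of complexity labels.
function resolveModels(
  defaults: ModelMap,
  dagModels: ModelMap = {},
  modelsFile: ModelMap = {},
): ModelMap {
  return { ...defaults, ...dagModels, ...modelsFile };
}

const resolvedModels = resolveModels(
  { MED: "composer-2", LOW: "gpt-5.4-nano" }, // defaults taken from the table above
  { LOW: "example-dag-model" },               // hypothetical id set in the DAG JSON
  { LOW: "example-file-model" },              // hypothetical id set in the models file
);
// resolvedModels.MED === "composer-2"
// resolvedModels.LOW === "example-file-model"
```

Because the models file is applied last, a reusable profile passed on the command line overrides whatever an individual DAG specifies for the same complexity label.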

To use a cheaper high-capability GPT model, use the base SDK id plus params, not a suffix-style id:

@@ -201,6 +207,7 @@ cursor-agent --list-models
# SDK-flavored alternative — also prints any per-model `parameters` and preset `variants`
pnpm -F @flatbread/proof models:list # all ids
pnpm -F @flatbread/proof models:list <model-id> # detail for one model
pnpm -F @flatbread/proof models:list --grep <text> # case-insensitive filter
pnpm -F @flatbread/proof models:list --json <model-id>
```

@@ -220,38 +227,44 @@ set -a && source .env && set +a

## CLI options

| Flag | Default | Notes |
| ---------------------------- | ------------------- | ------------------------------------------------------------------------------------------------------------- |
| `--dag` | required | Path to the DAG JSON file. |
| `--canvas-path` | composed from below | Full absolute path to the canvas file. Preferred — used by the parent-managed flow. |
| `--canvas` | — | Canvas filename stem (no `.canvas.tsx`). Used only if `--canvas-path` is omitted. |
| `--canvases-dir` | derived from cwd | Override the canvases output directory. Used only with `--canvas`. |
| `--cwd` | `process.cwd()` | Working dir each subagent operates in. |
| `--models-file` | — | JSON file containing a partial complexity → model override map. |
| `--debounce` | `200` (ms) | Canvas write debounce interval. |
| `--init-only` | `false` | Write the initial all-`PENDING` canvas and exit. No `CURSOR_API_KEY` required. |
| `--state-path` | — | Persist resumable runner state. Defaults to `.proof/run-state.json` when `--restart-on-runner-change` is set. |
| `--resume-state` | — | Load a persisted `RunState` and skip already terminal tasks. |
| `--restart-on-runner-change` | `false` | Detect runner runtime file changes after safe boundaries and exit `75` for supervisor restart. |
| `--max-runner-restarts` | `20` | Supervisor-only cap for relaunches from `proof-supervisor`. |
| `--task-timeout-ms` | `1200000` (20 min) | Marks a task `ERROR` if it exceeds this duration. |
| `--stream-publish-ms` | `500` (ms) | Throttles live canvas streaming writes to avoid excessive cloning. |
| `--stream-idle-timeout-ms` | `300000` (5 min) | Marks a task `ERROR` if no stream events arrive within this window. |
| Flag | Default | Notes |
| ---------------------------- | ------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `--dag` | required | Path to the DAG JSON file. |
| `--canvas-path` | composed from below | Full path to the canvas file. Preferred as an absolute path for parent-managed flow; relative paths are accepted and resolve from the runner process cwd, not `--cwd`. |
| `--canvas` | — | Canvas filename stem (no `.canvas.tsx`). Used only if `--canvas-path` is omitted. |
| `--canvases-dir` | derived from cwd | Override the canvases output directory. Used only with `--canvas`. |
| `--cwd` | `process.cwd()` | Working dir each subagent operates in. |
| `--models-file` | — | JSON file containing a partial complexity → model override map. |
| `--debounce` | `200` (ms) | Canvas write debounce interval. |
| `--init-only` | `false` | Write the initial all-`PENDING` canvas and exit. No `CURSOR_API_KEY` required. |
| `--full-output-dir` | computed default | Per-task transcripts as `${taskId}.md` plus `_index.md` and `_dag.json`. Defaults to `<cwd>/.flatbread/artifacts/dag-<title-slug>-<ts>/`. Override with an explicit path or suppress with `--no-artifacts`. |
| `--no-artifacts` | `false` | Suppresses per-task transcripts, `_index.md`, and `_dag.json`; does **not** suppress `--findings-dir` JSON sidecars (separate code path). Canvas is still written. |
| `--findings-dir` | — | Per-task JSON sidecars as `${taskId}.findings.json` for original runs and `${taskId}.iter<n>.findings.json` for convergence re-runs. Schema: `{ taskId, iteration, status, durationMs, sections }`. |
| `--state-path` | — | Persist resumable runner state. Defaults to `.proof/run-state.json` when `--restart-on-runner-change` is set. |
| `--resume-state` | — | Load a persisted `RunState` and skip already terminal tasks. |
| `--restart-on-runner-change` | `false` | Detect runner runtime file changes after safe boundaries and exit `75` for supervisor restart. |
| `--max-runner-restarts` | `20` | Supervisor-only cap for relaunches from `proof-supervisor`. |
| `--task-timeout-ms` | `1200000` (20 min) | Marks a task `ERROR` if it exceeds this duration. |
| `--stream-publish-ms` | `500` (ms) | Throttles live canvas streaming writes to avoid excessive cloning. |
| `--stream-idle-timeout-ms` | `300000` (5 min) | Marks a task `ERROR` if no stream events arrive within this window. |
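
As an illustration of the `--findings-dir` naming convention in the table above, the sketch below models the sidecar shape and the two filename forms. The field names come from the listed schema; the field types, the `findingsFileName` helper, and the way an original run is distinguished from a re-run (an omitted argument) are assumptions.

```typescript
// Field names follow the schema listed above; the value types are assumed.
interface FindingsSidecar {
  taskId: string;
  iteration: number;
  status: string;
  durationMs: number;
  sections: unknown;
}

// Hypothetical helper: omit `rerunIteration` for an original run,
// pass n for convergence re-run n.
function findingsFileName(taskId: string, rerunIteration?: number): string {
  return rerunIteration === undefined
    ? `${taskId}.findings.json`
    : `${taskId}.iter${rerunIteration}.findings.json`;
}

const originalName = findingsFileName("design");
// originalName === "design.findings.json"
const rerunName = findingsFileName("design", 2);
// rerunName === "design.iter2.findings.json"
```

How the runner numbers the `iteration` field for original runs is not stated in the table, so the sketch leaves that value to the caller.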

## Caveats

- Per-task markdown transcripts, a run index (`_index.md`), and the DAG definition (`_dag.json`) are written under **`<cwd>/.flatbread/artifacts/`** by default on **full DAG runs** (not `--init-only` or `--dry-check-cmds`). Pass `--no-artifacts` to suppress transcripts/index/DAG JSON, or `--full-output-dir` to override the path. `_index.md` links only transcripts that exist; if an individual transcript write fails, that row is marked as a missing transcript. **`--no-artifacts` does not disable `--findings-dir`** — for fully clean disk output, omit `--findings-dir` as well. In CI or read-only workspaces you may want `--no-artifacts` or a writable `--full-output-dir`.
- When using `proof-supervisor`, each **child runner process** recomputes the default artifacts path with a new timestamp unless you pin a stable directory. The supervisor forwards the full argv to each child (only `--max-runner-restarts` is stripped), so put **`--full-output-dir <path>` on the supervisor invocation** if every restart should write into the same artifacts folder.
- `--resume-state` creates a new artifact directory for the resumed session; tasks completed in prior sessions do not have transcripts in the new directory.
- Local runtime only — every subagent runs against `--cwd` (defaults to wherever you invoke the runner).
- Sibling tasks in the same rank run in parallel; do not let them write the same files.
- Inline MCP servers and sub-sub-agents are not configured by this runner.
- A failed task automatically skips all downstream dependents (they are marked `ERROR` with a "Skipped: upstream task(s) … failed" message). This prevents wasted API calls on tasks whose inputs are missing.
- Per-task streamed text is capped at `STREAM_CAP = 4000` chars to keep the canvas file modest. Upstream context passed to child tasks is capped at 2000 chars per parent.
- Per-task streamed text is capped at `STREAM_CAP = 4000` chars to keep the canvas file modest. Upstream context passed to child tasks is capped at 2000 chars per parent, with section-aware truncation when the parent output contains multiple `##` sections.
- Timed-out tasks are marked `ERROR` instead of staying indefinitely in `RUNNING`.
- SIGINT/SIGTERM/SIGHUP gracefully cancel all in-flight subagents and finalize the canvas before exiting.
- Unexpected unhandled rejections from SDK internals are suppressed to prevent runner crashes; uncaught exceptions are logged and trigger a clean shutdown.
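
The downstream-skip behavior listed above can be sketched as a single pass over the tasks. This is a simplified model, not the runner's implementation: the `TaskNode` shape and the `skipDownstream` helper are hypothetical, tasks are assumed to be listed after their dependencies, and the text that fills the elided part of the skip message (a comma-separated list of ids) is an assumption.

```typescript
// Hypothetical task shape for illustration.
interface TaskNode {
  id: string;
  dependsOn: string[];
}

// Returns skipped task id -> message for every task that transitively
// depends on a failed task. Assumes `tasks` is in dependency order
// (each task appears after everything it depends on).
function skipDownstream(tasks: TaskNode[], failed: Set<string>): Map<string, string> {
  const errored = new Set<string>();
  failed.forEach((id) => errored.add(id));

  const skipped = new Map<string, string>();
  for (const task of tasks) {
    if (errored.has(task.id)) continue; // already failed on its own
    const failedUpstream = task.dependsOn.filter((dep) => errored.has(dep));
    if (failedUpstream.length > 0) {
      // A skipped task counts as errored, so its own dependents are skipped too.
      errored.add(task.id);
      skipped.set(task.id, `Skipped: upstream task(s) ${failedUpstream.join(", ")} failed`);
    }
  }
  return skipped;
}

const pipeline: TaskNode[] = [
  { id: "research-a", dependsOn: [] },
  { id: "research-b", dependsOn: [] },
  { id: "design", dependsOn: ["research-a", "research-b"] },
  { id: "implement", dependsOn: ["design"] },
  { id: "tests", dependsOn: ["implement"] },
  { id: "docs", dependsOn: ["implement"] },
];
const skippedTasks = skipDownstream(pipeline, new Set<string>(["design"]));
// skippedTasks contains "implement", "tests", and "docs"
```

In this model a failure in `design` costs no further API calls: the three tasks behind it are marked without being started, while the already-finished research tasks are unaffected.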

## Reference

- Package: `@flatbread/proof` at `packages/proof`
- DAG schema example: `examples/example_dag.json`
- DAG schema example: `.cursor/skills/proof/examples/example_dag.json`
- Library exports: `import { parseDAG, computeRanks, ... } from '@flatbread/proof'`
- Cursor SDK docs: https://cursor.com/docs/api/sdk/typescript