feat(proof): consistent default artifact output with --no-artifacts opt-out by tonyketcham · Pull Request #187 · FlatbreadLabs/flatbread

tonyketcham · 2026-05-10T22:36:41Z

Summary of changes

Proof now writes default DAG transcripts under <cwd>/.flatbread/artifacts/ (repo-local Flatbread convention), with --no-artifacts / --full-output-dir overrides unchanged in spirit.

Recent commits

9d3a76c — fix(proof): resilient _index.md writes on success and shutdown paths — M-01 / M-02 / L-06: success-path .catch on writeRunIndexMarkdown, failAndExit writes best-effort _index.md, indexWritten after budget path, untitled slug fallback.
573ced4 — feat(proof): default transcript artifacts under .flatbread/artifacts — default dir under .flatbread/artifacts/; README + SKILL + run_dag header updates; supervisor --full-output-dir note in README.
ac4ccae — chore: stop ignoring .flatbread/artifacts — removed root .gitignore rule so teams can commit Proof run artifacts if they want.

Earlier doc commit 159ca69 remains for PR review wording on flags.

pnpm -F @flatbread/proof typecheck + build passed on touched TS changes

Does this introduce any non-backwards compatible changes?

Yes — default artifact location moved from ~/.cursor/projects/.../artifacts/ to <cwd>/.flatbread/artifacts/; use --full-output-dir to preserve old layout.

Does this include any user config changes?

Yes — documented in SKILL.md and README.md

- Per-task markdown transcripts, _index.md, and _dag.json are now written by default to a timestamped directory: ~/.cursor/projects/<workspace>/artifacts/dag-<title>-<ts>/ This mirrors the canvas path scheme for discoverability. - New --no-artifacts flag suppresses all artifact output. Canvas is still written. --full-output-dir still overrides the path. - Oracle and pause tasks now write markdown files. Previously only kind: task tasks produced .md files; oracle/pause output was only visible in the canvas. Now their resultText (pass/fail for oracle, approval status for pause) is persisted alongside regular task transcripts. - Enhanced _index.md: includes a link to _dag.json, run start/finish timestamps, the overall outcome, and a Kind column (task/oracle/pause). - _dag.json is written to the artifact directory at run start so the directory is fully self-contained and relational. - writeRunIndexMarkdown updated to accept runMeta (startedAt, finishedAt, runOutcome, runMessage). - SKILL.md CLI knobs and CLI options tables updated with --full-output-dir, --no-artifacts, and --findings-dir entries. - README.md gets a new Artifact Output section documenting the default behavior, --no-artifacts, and --full-output-dir. Co-authored-by: Tony <tonyketcham@users.noreply.github.com>

H1 — _index.md now written on all exit paths (not just success): - Success path: unchanged (written, indexWritten flag set) - BudgetExceededError: write _index.md before process.exit() since finally block is bypassed by process.exit() - Generic error / defensive paths: best-effort write in finally block when !indexWritten, with .catch(() => {}) to avoid masking the original error L2 — persistTaskMarkdownFile errors no longer abort the run: - All call sites in runOne (oracle/pause paths) now .catch() and log a [proof] warning instead of propagating to Promise.all - runTask finally block wrapped similarly - skipTask: converted to async, wrapped in .catch() M2 — dag.title / state.title inconsistency resolved: - Added dagTitle?: string to RunTaskOptions - baseRunOptions now includes dagTitle: dag.title - runTask uses options.dagTitle ?? state.title for artifact writes - skipTask accepts an optional dagTitle parameter; call site passes dag.title - Oracle/pause call sites already used dag.title (no change needed) SKILL.md caveats: document supervisor restart creates new artifact dir, resume creates incomplete dir, CI/headless --no-artifacts recommendation. SKILL.md step 6: split into three sub-bullets accurately reflecting the three write phases (_dag.json at start, task .md as they finish, _index.md at all run ends). Co-authored-by: Tony <tonyketcham@users.noreply.github.com>

cursor

Stale comment

Review verdict

REQUEST_CHANGES — two independent consensus HIGH findings (.cursor/skills/proof/SKILL.md:151-162 flagged by all five perspectives; .cursor/skills/proof/SKILL.md:233-235 flagged by two perspectives as HIGH) trigger this rule. The coverage plan is also non-empty against modified non-test source.

Chunk-bound feedback

.cursor/skills/proof/SKILL.md:151-162 — HIGH

flagged-by: correctness-and-contracts, test-coverage-robustness, proof-runtime-internals, docs-and-positioning, dx-and-examples

risk: The --no-artifacts row says "Skip all artifact file output," but --findings-dir JSON sidecars travel through a completely separate code path (findingsAbsoluteDir, lines 576–581 of run_dag.ts) that is never gated by fullOutputAbsoluteDir. A user who reads this row and passes --no-artifacts to get clean-room output will still find JSON sidecars written if --findings-dir is active.

minimal fix: Revise the Purpose cell to: "Suppresses per-task transcripts, _index.md, and _dag.json; does not suppress --findings-dir JSON sidecars."

.cursor/skills/proof/SKILL.md:233-235 — HIGH

flagged-by: docs-and-positioning, dx-and-examples

risk: The note correctly states "each restart creates a fresh timestamped artifact directory," but omits the critical corollary: that the supervisor forwards the full argv to each child (minus only --max-runner-restarts, per stripSupervisorOnlyFlags). Without pinning --full-output-dir on the supervisor invocation, every restart silently scatters artifacts across distinct timestamped directories. Users who depend on post-mortem artifact recovery from a supervisor run will find a fragmented history with no guidance on how to avoid it.

minimal fix: Rewrite to: "Each new runner process receives its own default timestamped artifacts directory. Pin --full-output-dir on the supervisor invocation so every restart child inherits a stable path."

.cursor/skills/proof/SKILL.md:143-147 — MED

flagged-by: docs-and-positioning (LOW), dx-and-examples (MED)

risk: "At run end (all exit paths including errors)" creates a contractual guarantee. In practice, failAndExit and the finally-block calls both use .catch(() => {}) with no log trace; a write error during SIGTERM teardown is silently dropped. A user relying on post-mortem artifact recovery would have no signal that the write failed.

minimal fix: Qualify to: "best-effort on all exit paths; write failures are logged as warnings."

.cursor/skills/proof/SKILL.md:210-229 — MED

flagged-by: docs-and-positioning, dx-and-examples

risk: The full flag reference table at lines 210–229 defines --no-artifacts as "Skip writing per-task transcripts, _index.md, _dag.json" while the abbreviated table at 151–162 says "Skip all artifact file output." Before the fix to 151–162, the two rows are internally contradictory; after the fix they must both carry the identical qualifier about --findings-dir sidecars.

minimal fix: Apply the identical --findings-dir scoping note to the --no-artifacts row in this table so both definitions are byte-identical in meaning.

packages/proof/README.md:76-101 — MED

flagged-by: docs-and-positioning, dx-and-examples

risk: "By default, every run writes…" is incorrect for --init-only (exits before artifact setup) and --dry-check-cmds (bypasses the artifact path entirely). Additionally, no migration note exists for users who relied on the prior behavior where artifacts only appeared when --full-output-dir was explicitly passed; upgraders will be surprised by new disk writes under ~/.cursor/.

minimal fix: Narrow the lede to "By default, every full DAG run writes…" with a parenthetical (not --init-only or --dry-check-cmds); add one sentence: "Previously artifacts required passing --full-output-dir; use --no-artifacts to restore the old behavior."

Consensus findings

.cursor/skills/proof/SKILL.md:151-162 — --no-artifacts scope misstatement: flagged as HIGH by all five perspectives.

.cursor/skills/proof/SKILL.md:233-235 — Supervisor argv forwarding mechanics misstated: flagged by docs-and-positioning (HIGH) and dx-and-examples (MED); both agree the text is factually misleading.

packages/proof/README.md:76-101 — Missing --init-only/--dry-check-cmds exceptions and upgrade migration note: flagged by docs-and-positioning (MED) and dx-and-examples (MED).

.cursor/skills/proof/SKILL.md:143-147 — Overstated "all exit paths" guarantee vs. silent .catch(() => {}): docs-and-positioning (LOW) and dx-and-examples (MED) — dx wins (MED), this is a user-facing contract.

Coverage plan

No test files were added in this PR. The following cases are missing from packages/proof/src/__tests__/run_dag.test.ts:

positive — --no-artifacts alone: fullOutputAbsoluteDir is undefined, no artifact directory is created, _dag.json/per-task .md/_index.md are never written.

negative — --no-artifacts + --full-output-dir <path> combined: a warning is emitted; no artifact files are written.

positive — default run (no flags): artifact directory matches ~/.cursor/projects/<slug>/artifacts/dag-<title>-<ts>/; _dag.json written before first task; per-task .md files written after each task; _index.md written at run end.

positive — --full-output-dir <explicit-path>: artifacts land at the specified path, not the default computed path.

edge:signal — SIGTERM during run: _index.md is written before process.exit is called.

edge:exit-75 — maybeRestartAfterRunnerChange: _index.md is written with runOutcome: 'RESTARTING_RUNNER' before exit.

edge:budget-exceeded — BudgetExceededError path: _index.md is written exactly once (indexWritten guard prevents a double write).

positive — writeRunIndexMarkdown output contains all runMeta fields and a [_dag.json](./_dag.json) back-link.

edge:FS-error — mkdir for artifact dir fails (permission denied): runner does not crash; DAG executes; canvas is written; error surfaces as a warning.

edge:initOnly — --init-only: no artifact directory is created; canvas is written and process exits.

edge:dryCheckCmds — --dry-check-cmds: no artifact directory is created; process exits before artifact setup.

edge:emptySlug — defaultArtifactsDir when slugifyTitle('') returns '': path avoids a double-dash segment or a descriptive error is thrown.

negative — partial failure (one task errors): _index.md is still written at run end with the failed task row present.

Suggested follow-ups

--no-artifacts + --findings-dir interaction should be cross-documented. The two flags operate on independent output paths; a user wanting fully clean output needs to pass both. Neither the README nor SKILL.md currently says so.

Harden silent catches in artifact write paths. failAndExit and the finally block use .catch(() => {}) with no logging. Upgrade to .catch((e) => console.warn('[proof] _index.md write failed:', e.message)) so ops teams have signal when post-mortem artifacts are silently dropped.

Evaluate runtime FS permissions. Whether ~/.cursor/projects/<slug>/artifacts/ is writable in sandboxed CI, Docker containers with read-only home mounts, or Windows paths is residual risk. Add a known-environments matrix or a startup permission check with a clear error message.

Reviewer scoreboard

correctness-and-contracts: 10 coverage gaps, 0 direct findings, signal: MED — accurate, well-specified coverage gaps; no code-level findings.

test-coverage-robustness: 12 coverage gaps, 0 direct findings, signal: MED — thorough case enumeration; good FS-error and empty-DAG additions.

proof-runtime-internals: 11 coverage gaps, 0 direct findings, signal: MED — uniquely surfaced --init-only/--dry-check-cmds exclusions; heavy overlap with other coverage perspectives.

docs-and-positioning: 4 findings (2 HIGH, 1 MED, 1 LOW), 0 coverage gaps, signal: HIGH — highest precision; SKILL.md:233-235 supervisor-forwarding finding was unique and actionable; correctly identified the --findings-dir scoping error.

dx-and-examples: 5 findings (2 HIGH, 2 MED, 1 LOW), 0 coverage gaps, signal: HIGH — strong overlap with docs-and-positioning; added the upgrade-migration note for README:76-101.

_{Sent by Cursor Automation: Flatbread PR Review}

…ilures - SKILL: clarify --no-artifacts does not suppress --findings-dir JSON sidecars (abbreviated + full CLI tables aligned). - SKILL: qualify artifact lifecycle as init-only/dry-check exempt and index writes as best-effort with logged warnings. - SKILL: fix supervisor caveat — argv is forwarded; pin --full-output-dir on the supervisor for stable paths across restarts. - README: narrow default artifact wording to full DAG runs; add upgrade note and cross-document --findings-dir vs --no-artifacts. - run_dag: replace silent _index.md write catches with console.warn so failed teardown writes surface in logs. Co-authored-by: Tony <tonyketcham@users.noreply.github.com>

cursor

Review verdict

REQUEST_CHANGES — one new MED correctness gap and a non-empty coverage plan against a documented public contract.

Prior-round findings: all addressed

All five threads from the previous run are resolved in this push. Specifically:

--no-artifacts scope clarified in both abbreviated and full CLI tables (was HIGH consensus across all reviewers)
Supervisor argv-forwarding caveat added with explicit --full-output-dir pin guidance (was HIGH)
SKILL.md step-6 bullet qualified to "best-effort _index.md… write failures are logged as warnings" (was MED — partially addressed; see new finding below)
README lede narrowed to "full DAG runs" + upgrade migration note added (was MED)
Internal table contradiction resolved (was MED)

Chunk-bound feedback

`packages/proof/src/run_dag.ts:913-925` — MED

flagged-by: proof-runtime-internals, correctness-and-contracts
risk: writeRunIndexMarkdown in the success path has no .catch(). If the write throws after all tasks complete (disk full, permission change, race), the uncaught error propagates to the outer catch block, which calls markRunTerminated(…, 'FAILED') and reschedules the canvas — incorrectly showing the run as FAILED even though all tasks finished. indexWritten stays false, so the finally block then attempts a second write with the now-stale runOutcome = 'FAILED'. This directly contradicts the SKILL.md "best-effort… write failures are logged as warnings rather than crashing the runner" guarantee introduced by this same PR. The BudgetExceededError and generic-error paths were correctly wrapped with .catch() in this push; the success path was missed.
minimal fix:

await writeRunIndexMarkdown(
  fullOutputAbsoluteDir,
  dag.title,
  state.tasks,
  { startedAt: state.startedAt, finishedAt: state.finishedAt,
    runOutcome: state.runOutcome, runMessage: state.runMessage }
).catch((e: unknown) => {
  console.warn(`[proof] _index.md write failed: ${e instanceof Error ? e.message : String(e)}`);
});
indexWritten = true; // set unconditionally to prevent retry with stale FAILED state

Coverage plan

No test files were added or modified in this PR. The following cases are still absent from packages/proof/src/__tests__/run_dag.test.ts (carried over from the prior review; none implemented):

packages/proof/src/__tests__/run_dag.test.ts — positive — --no-artifacts: fullOutputAbsoluteDir resolves to undefined; no artifact directory is created; no _dag.json, per-task .md, or _index.md is written.
packages/proof/src/__tests__/run_dag.test.ts — negative — --no-artifacts + --full-output-dir <path> combined: --full-output-dir is silently ignored; no artifacts written; the behavior is deterministic.
packages/proof/src/__tests__/run_dag.test.ts — positive — default run (no flags): artifact directory path matches ~/.cursor/projects/<slug>/artifacts/dag-<title>-<ts>/; _dag.json present before first task starts; per-task .md written after each task; _index.md written at run end with correct runOutcome.
packages/proof/src/__tests__/run_dag.test.ts — positive — --full-output-dir <explicit-path>: artifacts land at the specified path, not the default computed path.
packages/proof/src/__tests__/run_dag.test.ts — edge:FS-error — _index.md write in success path throws (e.g. directory removed mid-run): runner does not mark the canvas as FAILED; a [proof] _index.md write failed warning is logged; indexWritten is true so no retry.
packages/proof/src/__tests__/run_dag.test.ts — edge:budget-exceeded — BudgetExceededError path: _index.md is written exactly once (indexWritten guard prevents double write).
packages/proof/src/__tests__/run_dag.test.ts — edge:emptySlug — slugifyTitle('') returns '': default artifact path avoids a double-dash segment or throws a descriptive error.
packages/proof/src/__tests__/run_dag.test.ts — edge:initOnly — --init-only: no artifact directory is created; canvas is written; process exits.
packages/proof/src/__tests__/run_dag.test.ts — edge:dryCheckCmds — --dry-check-cmds: no artifact directory is created; process exits before artifact setup.
packages/proof/src/__tests__/run_dag.test.ts — positive — writeRunIndexMarkdown output includes [_dag.json](./_dag.json) back-link and all runMeta fields.

Suggested follow-ups

--no-artifacts + --full-output-dir conflict is silent. A warning when both flags are supplied (console.warn('[proof] --full-output-dir ignored because --no-artifacts is set')) would prevent confusion; currently --no-artifacts silently wins with no diagnostic.
_dag.json write at startup is not best-effort. Unlike per-task writes (which have .catch()), the initial mkdir + _dag.json write at startup propagates errors into the outer catch block and aborts the run before any tasks start. The docs do not claim best-effort for this path, which is internally consistent — but it means a permission issue on ~/.cursor/projects/<slug>/artifacts/ silently kills the run with no guidance. Consider either a best-effort guard or a clearer upfront diagnostic.

Reviewer scoreboard

correctness-and-contracts: 1 finding (MED), 3 coverage gaps, signal: MED — correctly identified the .catch() asymmetry between success and error paths.
proof-runtime-internals: 1 finding (MED), 4 coverage gaps, signal: HIGH — uniquely traced the canvas-corruption consequence of the missing .catch() in the success path.
test-coverage-robustness: 0 findings, 10 coverage gaps, signal: MED — thorough coverage enumeration; accurate.
docs-and-positioning: 0 findings (all prior doc issues fully resolved), 0 coverage gaps, signal: MED — correctly confirmed prior thread resolutions.
dx-and-examples: 0 findings, 0 coverage gaps, signal: MED — confirmed all prior example/README issues resolved.

_{Sent by Cursor Automation: Flatbread PR Review}

cursor · 2026-05-11T02:21:26Z

+          runMessage: state.runMessage,
+        }
      );
+      indexWritten = true;


[MED — new finding] The writeRunIndexMarkdown call above this line has no .catch(). If the write fails after all tasks complete (disk full, mid-run permission change), the error propagates to the outer catch block, which calls markRunTerminated(…, 'FAILED') and overwrites the canvas with a spurious FAILED state — even though every task finished successfully. indexWritten stays false, so the finally block then retries with the now-stale runOutcome = 'FAILED'.

The BudgetExceededError and generic-error paths in this same PR correctly wrap the call with .catch((e) => console.warn(...)). The success path was missed, leaving the SKILL.md "best-effort… write failures are logged as warnings" guarantee unfulfilled for the happy path.

Minimal fix:

await writeRunIndexMarkdown( fullOutputAbsoluteDir, dag.title, state.tasks, { startedAt: state.startedAt, finishedAt: state.finishedAt, runOutcome: state.runOutcome, runMessage: state.runMessage } ).catch((e: unknown) => { console.warn(`[proof] _index.md write failed: ${e instanceof Error ? e.message : String(e)}`); }); indexWritten = true; // set unconditionally — prevents retry with stale FAILED state

- Wrap success-path writeRunIndexMarkdown in .catch + console.warn so FS failures do not fall through to generic FAILED handling / misleading finally. - Write best-effort _index.md inside failAndExit after canvas flush so SIGINT/ SIGTERM/SIGHUP and uncaught-exception exits match BudgetExceeded symmetry. - Set indexWritten after BudgetExceededError index write for refactor safety. Also reserve slug fallback 'untitled' when slugifyTitle collapses the DAG title. Co-authored-by: Tony <tonyketcham@users.noreply.github.com>

Store per-run transcripts, _index.md, and _dag.json next to the workspace (<cwd>/.flatbread/artifacts/dag-<slug>-<timestamp>/) instead of ~/.cursor/... projects/<slug>/artifacts/, matching Flatbread's repo-local .flatbread convention. - Document canvas vs artifact paths in README; supervisor --full-output-dir pin - Align SKILL.md CLI tables and caveats - Ignore .flatbread/artifacts/ at repo root to avoid committing DAG dumps - Clarify --no-artifacts vs --findings-dir in run_dag.ts header comment Co-authored-by: Tony <tonyketcham@users.noreply.github.com>

…nscripts) Co-authored-by: Tony <tonyketcham@users.noreply.github.com>

Missed running pnpm lint:fix:fast after recent edits. Fixes lint CI failures on both node 20.x and 22.x runners. Change-Id: If241809977d0cc9373f2971f2735e5e4d2338c55

Change-Id: I9c89b8dee5dfe4ccaceb6287bb8c63b5efa3bfb4

cursoragent and others added 2 commits May 10, 2026 22:35

cursor Bot reviewed May 11, 2026

View reviewed changes

Comment thread .cursor/skills/proof/SKILL.md Outdated

Comment thread .cursor/skills/proof/SKILL.md Outdated

Comment thread .cursor/skills/proof/SKILL.md Outdated

Comment thread .cursor/skills/proof/SKILL.md Outdated

Comment thread packages/proof/README.md Outdated

cursor Bot reviewed May 11, 2026

View reviewed changes

tonyketcham marked this pull request as ready for review May 11, 2026 02:31

cursoragent and others added 3 commits May 11, 2026 02:33

chore: stop ignoring .flatbread/artifacts (allow committing Proof tra…

ac4ccae

…nscripts) Co-authored-by: Tony <tonyketcham@users.noreply.github.com>

tonyketcham enabled auto-merge (rebase) May 11, 2026 02:37

tonyketcham and others added 3 commits May 10, 2026 19:41

Merge branch 'main' into toeknee/proof-artifact-consistency-6c59

fc45512

chore: fix Prettier formatting in SKILL.md and run_dag.ts

5c4126e

Missed running pnpm lint:fix:fast after recent edits. Fixes lint CI failures on both node 20.x and 22.x runners. Change-Id: If241809977d0cc9373f2971f2735e5e4d2338c55

fix feedback

0ca0dc6

Change-Id: I9c89b8dee5dfe4ccaceb6287bb8c63b5efa3bfb4

tonyketcham disabled auto-merge May 11, 2026 03:05

tonyketcham merged commit ab20d86 into main May 11, 2026
19 checks passed

tonyketcham deleted the toeknee/proof-artifact-consistency-6c59 branch May 11, 2026 03:05

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(proof): consistent default artifact output with --no-artifacts opt-out#187

feat(proof): consistent default artifact output with --no-artifacts opt-out#187
tonyketcham merged 9 commits into
mainfrom
toeknee/proof-artifact-consistency-6c59

tonyketcham commented May 10, 2026 •

edited by cursor Bot

Loading

Uh oh!

cursor Bot left a comment •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot May 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

tonyketcham commented May 10, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Does this introduce any non-backwards compatible changes?

Does this include any user config changes?

Uh oh!

cursor Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Review verdict

Chunk-bound feedback

.cursor/skills/proof/SKILL.md:151-162 — HIGH

.cursor/skills/proof/SKILL.md:233-235 — HIGH

.cursor/skills/proof/SKILL.md:143-147 — MED

.cursor/skills/proof/SKILL.md:210-229 — MED

packages/proof/README.md:76-101 — MED

Consensus findings

Coverage plan

Suggested follow-ups

Reviewer scoreboard

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Review verdict

Prior-round findings: all addressed

Chunk-bound feedback

packages/proof/src/run_dag.ts:913-925 — MED

Coverage plan

Suggested follow-ups

Reviewer scoreboard

Uh oh!

cursor Bot May 11, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

tonyketcham commented May 10, 2026 •

edited by cursor Bot

Loading

cursor Bot left a comment •

edited

Loading

`.cursor/skills/proof/SKILL.md:151-162` — HIGH

`.cursor/skills/proof/SKILL.md:233-235` — HIGH

`.cursor/skills/proof/SKILL.md:143-147` — MED

`.cursor/skills/proof/SKILL.md:210-229` — MED

`packages/proof/README.md:76-101` — MED

`packages/proof/src/run_dag.ts:913-925` — MED