feat(proof): consistent default artifact output with --no-artifacts opt-out#187
Conversation
- Per-task markdown transcripts, _index.md, and _dag.json are now written by default to a timestamped directory: ~/.cursor/projects/<workspace>/artifacts/dag-<title>-<ts>/ This mirrors the canvas path scheme for discoverability. - New --no-artifacts flag suppresses all artifact output. Canvas is still written. --full-output-dir still overrides the path. - Oracle and pause tasks now write markdown files. Previously only kind: task tasks produced .md files; oracle/pause output was only visible in the canvas. Now their resultText (pass/fail for oracle, approval status for pause) is persisted alongside regular task transcripts. - Enhanced _index.md: includes a link to _dag.json, run start/finish timestamps, the overall outcome, and a Kind column (task/oracle/pause). - _dag.json is written to the artifact directory at run start so the directory is fully self-contained and relational. - writeRunIndexMarkdown updated to accept runMeta (startedAt, finishedAt, runOutcome, runMessage). - SKILL.md CLI knobs and CLI options tables updated with --full-output-dir, --no-artifacts, and --findings-dir entries. - README.md gets a new Artifact Output section documenting the default behavior, --no-artifacts, and --full-output-dir. Co-authored-by: Tony <tonyketcham@users.noreply.github.com>
H1 — _index.md now written on all exit paths (not just success):
- Success path: unchanged (written, indexWritten flag set)
- BudgetExceededError: write _index.md before process.exit() since
finally block is bypassed by process.exit()
- Generic error / defensive paths: best-effort write in finally block
when !indexWritten, with .catch(() => {}) to avoid masking the
original error
L2 — persistTaskMarkdownFile errors no longer abort the run:
- All call sites in runOne (oracle/pause paths) now .catch() and log
a [proof] warning instead of propagating to Promise.all
- runTask finally block wrapped similarly
- skipTask: converted to async, wrapped in .catch()
M2 — dag.title / state.title inconsistency resolved:
- Added dagTitle?: string to RunTaskOptions
- baseRunOptions now includes dagTitle: dag.title
- runTask uses options.dagTitle ?? state.title for artifact writes
- skipTask accepts an optional dagTitle parameter; call site passes dag.title
- Oracle/pause call sites already used dag.title (no change needed)
SKILL.md caveats: document supervisor restart creates new artifact dir,
resume creates incomplete dir, CI/headless --no-artifacts recommendation.
SKILL.md step 6: split into three sub-bullets accurately reflecting the
three write phases (_dag.json at start, task .md as they finish, _index.md
at all run ends).
Co-authored-by: Tony <tonyketcham@users.noreply.github.com>
There was a problem hiding this comment.
Stale comment
Review verdict
REQUEST_CHANGES — two independent consensus HIGH findings (
.cursor/skills/proof/SKILL.md:151-162flagged by all five perspectives;.cursor/skills/proof/SKILL.md:233-235flagged by two perspectives as HIGH) trigger this rule. The coverage plan is also non-empty against modified non-test source.
Chunk-bound feedback
.cursor/skills/proof/SKILL.md:151-162— HIGH
- flagged-by: correctness-and-contracts, test-coverage-robustness, proof-runtime-internals, docs-and-positioning, dx-and-examples
- risk: The
--no-artifactsrow says "Skip all artifact file output," but--findings-dirJSON sidecars travel through a completely separate code path (findingsAbsoluteDir, lines 576–581 ofrun_dag.ts) that is never gated byfullOutputAbsoluteDir. A user who reads this row and passes--no-artifactsto get clean-room output will still find JSON sidecars written if--findings-diris active.- minimal fix: Revise the Purpose cell to: "Suppresses per-task transcripts,
_index.md, and_dag.json; does not suppress--findings-dirJSON sidecars."
.cursor/skills/proof/SKILL.md:233-235— HIGH
- flagged-by: docs-and-positioning, dx-and-examples
- risk: The note correctly states "each restart creates a fresh timestamped artifact directory," but omits the critical corollary: that the supervisor forwards the full argv to each child (minus only
--max-runner-restarts, perstripSupervisorOnlyFlags). Without pinning--full-output-diron the supervisor invocation, every restart silently scatters artifacts across distinct timestamped directories. Users who depend on post-mortem artifact recovery from a supervisor run will find a fragmented history with no guidance on how to avoid it.- minimal fix: Rewrite to: "Each new runner process receives its own default timestamped artifacts directory. Pin
--full-output-diron the supervisor invocation so every restart child inherits a stable path."
.cursor/skills/proof/SKILL.md:143-147— MED
- flagged-by: docs-and-positioning (LOW), dx-and-examples (MED)
- risk: "At run end (all exit paths including errors)" creates a contractual guarantee. In practice,
failAndExitand thefinally-block calls both use.catch(() => {})with no log trace; a write error during SIGTERM teardown is silently dropped. A user relying on post-mortem artifact recovery would have no signal that the write failed.- minimal fix: Qualify to: "best-effort on all exit paths; write failures are logged as warnings."
.cursor/skills/proof/SKILL.md:210-229— MED
- flagged-by: docs-and-positioning, dx-and-examples
- risk: The full flag reference table at lines 210–229 defines
--no-artifactsas "Skip writing per-task transcripts,_index.md,_dag.json" while the abbreviated table at 151–162 says "Skip all artifact file output." Before the fix to 151–162, the two rows are internally contradictory; after the fix they must both carry the identical qualifier about--findings-dirsidecars.- minimal fix: Apply the identical
--findings-dirscoping note to the--no-artifactsrow in this table so both definitions are byte-identical in meaning.
packages/proof/README.md:76-101— MED
- flagged-by: docs-and-positioning, dx-and-examples
- risk: "By default, every run writes…" is incorrect for
--init-only(exits before artifact setup) and--dry-check-cmds(bypasses the artifact path entirely). Additionally, no migration note exists for users who relied on the prior behavior where artifacts only appeared when--full-output-dirwas explicitly passed; upgraders will be surprised by new disk writes under~/.cursor/.- minimal fix: Narrow the lede to "By default, every full DAG run writes…" with a parenthetical (not
--init-onlyor--dry-check-cmds); add one sentence: "Previously artifacts required passing--full-output-dir; use--no-artifactsto restore the old behavior."
Consensus findings
.cursor/skills/proof/SKILL.md:151-162—--no-artifactsscope misstatement: flagged as HIGH by all five perspectives..cursor/skills/proof/SKILL.md:233-235— Supervisor argv forwarding mechanics misstated: flagged by docs-and-positioning (HIGH) and dx-and-examples (MED); both agree the text is factually misleading.packages/proof/README.md:76-101— Missing--init-only/--dry-check-cmdsexceptions and upgrade migration note: flagged by docs-and-positioning (MED) and dx-and-examples (MED)..cursor/skills/proof/SKILL.md:143-147— Overstated "all exit paths" guarantee vs. silent.catch(() => {}): docs-and-positioning (LOW) and dx-and-examples (MED) — dx wins (MED), this is a user-facing contract.
Coverage plan
No test files were added in this PR. The following cases are missing from
packages/proof/src/__tests__/run_dag.test.ts:
- positive —
--no-artifactsalone:fullOutputAbsoluteDirisundefined, no artifact directory is created,_dag.json/per-task.md/_index.mdare never written.- negative —
--no-artifacts+--full-output-dir <path>combined: a warning is emitted; no artifact files are written.- positive — default run (no flags): artifact directory matches
~/.cursor/projects/<slug>/artifacts/dag-<title>-<ts>/;_dag.jsonwritten before first task; per-task.mdfiles written after each task;_index.mdwritten at run end.- positive —
--full-output-dir <explicit-path>: artifacts land at the specified path, not the default computed path.- edge:signal — SIGTERM during run:
_index.mdis written beforeprocess.exitis called.- edge:exit-75 —
maybeRestartAfterRunnerChange:_index.mdis written withrunOutcome: 'RESTARTING_RUNNER'before exit.- edge:budget-exceeded —
BudgetExceededErrorpath:_index.mdis written exactly once (indexWrittenguard prevents a double write).- positive —
writeRunIndexMarkdownoutput contains allrunMetafields and a[_dag.json](./_dag.json)back-link.- edge:FS-error —
mkdirfor artifact dir fails (permission denied): runner does not crash; DAG executes; canvas is written; error surfaces as a warning.- edge:initOnly —
--init-only: no artifact directory is created; canvas is written and process exits.- edge:dryCheckCmds —
--dry-check-cmds: no artifact directory is created; process exits before artifact setup.- edge:emptySlug —
defaultArtifactsDirwhenslugifyTitle('')returns'': path avoids a double-dash segment or a descriptive error is thrown.- negative — partial failure (one task errors):
_index.mdis still written at run end with the failed task row present.
Suggested follow-ups
--no-artifacts+--findings-dirinteraction should be cross-documented. The two flags operate on independent output paths; a user wanting fully clean output needs to pass both. Neither the README nor SKILL.md currently says so.- Harden silent catches in artifact write paths.
failAndExitand thefinallyblock use.catch(() => {})with no logging. Upgrade to.catch((e) => console.warn('[proof] _index.md write failed:', e.message))so ops teams have signal when post-mortem artifacts are silently dropped.- Evaluate runtime FS permissions. Whether
~/.cursor/projects/<slug>/artifacts/is writable in sandboxed CI, Docker containers with read-only home mounts, or Windows paths is residual risk. Add a known-environments matrix or a startup permission check with a clear error message.
Reviewer scoreboard
- correctness-and-contracts: 10 coverage gaps, 0 direct findings, signal: MED — accurate, well-specified coverage gaps; no code-level findings.
- test-coverage-robustness: 12 coverage gaps, 0 direct findings, signal: MED — thorough case enumeration; good FS-error and empty-DAG additions.
- proof-runtime-internals: 11 coverage gaps, 0 direct findings, signal: MED — uniquely surfaced
--init-only/--dry-check-cmdsexclusions; heavy overlap with other coverage perspectives.- docs-and-positioning: 4 findings (2 HIGH, 1 MED, 1 LOW), 0 coverage gaps, signal: HIGH — highest precision; SKILL.md:233-235 supervisor-forwarding finding was unique and actionable; correctly identified the
--findings-dirscoping error.- dx-and-examples: 5 findings (2 HIGH, 2 MED, 1 LOW), 0 coverage gaps, signal: HIGH — strong overlap with docs-and-positioning; added the upgrade-migration note for README:76-101.
Sent by Cursor Automation: Flatbread PR Review
…ilures - SKILL: clarify --no-artifacts does not suppress --findings-dir JSON sidecars (abbreviated + full CLI tables aligned). - SKILL: qualify artifact lifecycle as init-only/dry-check exempt and index writes as best-effort with logged warnings. - SKILL: fix supervisor caveat — argv is forwarded; pin --full-output-dir on the supervisor for stable paths across restarts. - README: narrow default artifact wording to full DAG runs; add upgrade note and cross-document --findings-dir vs --no-artifacts. - run_dag: replace silent _index.md write catches with console.warn so failed teardown writes surface in logs. Co-authored-by: Tony <tonyketcham@users.noreply.github.com>
There was a problem hiding this comment.
Review verdict
REQUEST_CHANGES — one new MED correctness gap and a non-empty coverage plan against a documented public contract.
Prior-round findings: all addressed
All five threads from the previous run are resolved in this push. Specifically:
--no-artifactsscope clarified in both abbreviated and full CLI tables (was HIGH consensus across all reviewers)- Supervisor argv-forwarding caveat added with explicit
--full-output-dirpin guidance (was HIGH) - SKILL.md step-6 bullet qualified to "best-effort
_index.md… write failures are logged as warnings" (was MED — partially addressed; see new finding below) - README lede narrowed to "full DAG runs" + upgrade migration note added (was MED)
- Internal table contradiction resolved (was MED)
Chunk-bound feedback
packages/proof/src/run_dag.ts:913-925 — MED
- flagged-by: proof-runtime-internals, correctness-and-contracts
- risk:
writeRunIndexMarkdownin the success path has no.catch(). If the write throws after all tasks complete (disk full, permission change, race), the uncaught error propagates to the outer catch block, which callsmarkRunTerminated(…, 'FAILED')and reschedules the canvas — incorrectly showing the run asFAILEDeven though all tasks finished.indexWrittenstaysfalse, so the finally block then attempts a second write with the now-stalerunOutcome = 'FAILED'. This directly contradicts the SKILL.md "best-effort… write failures are logged as warnings rather than crashing the runner" guarantee introduced by this same PR. The BudgetExceededError and generic-error paths were correctly wrapped with.catch()in this push; the success path was missed. - minimal fix:
await writeRunIndexMarkdown(
fullOutputAbsoluteDir,
dag.title,
state.tasks,
{ startedAt: state.startedAt, finishedAt: state.finishedAt,
runOutcome: state.runOutcome, runMessage: state.runMessage }
).catch((e: unknown) => {
console.warn(`[proof] _index.md write failed: ${e instanceof Error ? e.message : String(e)}`);
});
indexWritten = true; // set unconditionally to prevent retry with stale FAILED stateCoverage plan
No test files were added or modified in this PR. The following cases are still absent from packages/proof/src/__tests__/run_dag.test.ts (carried over from the prior review; none implemented):
packages/proof/src/__tests__/run_dag.test.ts— positive —--no-artifacts:fullOutputAbsoluteDirresolves toundefined; no artifact directory is created; no_dag.json, per-task.md, or_index.mdis written.packages/proof/src/__tests__/run_dag.test.ts— negative —--no-artifacts+--full-output-dir <path>combined:--full-output-diris silently ignored; no artifacts written; the behavior is deterministic.packages/proof/src/__tests__/run_dag.test.ts— positive — default run (no flags): artifact directory path matches~/.cursor/projects/<slug>/artifacts/dag-<title>-<ts>/;_dag.jsonpresent before first task starts; per-task.mdwritten after each task;_index.mdwritten at run end with correctrunOutcome.packages/proof/src/__tests__/run_dag.test.ts— positive —--full-output-dir <explicit-path>: artifacts land at the specified path, not the default computed path.packages/proof/src/__tests__/run_dag.test.ts— edge:FS-error —_index.mdwrite in success path throws (e.g. directory removed mid-run): runner does not mark the canvas asFAILED; a[proof] _index.md write failedwarning is logged;indexWrittenistrueso no retry.packages/proof/src/__tests__/run_dag.test.ts— edge:budget-exceeded —BudgetExceededErrorpath:_index.mdis written exactly once (indexWrittenguard prevents double write).packages/proof/src/__tests__/run_dag.test.ts— edge:emptySlug —slugifyTitle('')returns'': default artifact path avoids a double-dash segment or throws a descriptive error.packages/proof/src/__tests__/run_dag.test.ts— edge:initOnly —--init-only: no artifact directory is created; canvas is written; process exits.packages/proof/src/__tests__/run_dag.test.ts— edge:dryCheckCmds —--dry-check-cmds: no artifact directory is created; process exits before artifact setup.packages/proof/src/__tests__/run_dag.test.ts— positive —writeRunIndexMarkdownoutput includes[_dag.json](./_dag.json)back-link and allrunMetafields.
Suggested follow-ups
--no-artifacts+--full-output-dirconflict is silent. A warning when both flags are supplied (console.warn('[proof] --full-output-dir ignored because --no-artifacts is set')) would prevent confusion; currently--no-artifactssilently wins with no diagnostic._dag.jsonwrite at startup is not best-effort. Unlike per-task writes (which have.catch()), the initialmkdir+_dag.jsonwrite at startup propagates errors into the outer catch block and aborts the run before any tasks start. The docs do not claim best-effort for this path, which is internally consistent — but it means a permission issue on~/.cursor/projects/<slug>/artifacts/silently kills the run with no guidance. Consider either a best-effort guard or a clearer upfront diagnostic.
Reviewer scoreboard
- correctness-and-contracts: 1 finding (MED), 3 coverage gaps, signal: MED — correctly identified the
.catch()asymmetry between success and error paths. - proof-runtime-internals: 1 finding (MED), 4 coverage gaps, signal: HIGH — uniquely traced the canvas-corruption consequence of the missing
.catch()in the success path. - test-coverage-robustness: 0 findings, 10 coverage gaps, signal: MED — thorough coverage enumeration; accurate.
- docs-and-positioning: 0 findings (all prior doc issues fully resolved), 0 coverage gaps, signal: MED — correctly confirmed prior thread resolutions.
- dx-and-examples: 0 findings, 0 coverage gaps, signal: MED — confirmed all prior example/README issues resolved.
Sent by Cursor Automation: Flatbread PR Review
| runMessage: state.runMessage, | ||
| } | ||
| ); | ||
| indexWritten = true; |
There was a problem hiding this comment.
[MED — new finding] The writeRunIndexMarkdown call above this line has no .catch(). If the write fails after all tasks complete (disk full, mid-run permission change), the error propagates to the outer catch block, which calls markRunTerminated(…, 'FAILED') and overwrites the canvas with a spurious FAILED state — even though every task finished successfully. indexWritten stays false, so the finally block then retries with the now-stale runOutcome = 'FAILED'.
The BudgetExceededError and generic-error paths in this same PR correctly wrap the call with .catch((e) => console.warn(...)). The success path was missed, leaving the SKILL.md "best-effort… write failures are logged as warnings" guarantee unfulfilled for the happy path.
Minimal fix:
await writeRunIndexMarkdown(
fullOutputAbsoluteDir,
dag.title,
state.tasks,
{ startedAt: state.startedAt, finishedAt: state.finishedAt,
runOutcome: state.runOutcome, runMessage: state.runMessage }
).catch((e: unknown) => {
console.warn(`[proof] _index.md write failed: ${e instanceof Error ? e.message : String(e)}`);
});
indexWritten = true; // set unconditionally — prevents retry with stale FAILED state- Wrap success-path writeRunIndexMarkdown in .catch + console.warn so FS failures do not fall through to generic FAILED handling / misleading finally. - Write best-effort _index.md inside failAndExit after canvas flush so SIGINT/ SIGTERM/SIGHUP and uncaught-exception exits match BudgetExceeded symmetry. - Set indexWritten after BudgetExceededError index write for refactor safety. Also reserve slug fallback 'untitled' when slugifyTitle collapses the DAG title. Co-authored-by: Tony <tonyketcham@users.noreply.github.com>
Store per-run transcripts, _index.md, and _dag.json next to the workspace (<cwd>/.flatbread/artifacts/dag-<slug>-<timestamp>/) instead of ~/.cursor/... projects/<slug>/artifacts/, matching Flatbread's repo-local .flatbread convention. - Document canvas vs artifact paths in README; supervisor --full-output-dir pin - Align SKILL.md CLI tables and caveats - Ignore .flatbread/artifacts/ at repo root to avoid committing DAG dumps - Clarify --no-artifacts vs --findings-dir in run_dag.ts header comment Co-authored-by: Tony <tonyketcham@users.noreply.github.com>
…nscripts) Co-authored-by: Tony <tonyketcham@users.noreply.github.com>
Missed running pnpm lint:fix:fast after recent edits. Fixes lint CI failures on both node 20.x and 22.x runners. Change-Id: If241809977d0cc9373f2971f2735e5e4d2338c55
Change-Id: I9c89b8dee5dfe4ccaceb6287bb8c63b5efa3bfb4


Summary of changes
Proof now writes default DAG transcripts under
<cwd>/.flatbread/artifacts/(repo-local Flatbread convention), with--no-artifacts/--full-output-diroverrides unchanged in spirit.Recent commits
9d3a76c—fix(proof): resilient _index.md writes on success and shutdown paths— M-01 / M-02 / L-06: success-path.catchonwriteRunIndexMarkdown,failAndExitwrites best-effort_index.md,indexWrittenafter budget path,untitledslug fallback.573ced4—feat(proof): default transcript artifacts under .flatbread/artifacts— default dir under.flatbread/artifacts/; README + SKILL +run_dagheader updates; supervisor--full-output-dirnote in README.ac4ccae—chore: stop ignoring .flatbread/artifacts— removed root.gitignorerule so teams can commit Proof run artifacts if they want.Earlier doc commit
159ca69remains for PR review wording on flags.pnpm -F @flatbread/proof typecheck+buildpassed on touched TS changesDoes this introduce any non-backwards compatible changes?
~/.cursor/projects/.../artifacts/to<cwd>/.flatbread/artifacts/; use--full-output-dirto preserve old layout.Does this include any user config changes?