Skip to content

feat(proof): consistent default artifact output with --no-artifacts opt-out#187

Merged
tonyketcham merged 9 commits into
mainfrom
toeknee/proof-artifact-consistency-6c59
May 11, 2026
Merged

feat(proof): consistent default artifact output with --no-artifacts opt-out#187
tonyketcham merged 9 commits into
mainfrom
toeknee/proof-artifact-consistency-6c59

Conversation

@tonyketcham
Copy link
Copy Markdown
Collaborator

@tonyketcham tonyketcham commented May 10, 2026

Summary of changes

Proof now writes default DAG transcripts under <cwd>/.flatbread/artifacts/ (repo-local Flatbread convention), with --no-artifacts / --full-output-dir overrides unchanged in spirit.

Recent commits

  • 9d3a76cfix(proof): resilient _index.md writes on success and shutdown paths — M-01 / M-02 / L-06: success-path .catch on writeRunIndexMarkdown, failAndExit writes best-effort _index.md, indexWritten after budget path, untitled slug fallback.
  • 573ced4feat(proof): default transcript artifacts under .flatbread/artifacts — default dir under .flatbread/artifacts/; README + SKILL + run_dag header updates; supervisor --full-output-dir note in README.
  • ac4ccaechore: stop ignoring .flatbread/artifacts — removed root .gitignore rule so teams can commit Proof run artifacts if they want.

Earlier doc commit 159ca69 remains for PR review wording on flags.


  • pnpm -F @flatbread/proof typecheck + build passed on touched TS changes

Does this introduce any non-backwards compatible changes?

  • Yes — default artifact location moved from ~/.cursor/projects/.../artifacts/ to <cwd>/.flatbread/artifacts/; use --full-output-dir to preserve old layout.

Does this include any user config changes?

  • Yes — documented in SKILL.md and README.md
Open in Web Open in Cursor 

cursoragent and others added 2 commits May 10, 2026 22:35
- Per-task markdown transcripts, _index.md, and _dag.json are now
  written by default to a timestamped directory:
  ~/.cursor/projects/<workspace>/artifacts/dag-<title>-<ts>/
  This mirrors the canvas path scheme for discoverability.

- New --no-artifacts flag suppresses all artifact output. Canvas is
  still written. --full-output-dir still overrides the path.

- Oracle and pause tasks now write markdown files. Previously only
  kind: task tasks produced .md files; oracle/pause output was only
  visible in the canvas. Now their resultText (pass/fail for oracle,
  approval status for pause) is persisted alongside regular task
  transcripts.

- Enhanced _index.md: includes a link to _dag.json, run start/finish
  timestamps, the overall outcome, and a Kind column (task/oracle/pause).

- _dag.json is written to the artifact directory at run start so the
  directory is fully self-contained and relational.

- writeRunIndexMarkdown updated to accept runMeta (startedAt,
  finishedAt, runOutcome, runMessage).

- SKILL.md CLI knobs and CLI options tables updated with --full-output-dir,
  --no-artifacts, and --findings-dir entries.

- README.md gets a new Artifact Output section documenting the default
  behavior, --no-artifacts, and --full-output-dir.

Co-authored-by: Tony <tonyketcham@users.noreply.github.com>
H1 — _index.md now written on all exit paths (not just success):
- Success path: unchanged (written, indexWritten flag set)
- BudgetExceededError: write _index.md before process.exit() since
  finally block is bypassed by process.exit()
- Generic error / defensive paths: best-effort write in finally block
  when !indexWritten, with .catch(() => {}) to avoid masking the
  original error

L2 — persistTaskMarkdownFile errors no longer abort the run:
- All call sites in runOne (oracle/pause paths) now .catch() and log
  a [proof] warning instead of propagating to Promise.all
- runTask finally block wrapped similarly
- skipTask: converted to async, wrapped in .catch()

M2 — dag.title / state.title inconsistency resolved:
- Added dagTitle?: string to RunTaskOptions
- baseRunOptions now includes dagTitle: dag.title
- runTask uses options.dagTitle ?? state.title for artifact writes
- skipTask accepts an optional dagTitle parameter; call site passes dag.title
- Oracle/pause call sites already used dag.title (no change needed)

SKILL.md caveats: document supervisor restart creates new artifact dir,
resume creates incomplete dir, CI/headless --no-artifacts recommendation.
SKILL.md step 6: split into three sub-bullets accurately reflecting the
three write phases (_dag.json at start, task .md as they finish, _index.md
at all run ends).

Co-authored-by: Tony <tonyketcham@users.noreply.github.com>
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale comment

Review verdict

REQUEST_CHANGES — two independent consensus HIGH findings (.cursor/skills/proof/SKILL.md:151-162 flagged by all five perspectives; .cursor/skills/proof/SKILL.md:233-235 flagged by two perspectives as HIGH) trigger this rule. The coverage plan is also non-empty against modified non-test source.


Chunk-bound feedback

.cursor/skills/proof/SKILL.md:151-162 — HIGH

  • flagged-by: correctness-and-contracts, test-coverage-robustness, proof-runtime-internals, docs-and-positioning, dx-and-examples
  • risk: The --no-artifacts row says "Skip all artifact file output," but --findings-dir JSON sidecars travel through a completely separate code path (findingsAbsoluteDir, lines 576–581 of run_dag.ts) that is never gated by fullOutputAbsoluteDir. A user who reads this row and passes --no-artifacts to get clean-room output will still find JSON sidecars written if --findings-dir is active.
  • minimal fix: Revise the Purpose cell to: "Suppresses per-task transcripts, _index.md, and _dag.json; does not suppress --findings-dir JSON sidecars."

.cursor/skills/proof/SKILL.md:233-235 — HIGH

  • flagged-by: docs-and-positioning, dx-and-examples
  • risk: The note correctly states "each restart creates a fresh timestamped artifact directory," but omits the critical corollary: that the supervisor forwards the full argv to each child (minus only --max-runner-restarts, per stripSupervisorOnlyFlags). Without pinning --full-output-dir on the supervisor invocation, every restart silently scatters artifacts across distinct timestamped directories. Users who depend on post-mortem artifact recovery from a supervisor run will find a fragmented history with no guidance on how to avoid it.
  • minimal fix: Rewrite to: "Each new runner process receives its own default timestamped artifacts directory. Pin --full-output-dir on the supervisor invocation so every restart child inherits a stable path."

.cursor/skills/proof/SKILL.md:143-147 — MED

  • flagged-by: docs-and-positioning (LOW), dx-and-examples (MED)
  • risk: "At run end (all exit paths including errors)" creates a contractual guarantee. In practice, failAndExit and the finally-block calls both use .catch(() => {}) with no log trace; a write error during SIGTERM teardown is silently dropped. A user relying on post-mortem artifact recovery would have no signal that the write failed.
  • minimal fix: Qualify to: "best-effort on all exit paths; write failures are logged as warnings."

.cursor/skills/proof/SKILL.md:210-229 — MED

  • flagged-by: docs-and-positioning, dx-and-examples
  • risk: The full flag reference table at lines 210–229 defines --no-artifacts as "Skip writing per-task transcripts, _index.md, _dag.json" while the abbreviated table at 151–162 says "Skip all artifact file output." Before the fix to 151–162, the two rows are internally contradictory; after the fix they must both carry the identical qualifier about --findings-dir sidecars.
  • minimal fix: Apply the identical --findings-dir scoping note to the --no-artifacts row in this table so both definitions are byte-identical in meaning.

packages/proof/README.md:76-101 — MED

  • flagged-by: docs-and-positioning, dx-and-examples
  • risk: "By default, every run writes…" is incorrect for --init-only (exits before artifact setup) and --dry-check-cmds (bypasses the artifact path entirely). Additionally, no migration note exists for users who relied on the prior behavior where artifacts only appeared when --full-output-dir was explicitly passed; upgraders will be surprised by new disk writes under ~/.cursor/.
  • minimal fix: Narrow the lede to "By default, every full DAG run writes…" with a parenthetical (not --init-only or --dry-check-cmds); add one sentence: "Previously artifacts required passing --full-output-dir; use --no-artifacts to restore the old behavior."

Consensus findings

  • .cursor/skills/proof/SKILL.md:151-162--no-artifacts scope misstatement: flagged as HIGH by all five perspectives.
  • .cursor/skills/proof/SKILL.md:233-235 — Supervisor argv forwarding mechanics misstated: flagged by docs-and-positioning (HIGH) and dx-and-examples (MED); both agree the text is factually misleading.
  • packages/proof/README.md:76-101 — Missing --init-only/--dry-check-cmds exceptions and upgrade migration note: flagged by docs-and-positioning (MED) and dx-and-examples (MED).
  • .cursor/skills/proof/SKILL.md:143-147 — Overstated "all exit paths" guarantee vs. silent .catch(() => {}): docs-and-positioning (LOW) and dx-and-examples (MED) — dx wins (MED), this is a user-facing contract.

Coverage plan

No test files were added in this PR. The following cases are missing from packages/proof/src/__tests__/run_dag.test.ts:

  1. positive--no-artifacts alone: fullOutputAbsoluteDir is undefined, no artifact directory is created, _dag.json/per-task .md/_index.md are never written.
  2. negative--no-artifacts + --full-output-dir <path> combined: a warning is emitted; no artifact files are written.
  3. positive — default run (no flags): artifact directory matches ~/.cursor/projects/<slug>/artifacts/dag-<title>-<ts>/; _dag.json written before first task; per-task .md files written after each task; _index.md written at run end.
  4. positive--full-output-dir <explicit-path>: artifacts land at the specified path, not the default computed path.
  5. edge:signal — SIGTERM during run: _index.md is written before process.exit is called.
  6. edge:exit-75maybeRestartAfterRunnerChange: _index.md is written with runOutcome: 'RESTARTING_RUNNER' before exit.
  7. edge:budget-exceededBudgetExceededError path: _index.md is written exactly once (indexWritten guard prevents a double write).
  8. positivewriteRunIndexMarkdown output contains all runMeta fields and a [_dag.json](./_dag.json) back-link.
  9. edge:FS-errormkdir for artifact dir fails (permission denied): runner does not crash; DAG executes; canvas is written; error surfaces as a warning.
  10. edge:initOnly--init-only: no artifact directory is created; canvas is written and process exits.
  11. edge:dryCheckCmds--dry-check-cmds: no artifact directory is created; process exits before artifact setup.
  12. edge:emptySlugdefaultArtifactsDir when slugifyTitle('') returns '': path avoids a double-dash segment or a descriptive error is thrown.
  13. negative — partial failure (one task errors): _index.md is still written at run end with the failed task row present.

Suggested follow-ups

  • --no-artifacts + --findings-dir interaction should be cross-documented. The two flags operate on independent output paths; a user wanting fully clean output needs to pass both. Neither the README nor SKILL.md currently says so.
  • Harden silent catches in artifact write paths. failAndExit and the finally block use .catch(() => {}) with no logging. Upgrade to .catch((e) => console.warn('[proof] _index.md write failed:', e.message)) so ops teams have signal when post-mortem artifacts are silently dropped.
  • Evaluate runtime FS permissions. Whether ~/.cursor/projects/<slug>/artifacts/ is writable in sandboxed CI, Docker containers with read-only home mounts, or Windows paths is residual risk. Add a known-environments matrix or a startup permission check with a clear error message.

Reviewer scoreboard

  • correctness-and-contracts: 10 coverage gaps, 0 direct findings, signal: MED — accurate, well-specified coverage gaps; no code-level findings.
  • test-coverage-robustness: 12 coverage gaps, 0 direct findings, signal: MED — thorough case enumeration; good FS-error and empty-DAG additions.
  • proof-runtime-internals: 11 coverage gaps, 0 direct findings, signal: MED — uniquely surfaced --init-only/--dry-check-cmds exclusions; heavy overlap with other coverage perspectives.
  • docs-and-positioning: 4 findings (2 HIGH, 1 MED, 1 LOW), 0 coverage gaps, signal: HIGH — highest precision; SKILL.md:233-235 supervisor-forwarding finding was unique and actionable; correctly identified the --findings-dir scoping error.
  • dx-and-examples: 5 findings (2 HIGH, 2 MED, 1 LOW), 0 coverage gaps, signal: HIGH — strong overlap with docs-and-positioning; added the upgrade-migration note for README:76-101.
Open in Web View Automation 

Sent by Cursor Automation: Flatbread PR Review

Comment thread .cursor/skills/proof/SKILL.md Outdated
Comment thread .cursor/skills/proof/SKILL.md Outdated
Comment thread .cursor/skills/proof/SKILL.md Outdated
Comment thread .cursor/skills/proof/SKILL.md Outdated
Comment thread packages/proof/README.md Outdated
…ilures

- SKILL: clarify --no-artifacts does not suppress --findings-dir JSON sidecars
  (abbreviated + full CLI tables aligned).
- SKILL: qualify artifact lifecycle as init-only/dry-check exempt and index
  writes as best-effort with logged warnings.
- SKILL: fix supervisor caveat — argv is forwarded; pin --full-output-dir on
  the supervisor for stable paths across restarts.
- README: narrow default artifact wording to full DAG runs; add upgrade note
  and cross-document --findings-dir vs --no-artifacts.
- run_dag: replace silent _index.md write catches with console.warn so failed
  teardown writes surface in logs.

Co-authored-by: Tony <tonyketcham@users.noreply.github.com>
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review verdict

REQUEST_CHANGES — one new MED correctness gap and a non-empty coverage plan against a documented public contract.


Prior-round findings: all addressed

All five threads from the previous run are resolved in this push. Specifically:

  • --no-artifacts scope clarified in both abbreviated and full CLI tables (was HIGH consensus across all reviewers)
  • Supervisor argv-forwarding caveat added with explicit --full-output-dir pin guidance (was HIGH)
  • SKILL.md step-6 bullet qualified to "best-effort _index.md… write failures are logged as warnings" (was MED — partially addressed; see new finding below)
  • README lede narrowed to "full DAG runs" + upgrade migration note added (was MED)
  • Internal table contradiction resolved (was MED)

Chunk-bound feedback

packages/proof/src/run_dag.ts:913-925 — MED

  • flagged-by: proof-runtime-internals, correctness-and-contracts
  • risk: writeRunIndexMarkdown in the success path has no .catch(). If the write throws after all tasks complete (disk full, permission change, race), the uncaught error propagates to the outer catch block, which calls markRunTerminated(…, 'FAILED') and reschedules the canvas — incorrectly showing the run as FAILED even though all tasks finished. indexWritten stays false, so the finally block then attempts a second write with the now-stale runOutcome = 'FAILED'. This directly contradicts the SKILL.md "best-effort… write failures are logged as warnings rather than crashing the runner" guarantee introduced by this same PR. The BudgetExceededError and generic-error paths were correctly wrapped with .catch() in this push; the success path was missed.
  • minimal fix:
await writeRunIndexMarkdown(
  fullOutputAbsoluteDir,
  dag.title,
  state.tasks,
  { startedAt: state.startedAt, finishedAt: state.finishedAt,
    runOutcome: state.runOutcome, runMessage: state.runMessage }
).catch((e: unknown) => {
  console.warn(`[proof] _index.md write failed: ${e instanceof Error ? e.message : String(e)}`);
});
indexWritten = true; // set unconditionally to prevent retry with stale FAILED state

Coverage plan

No test files were added or modified in this PR. The following cases are still absent from packages/proof/src/__tests__/run_dag.test.ts (carried over from the prior review; none implemented):

  1. packages/proof/src/__tests__/run_dag.test.tspositive--no-artifacts: fullOutputAbsoluteDir resolves to undefined; no artifact directory is created; no _dag.json, per-task .md, or _index.md is written.
  2. packages/proof/src/__tests__/run_dag.test.tsnegative--no-artifacts + --full-output-dir <path> combined: --full-output-dir is silently ignored; no artifacts written; the behavior is deterministic.
  3. packages/proof/src/__tests__/run_dag.test.tspositive — default run (no flags): artifact directory path matches ~/.cursor/projects/<slug>/artifacts/dag-<title>-<ts>/; _dag.json present before first task starts; per-task .md written after each task; _index.md written at run end with correct runOutcome.
  4. packages/proof/src/__tests__/run_dag.test.tspositive--full-output-dir <explicit-path>: artifacts land at the specified path, not the default computed path.
  5. packages/proof/src/__tests__/run_dag.test.tsedge:FS-error_index.md write in success path throws (e.g. directory removed mid-run): runner does not mark the canvas as FAILED; a [proof] _index.md write failed warning is logged; indexWritten is true so no retry.
  6. packages/proof/src/__tests__/run_dag.test.tsedge:budget-exceededBudgetExceededError path: _index.md is written exactly once (indexWritten guard prevents double write).
  7. packages/proof/src/__tests__/run_dag.test.tsedge:emptySlugslugifyTitle('') returns '': default artifact path avoids a double-dash segment or throws a descriptive error.
  8. packages/proof/src/__tests__/run_dag.test.tsedge:initOnly--init-only: no artifact directory is created; canvas is written; process exits.
  9. packages/proof/src/__tests__/run_dag.test.tsedge:dryCheckCmds--dry-check-cmds: no artifact directory is created; process exits before artifact setup.
  10. packages/proof/src/__tests__/run_dag.test.tspositivewriteRunIndexMarkdown output includes [_dag.json](./_dag.json) back-link and all runMeta fields.

Suggested follow-ups

  • --no-artifacts + --full-output-dir conflict is silent. A warning when both flags are supplied (console.warn('[proof] --full-output-dir ignored because --no-artifacts is set')) would prevent confusion; currently --no-artifacts silently wins with no diagnostic.
  • _dag.json write at startup is not best-effort. Unlike per-task writes (which have .catch()), the initial mkdir + _dag.json write at startup propagates errors into the outer catch block and aborts the run before any tasks start. The docs do not claim best-effort for this path, which is internally consistent — but it means a permission issue on ~/.cursor/projects/<slug>/artifacts/ silently kills the run with no guidance. Consider either a best-effort guard or a clearer upfront diagnostic.

Reviewer scoreboard

  • correctness-and-contracts: 1 finding (MED), 3 coverage gaps, signal: MED — correctly identified the .catch() asymmetry between success and error paths.
  • proof-runtime-internals: 1 finding (MED), 4 coverage gaps, signal: HIGH — uniquely traced the canvas-corruption consequence of the missing .catch() in the success path.
  • test-coverage-robustness: 0 findings, 10 coverage gaps, signal: MED — thorough coverage enumeration; accurate.
  • docs-and-positioning: 0 findings (all prior doc issues fully resolved), 0 coverage gaps, signal: MED — correctly confirmed prior thread resolutions.
  • dx-and-examples: 0 findings, 0 coverage gaps, signal: MED — confirmed all prior example/README issues resolved.
Open in Web View Automation 

Sent by Cursor Automation: Flatbread PR Review

runMessage: state.runMessage,
}
);
indexWritten = true;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MED — new finding] The writeRunIndexMarkdown call above this line has no .catch(). If the write fails after all tasks complete (disk full, mid-run permission change), the error propagates to the outer catch block, which calls markRunTerminated(…, 'FAILED') and overwrites the canvas with a spurious FAILED state — even though every task finished successfully. indexWritten stays false, so the finally block then retries with the now-stale runOutcome = 'FAILED'.

The BudgetExceededError and generic-error paths in this same PR correctly wrap the call with .catch((e) => console.warn(...)). The success path was missed, leaving the SKILL.md "best-effort… write failures are logged as warnings" guarantee unfulfilled for the happy path.

Minimal fix:

await writeRunIndexMarkdown(
  fullOutputAbsoluteDir,
  dag.title,
  state.tasks,
  { startedAt: state.startedAt, finishedAt: state.finishedAt,
    runOutcome: state.runOutcome, runMessage: state.runMessage }
).catch((e: unknown) => {
  console.warn(`[proof] _index.md write failed: ${e instanceof Error ? e.message : String(e)}`);
});
indexWritten = true; // set unconditionally — prevents retry with stale FAILED state

@tonyketcham tonyketcham marked this pull request as ready for review May 11, 2026 02:31
cursoragent and others added 3 commits May 11, 2026 02:33
- Wrap success-path writeRunIndexMarkdown in .catch + console.warn so FS
  failures do not fall through to generic FAILED handling / misleading finally.
- Write best-effort _index.md inside failAndExit after canvas flush so SIGINT/
  SIGTERM/SIGHUP and uncaught-exception exits match BudgetExceeded symmetry.
- Set indexWritten after BudgetExceededError index write for refactor safety.

Also reserve slug fallback 'untitled' when slugifyTitle collapses the DAG title.

Co-authored-by: Tony <tonyketcham@users.noreply.github.com>
Store per-run transcripts, _index.md, and _dag.json next to the workspace
(<cwd>/.flatbread/artifacts/dag-<slug>-<timestamp>/) instead of ~/.cursor/...
projects/<slug>/artifacts/, matching Flatbread's repo-local .flatbread convention.

- Document canvas vs artifact paths in README; supervisor --full-output-dir pin
- Align SKILL.md CLI tables and caveats
- Ignore .flatbread/artifacts/ at repo root to avoid committing DAG dumps
- Clarify --no-artifacts vs --findings-dir in run_dag.ts header comment

Co-authored-by: Tony <tonyketcham@users.noreply.github.com>
…nscripts)

Co-authored-by: Tony <tonyketcham@users.noreply.github.com>
@tonyketcham tonyketcham enabled auto-merge (rebase) May 11, 2026 02:37
tonyketcham and others added 3 commits May 10, 2026 19:41
Missed running pnpm lint:fix:fast after recent edits. Fixes lint CI
failures on both node 20.x and 22.x runners.

Change-Id: If241809977d0cc9373f2971f2735e5e4d2338c55
Change-Id: I9c89b8dee5dfe4ccaceb6287bb8c63b5efa3bfb4
@tonyketcham tonyketcham disabled auto-merge May 11, 2026 03:05
@tonyketcham tonyketcham merged commit ab20d86 into main May 11, 2026
19 checks passed
@tonyketcham tonyketcham deleted the toeknee/proof-artifact-consistency-6c59 branch May 11, 2026 03:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants