Proof output retention Phase 4: oracle evidence alignment #203

@tonyketcham

Summary

Give oracle tasks the same explicit split between display tails and full execution evidence.

Source Context

Why This Exists

  • Phase 1 intentionally scopes execution-authoritative transcript behavior to kind: 'task'; the README/SKILL carve out oracle evidence as still tail-bounded.
  • Today oracle stdout/stderr are formatted into resultText using the display tail cap, with no full-evidence artifact equivalent to task stream transcripts.
  • Phase 4 aligns oracle output with the broader execution-vs-display model while preserving the existing ## Stdout (tail) and ## Stderr (tail) display shape.
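The split described above can be pictured with a minimal sketch. Every name here (`OracleStreamViews`, `splitOracleStream`, `formatStdoutTail`) is hypothetical and not taken from the @flatbread/proof source, and the cap is assumed to be measured in characters, which may differ from the real display tail cap.

```typescript
// Hypothetical sketch: these names are illustrative, not from @flatbread/proof.
interface OracleStreamViews {
  displayTail: string; // bounded text meant for resultText
  fullEvidence: string; // complete stream meant for an evidence artifact
  truncated: boolean; // true when displayTail dropped leading output
}

// Keep only the last `tailCapChars` characters for display,
// while passing the full stream through untouched as evidence.
// Assumption: the cap counts characters (UTF-16 code units), not bytes.
function splitOracleStream(stream: string, tailCapChars: number): OracleStreamViews {
  const cap = Math.max(0, Math.floor(tailCapChars));
  const truncated = stream.length > cap;
  const displayTail = truncated ? stream.slice(stream.length - cap) : stream;
  return { displayTail, fullEvidence: stream, truncated };
}

// Render the display section in the existing "(tail)" shape.
function formatStdoutTail(views: OracleStreamViews): string {
  return `## Stdout (tail)\n${views.displayTail}`;
}
```

The point of the sketch is that the display view and the evidence view come from the same stream but are never conflated: only `displayTail` would reach resultText, while `fullEvidence` is what an artifact or downstream policy would consume.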

Acceptance Criteria

  • Write ${taskId}.stdout.log and ${taskId}.stderr.log under the artifact directory for oracle tasks when artifacts are enabled.
  • Add a ## Evidence footer and canvas pointers for stdout/stderr paths without bloating STATE.tasks[].resultText.
  • Apply --max-in-memory-output-bytes to oracle full evidence in --no-artifacts mode, producing BUDGET-EXCEEDED on overflow.
  • Ensure downstream tasks depending on oracles use outputPolicy.upstream for long oracle evidence rather than silently seeing only the legacy tail.
  • Cover full stdout persistence, no-artifacts overflow, and downstream policy-bounded oracle context with tests.
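One way to read the first three criteria together is as a single decision: either point at artifact files, or hold the evidence in memory within a budget. The sketch below is an illustration under stated assumptions, not the package's implementation. `planOracleEvidence`, `EvidencePlan`, and `EvidenceOptions` are hypothetical names, the footer layout is invented for the example, paths are joined with `/` for simplicity, and the budget is assumed to apply to stdout and stderr combined.

```typescript
// Hypothetical sketch: names, types, and the footer layout are assumptions.
type EvidencePlan =
  | { kind: "artifact"; stdoutPath: string; stderrPath: string; footer: string }
  | { kind: "in-memory"; stdout: string; stderr: string }
  | { kind: "BUDGET-EXCEEDED"; bytes: number; limit: number };

interface EvidenceOptions {
  artifactDir: string | null; // null models --no-artifacts
  maxInMemoryOutputBytes: number; // models --max-in-memory-output-bytes
}

// Size of a string once encoded as UTF-8.
function utf8Bytes(text: string): number {
  return new TextEncoder().encode(text).length;
}

function planOracleEvidence(
  taskId: string,
  stdout: string,
  stderr: string,
  opts: EvidenceOptions,
): EvidencePlan {
  if (opts.artifactDir !== null) {
    // Artifacts enabled: full streams would be written to these paths,
    // and only the short footer is appended to the display text.
    const stdoutPath = `${opts.artifactDir}/${taskId}.stdout.log`;
    const stderrPath = `${opts.artifactDir}/${taskId}.stderr.log`;
    const footer = ["## Evidence", `stdout: ${stdoutPath}`, `stderr: ${stderrPath}`].join("\n");
    return { kind: "artifact", stdoutPath, stderrPath, footer };
  }
  // No artifacts: evidence stays in memory, so enforce the byte budget.
  // Assumption: the budget covers both streams combined.
  const bytes = utf8Bytes(stdout) + utf8Bytes(stderr);
  if (bytes > opts.maxInMemoryOutputBytes) {
    return { kind: "BUDGET-EXCEEDED", bytes, limit: opts.maxInMemoryOutputBytes };
  }
  return { kind: "in-memory", stdout, stderr };
}
```

Returning a plan rather than writing files keeps the decision easy to unit test; the actual file writes and canvas pointers would sit behind the `artifact` branch.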

Verification

  • pnpm -F @flatbread/proof test
  • pnpm -F @flatbread/proof typecheck
  • pnpm verify

Related Phase Issues

Artifact Linkage

This issue is linked back from docs/proposals/proof-output-retention-plan.md and, where relevant, docs/proposals/proof-output-retention-review.md so the repo artifacts and GitHub issues remain navigable in both directions.

Labels: enhancement, testing
