
evidence: add level 2 fixture 03 pair #20

Open
sjh9714 wants to merge 1 commit into main from evidence/add-level-2-fixture-03-pair


sjh9714 (Owner) commented May 24, 2026

Changed

  • Added one Level 2 explicit-prompt-delta paired comparison for 03-no-unrelated-refactor.
  • Added control and treatment standard-context patch files.
  • Updated the evidence run index.
  • Strengthened evidence validation to check that the fixture 03 control and treatment patches apply to fresh fixtures, pass python3 -m unittest tests/test_pricing.py, satisfy the score-diff thresholds, and touch only the expected files.
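
For reviewers who want a feel for the "expected changed files" check, here is a minimal sketch in Python. It is not the repository's validation code: the helper names (`changed_files`, `unexpected_files`), the sample patch, and the `pricing.py` path are all hypothetical, and the parsing assumes a git-style unified diff where each file has a `--- ` line directly followed by a `+++ ` line.

```python
# Sketch only: helper names, the sample patch, and the pricing.py path are
# hypothetical and are not taken from this repository's validation scripts.

def _clean_path(raw: str) -> str:
    """Drop a tab-separated timestamp and git's a/ or b/ prefix, if present."""
    path = raw.split("\t")[0].strip()
    if path.startswith(("a/", "b/")):
        return path[2:]
    return path


def changed_files(patch_text: str) -> set[str]:
    """Collect the paths named in the '---' / '+++' header pairs of a diff."""
    files = set()
    previous = ""
    for line in patch_text.splitlines():
        # Only trust a '+++ ' line that directly follows a '--- ' line.
        # This is a simplification: unusual hunk content could still mimic it.
        if line.startswith("+++ ") and previous.startswith("--- "):
            old_path = _clean_path(previous[4:])
            new_path = _clean_path(line[4:])
            # A deleted file has '+++ /dev/null', so fall back to the old path.
            files.add(old_path if new_path == "/dev/null" else new_path)
        previous = line
    return files


def unexpected_files(patch_text: str, expected: set[str]) -> set[str]:
    """Return the paths the patch touches that are not in the expected set."""
    return changed_files(patch_text) - set(expected)


SAMPLE_PATCH = "\n".join([
    "diff --git a/pricing.py b/pricing.py",
    "--- a/pricing.py",
    "+++ b/pricing.py",
    "@@ -1,3 +1,3 @@",
    " def total(price, qty):",
    "-    return price * qty",
    "+    return round(price * qty, 2)",
    " # end",
])

# Hypothetical allow-list for this fixture.
print(sorted(unexpected_files(SAMPLE_PATCH, {"pricing.py", "tests/test_pricing.py"})))  # prints []
```

An empty result means the patch stayed inside the allow-list; any path in the result would be a file the patch was not expected to touch.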

Verified

  • Control fixture: python3 -m unittest tests/test_pricing.py
  • Control fixture: bash scripts/score-diff.sh --max-files 2 --max-lines 40
  • Treatment fixture: python3 -m unittest tests/test_pricing.py
  • Treatment fixture: bash scripts/score-diff.sh --max-files 2 --max-lines 40
  • bash tests/evidence-files.test.sh
  • Full local validation suite passed
  • git diff --check
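
To illustrate what the `--max-files 2 --max-lines 40` thresholds guard against, the sketch below counts files and changed lines in a unified diff and compares them with those limits. This is an assumption-laden illustration, not the logic of scripts/score-diff.sh: the PR does not say how that script counts lines, and the names `diff_stats` and `within_limits`, the sample patch, and the `pricing.py` path are hypothetical.

```python
import re

# Sketch only: this is NOT scripts/score-diff.sh. How that script counts
# files and lines is not stated in the PR; everything here is an assumption.

# Matches hunk headers such as "@@ -1,3 +1,3 @@"; an omitted count means 1.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def diff_stats(patch_text: str) -> tuple[int, int]:
    """Return (files, changed_lines) for a unified diff.

    changed_lines is the number of added plus removed lines inside hunks.
    """
    files = 0
    changed = 0
    old_left = new_left = 0  # lines still expected in the current hunk
    for line in patch_text.splitlines():
        if old_left > 0 or new_left > 0:
            if line.startswith("+"):
                new_left -= 1
                changed += 1
            elif line.startswith("-"):
                old_left -= 1
                changed += 1
            elif line.startswith("\\"):
                pass  # "\ No newline at end of file" marker
            else:
                old_left -= 1  # context line counts on both sides
                new_left -= 1
            continue
        match = HUNK_HEADER.match(line)
        if match:
            old_left = int(match.group(1) or 1)
            new_left = int(match.group(2) or 1)
        elif line.startswith("+++ "):
            files += 1  # one '+++ ' header per file in the diff
    return files, changed


def within_limits(patch_text: str, max_files: int = 2, max_lines: int = 40) -> bool:
    """Check the diff against the limits used in the commands above."""
    files, changed = diff_stats(patch_text)
    return files <= max_files and changed <= max_lines


SAMPLE_PATCH = "\n".join([
    "diff --git a/pricing.py b/pricing.py",
    "--- a/pricing.py",
    "+++ b/pricing.py",
    "@@ -1,3 +1,3 @@",
    " def total(price, qty):",
    "-    return price * qty",
    "+    return round(price * qty, 2)",
    " # end",
])

print(diff_stats(SAMPLE_PATCH))     # prints (1, 2)
print(within_limits(SAMPLE_PATCH))  # prints True
```

Using the hunk header counts to decide where a hunk ends keeps `--- ` and `+++ ` file headers from being mistaken for removed or added lines.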

Diff discipline

  • Files changed: 5
  • Files intentionally not touched: skill behavior, plugin content, installer behavior, benchmark fixture content, evidence schema, paired-comparison protocol, record-evidence.sh, score-diff.sh, README product claims, release tags
  • Why scoped: this PR only adds one fixture 03 explicit-prompt-delta pair and validates its artifacts.

Notes

  • No benchmark, productivity, endorsement, no-skill causal, or generalized improvement claims added.
  • This is one synthetic fixture pair, not proof of general improvement.
  • Pair subtype: explicit-prompt-delta
  • Codex version: codex-cli 0.133.0
  • Model: gpt-5.5
  • Failed or excluded attempts: one control command invocation failed before any Codex session started, because codex-cli 0.133.0 does not accept the legacy --ask-for-approval flag. No patch was produced and no verification was run for that attempt.
  • No release created.
