Merged
1 change: 1 addition & 0 deletions SDLC.md
@@ -127,6 +127,7 @@ Portable technical gotchas promoted from private memory via the Memory Audit Pro

- **Separate stderr from stdout when capturing output for JSON parsing.** `2>&1` mixes stderr into stdout, causing silent JSON parse failures that defaulted scores to 0. Use `2>"$err_file"` and check exit code separately. (Source: 2026-02-06 E2E silent-zero bug)
- **`continue-on-error: true` + `|| echo "fallback"` masks real failures.** Always audit these patterns for silent bugs — they convert step failures into green checks while hiding the underlying incident.
- **`/goal` evaluator does not verify enumerated-test-name fidelity in goal conditions.** When a `/goal` condition lists specific test cases by name (e.g. `nudge-fires-when-stale + silent-when-current + silent-when-offline`), the Haiku evaluator checks that *enough* tests exist but does not verify each named test exists by BEHAVIOR. PR #361 shipped 3 tests where Test C was silent-when-cache-poisoned rather than the goal-named silent-when-offline; evaluator approved completion. Caught only at post-merge self-review and fixed in PR #362 (added the real offline test via PATH-override fault injection on `npm`). Rule: when a `/goal` condition enumerates test cases, self-review must walk each named case and verify by reading the test's assertions, not by counting matching `test_` functions. This is the per-test-fidelity layer beneath PR #355's HIGH-95%-confidence and DLC-binding gates — those work at the macro level, but enumerated-condition fidelity is the author's responsibility. (Source: PR #361 + PR #362 incident, 2026-05-25)
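The stderr/stdout rule in the first bullet above can be sketched as follows. This is a minimal illustration, not the project's actual harness: the function name `run_and_capture` is hypothetical, and it assumes the wrapped command prints its JSON on stdout and its diagnostics on stderr.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from SDLC.md): run a command while keeping
# stdout (expected to be JSON) separate from stderr, and check the exit
# code explicitly instead of relying on a fallback value.
run_and_capture() {
  local err_file out status
  err_file="$(mktemp)"

  # Only stdout lands in $out; stderr is diverted to a temp file.
  out="$("$@" 2>"$err_file")"
  status=$?

  if [ "$status" -ne 0 ]; then
    # Surface the failure rather than defaulting to a score of 0.
    echo "command failed with exit code $status:" >&2
    cat "$err_file" >&2
    rm -f "$err_file"
    return "$status"
  fi

  rm -f "$err_file"
  printf '%s\n' "$out"
}
```

A caller would then parse only what the helper prints, for example `json="$(run_and_capture some_command)" || exit 1`, where `some_command` is a placeholder. Warnings written to stderr never reach the JSON parser, and a non-zero exit stops the step instead of being hidden behind a green check.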

### Evaluation & Benchmarking
