test(comply): regression guard for extractFailures attribution parity (#1708) by bokelley · Pull Request #1715 · adcontextprotocol/adcp-client

bokelley · 2026-05-12T14:49:41Z

Closes #1708.

Why this PR exists

The original #1708 framing ("cross-evaluator divergence (comply suite vs CLI runner) is a load-bearing higher-order bug") got resolved differently than expected. The 27-point delta turned out to be version-driven, not evaluator-driven: the comply suite was on an older @adcp/sdk that hit two bugs the CLI runner's newer SDK didn't. Fixes for both bugs shipped in 7.1.0:

Issue #1713 (no_secret_echo invariant flags spec-legitimate field names on structured-value fields), fixed by PR #1714 (fix(invariant): no_secret_echo only fails on string-valued suspect-named fields): the no_secret_echo invariant flagged the spec-legitimate authorization field regardless of value.
Issue #1709 (Harness error attribution: Zod validation rejects surface as failures on unrelated downstream assertions), fixed by PR #1712 (fix(runner): attribute Zod schema rejects to response_schema): Zod schema rejects were misattributed to whichever step-scope invariant fired next.
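
To make the first fix concrete, here is a minimal sketch of the narrowing described for #1714: a suspect-named field counts as an echo only when its value is a string. This is not the @adcp/sdk implementation. The function name findSecretEchoes, the SUSPECT_NAMES list, and the fixture shapes are all assumptions made for illustration.

```typescript
// Hypothetical sketch; not the @adcp/sdk implementation.
// SUSPECT_NAMES is an assumed list, chosen only for illustration.
const SUSPECT_NAMES = new Set(["authorization", "api_key", "token", "secret"]);

type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Returns the paths of suspect-named fields that hold a string value.
// Structured values (objects, arrays) under a suspect name are not flagged,
// but the walk still descends into them.
function findSecretEchoes(value: Json, path = ""): string[] {
  if (value === null || typeof value !== "object") return [];

  const hits: string[] = [];
  if (Array.isArray(value)) {
    value.forEach((item, index) => {
      hits.push(...findSecretEchoes(item, `${path}[${index}]`));
    });
    return hits;
  }

  for (const [key, child] of Object.entries(value)) {
    const childPath = path ? `${path}.${key}` : key;
    if (SUSPECT_NAMES.has(key.toLowerCase()) && typeof child === "string") {
      hits.push(childPath);
    }
    hits.push(...findSecretEchoes(child, childPath));
  }
  return hits;
}

// Made-up fixtures: a structured authorization value versus an echoed string.
const structuredHits = findSecretEchoes({
  authorization: { scheme: "signed_ref", scopes: ["read"] },
});
const echoedHits = findSecretEchoes({
  context: { authorization: "Bearer abc123" },
});
```

Under these assumptions, structuredHits is empty and echoedHits contains the single path context.authorization. The pre-fix behavior described above corresponds to dropping the typeof child === "string" condition.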

BidMachine retested on 7.1.0 (receipt) and the delta collapsed.

So #1708 doesn't need a "parity smoke test against a live agent" — it needs a regression guard for the aggregation layer so future refactors of extractFailures can't silently reintroduce the misattribution shape.

What's locked

extractFailures in comply.ts is what walks StoryboardResult[] and emits ComplianceFailure[] for the final report. Five invariants are now under test:

Schema-reject attribution preserved (#1709): when a step has both a response_schema failure (prepended by the runner) and an assertion failure (a downstream invariant that ran), extractFailures surfaces validation.check === 'response_schema'. If a future refactor reorders or picks the wrong validation, this fails.
Skip markers filtered out (PR #1712 invariant): when the runner short-circuits step-scope invariants and emits { check: 'assertion', passed: true, description: '<id>: skipped — ...' } markers, extractFailures doesn't surface them. Only failed validations make the cut.
BidMachine-shape clean pass (#1713): a step whose no_secret_echo invariant passed (structured authorization value, after the PR #1714 narrowing) and whose schema passed produces zero entries in failures. The aggregation must not synthesize spurious failures.
Multi-storyboard attribution: failed A, clean B, failed C produces exactly two failures entries with stable storyboard_id × validation.check tuples.
Skipped steps not counted: step.skipped: true entries don't surface as failures even if passed: false in some path.
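
The five invariants above can be sketched against a simplified stand-in for the aggregation. Everything below is an assumption for illustration: the interfaces are reduced versions of StoryboardResult and ComplianceFailure using only the fields named in this description, and extractFailuresSketch is a hypothetical function that reports one entry per failed, non-skipped step, attributed to the first failed validation in runner order. The real extractFailures in comply.ts may differ.

```typescript
// Assumed, simplified shapes; the real types in comply.ts carry more fields.
interface ValidationResult {
  check: string; // e.g. 'response_schema' or 'assertion'
  passed: boolean;
  description: string;
}
interface StepResult {
  step_id: string;
  passed: boolean;
  skipped?: boolean;
  validations: ValidationResult[];
}
interface StoryboardResult {
  storyboard_id: string;
  steps: StepResult[];
}
interface ComplianceFailure {
  storyboard_id: string;
  step_id: string;
  validation: ValidationResult;
}

// Hypothetical stand-in for extractFailures.
function extractFailuresSketch(results: StoryboardResult[]): ComplianceFailure[] {
  const failures: ComplianceFailure[] = [];
  for (const storyboard of results) {
    for (const step of storyboard.steps) {
      // Skipped and passing steps never produce a failure entry.
      if (step.skipped || step.passed) continue;
      // First failed validation wins, so a prepended response_schema
      // failure is reported ahead of any downstream assertion failure.
      // passed: true skip markers are ignored by the same predicate.
      const validation = step.validations.find((v) => !v.passed);
      if (!validation) continue;
      failures.push({
        storyboard_id: storyboard.storyboard_id,
        step_id: step.step_id,
        validation,
      });
    }
  }
  return failures;
}

// Synthetic fixtures: failed A (schema reject plus skip marker plus
// assertion failure), clean B, failed C with one skipped step.
const results: StoryboardResult[] = [
  {
    storyboard_id: "A",
    steps: [{
      step_id: "create",
      passed: false,
      validations: [
        { check: "response_schema", passed: false, description: "schema reject" },
        { check: "assertion", passed: true, description: "context.no_secret_echo: skipped" },
        { check: "assertion", passed: false, description: "downstream invariant failed" },
      ],
    }],
  },
  {
    storyboard_id: "B",
    steps: [{
      step_id: "create",
      passed: true,
      validations: [
        { check: "response_schema", passed: true, description: "schema ok" },
        { check: "assertion", passed: true, description: "context.no_secret_echo ok" },
      ],
    }],
  },
  {
    storyboard_id: "C",
    steps: [
      { step_id: "setup", passed: false, skipped: true, validations: [] },
      {
        step_id: "update",
        passed: false,
        validations: [{ check: "assertion", passed: false, description: "invariant failed" }],
      },
    ],
  },
];

const tuples = extractFailuresSketch(results).map(
  (f) => `${f.storyboard_id}/${f.step_id}/${f.validation.check}`,
);
```

With these fixtures, tuples holds two entries, A/create/response_schema and C/update/assertion. Storyboard B and the skipped step in C contribute nothing, and the skip marker in A never surfaces.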

API change

extractFailures is now exported from src/lib/testing/compliance/comply.ts (previously file-internal). This is a visibility-only change; signature and behavior are unchanged. The export lets the parity test call it directly with synthetic StoryboardResult fixtures, avoiding the need to spin up a mock HTTP MCP server.

Test plan

npm run build clean
npm run format:check clean
node --test test/lib/comply-vs-storyboard-parity.test.js — 7/7 tests pass across 5 describe blocks

Coordinated stance state after this PR


adcp-client #1703	merged (in 7.1.0)
adcp-client #1705	merged (in 7.1.0)
adcp-client #1706	merged (in 7.1.0)
adcp-client #1712 (#1709)	merged (in 7.1.0)
adcp-client #1714 (#1713)	merged (in 7.1.0)
adcp-client #1707	scope-corrected; parked for adopter demand
adcp-client #1708 (this PR)	closes after merge — regression guard for the above
adcp-client #1711	closed (BidMachine retest confirmed fix)

Part of the #1685 coordinated stance ("the SDK is a witness, not a translator").

🤖 Generated with Claude Code

…#1708)

Locks the post-7.1.0 attribution invariants so future refactors of comply()'s extractFailures can't silently reintroduce the BidMachine misattribution shape (adcp#4419).

What's locked:

1. A storyboard step carrying both a synthesized response_schema failure (prepended by the runner per #1709 / PR #1712) and an assertion entry surfaces validation.check === 'response_schema' in ComplianceResult.failures, never 'assertion'. This is the attribution that was silently broken pre-7.1.0 (Zod rejects fell through to the next invariant, canonically context.no_secret_echo).
2. Skipped-invariant markers (passed: true entries the runner emits when short-circuiting invariants downstream of a schema failure per #1712) are correctly filtered out: only failed validations surface in `failures`. A future change that included passed: true entries would crowd out the real failure.
3. A clean BidMachine-shape response (structured authorization field passing no_secret_echo per #1713 / PR #1714) produces zero failures through the aggregation layer.
4. Multi-storyboard aggregation preserves per-storyboard (storyboard_id, step_id, validation.check) tuples.
5/6. Clean pass paths (no failures, skipped steps) produce empty failures.

API change (minor): extractFailures (previously file-internal) is now exported from src/lib/testing/compliance/comply.ts so the regression test can call it directly with synthetic StoryboardResult fixtures. Functionally identical; just visibility.

Scope correction relative to the original #1708 framing: the "cross-evaluator divergence" symptom was version-driven (different @adcp/sdk versions hitting #1713 and #1709 differently), not a true parity gap. Fixes for both root causes shipped in 7.1.0; this test is the durable guard for the aggregation-layer invariants those fixes depend on.

7/7 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

bokelley merged commit c566612 into main May 12, 2026
10 checks passed

bokelley deleted the bokelley/issue-1708-parity-guard branch May 12, 2026 15:20

bokelley merged 1 commit into main from bokelley/issue-1708-parity-guard
