Skip to content

fix: scope external schema compliance bundles#2107

Open
bokelley wants to merge 2 commits into
mainfrom
schema-compliance-apis
Open

fix: scope external schema compliance bundles#2107
bokelley wants to merge 2 commits into
mainfrom
schema-compliance-apis

Conversation

@bokelley
Copy link
Copy Markdown
Contributor

Summary

Fixes external compliance/schema bundle handling by scoping schema roots per run, resolving async response schema refs without response-validator prewarm, and adding hosted stable-line alias support.
It exposes schema-root/version options through compliance, storyboard, and conformance-fuzzing APIs and CLIs, while preserving stable wire versions against prerelease validator caches.
It adds regression coverage for external schema roots, hosted aliases, async response refs, fuzzer schema overrides, and fix-command reproduction flags.

Validation

Ran npm run build:lib, focused node test suites for storyboard/schema/conformance coverage, node --check for both CLIs, Prettier checks on touched TypeScript files, commitlint, and the pre-push typecheck/build hook.

@bokelley bokelley marked this pull request as ready for review May 29, 2026 21:17
Comment thread bin/adcp.js Fixed
aao-ipr-bot[bot]
aao-ipr-bot Bot previously approved these changes May 29, 2026
Copy link
Copy Markdown

@aao-ipr-bot aao-ipr-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving on the strength of the AsyncLocalStorage swap plus the converging expert reads. The architectural move is right: process-global registerExternalSchemaRoot was the wrong shape for a service that runs concurrent compliance jobs, and scoping the schema root to the async chain via withExternalSchemaRoot fail-closes at the seam the way it should. wireAdcpVersion is plumbed cleanly through all five ProtocolClient.callTool sites in TaskExecutor.ts plus SingleAgentClient.ts:463, and buildVersionEnvelopeForMode(mode, wireAdcpVersion ?? adcpVersion, serverVersion) at src/lib/protocols/index.ts:305 keeps validation pinned to adcpVersion while the envelope advertises the stable line. Witness behavior intact.

Things I checked

  • All callers of loadComplianceIndex reach the validator inside a withExternalSchemaRoot wrap. comply.ts:1081 wraps complyImpl's body; runner.ts:950,3240 wrap runStoryboardBody / runStoryboardStepBody; the CLI path bin/adcp.js:2493 computes fileSchemaRoot before handing off to runStoryboard. The read-only callers (storyboard show, the four call sites in compliance.ts resolving bundles) don't compile validators, so they don't need the scope.
  • inferWireAdcpVersion at comply.ts:683-695 is gated five ways — caller didn't pin, alias supplied, profile has supported_versions, compliance version isn't naturally supported, alias is supported. The hosted stable-line test at storyboard-version-negotiation.test.js:284-303 exercises the right path: adcpVersion stays at CURRENT_BETA_VERSION, wireAdcpVersion lands at '3.1', _serverAdcpVersion lands at '3.1'.
  • assertSchemaRootMatchesVersion tightening (schema-loader.ts:376-385) flips the contract from "at least the expected bundle" to "exactly the expected bundle." Internal callers register single-bundle roots and the new tests catch the mixed-bundle case at storyboard-version-negotiation.test.js:535-541. No internal site was relying on the prior size === 0 tolerance.
  • ensureCoreLoaded change at schema-loader.ts:603-647 pre-registers async response variants with relaxResponseRoot applied, and getValidator now triggers ensureCoreLoaded on direction === 'request' (schema-loader.ts:677). That removes the prewarm requirement for comply_test_controller's core/async-response-data.json ref. The new test at storyboard-version-negotiation.test.js:485-510 is load-bearing — it would have failed before this change.
  • stateCacheKey(bundleKey, root) uses \0 separator. Node's fs.* rejects paths with null bytes, and resolveBundleKey's regex disallows them in bundle keys, so the keyspace can't collide. Same treatment in oracle.ts (${tool}\0${schemaRoot}\0${version}) and conformance/schemaLoader.ts.
  • Symlink risk on walkJsonFiles (schema-loader.ts:432): Dirent from readdirSync({withFileTypes:true}) reports isDirectory() / isFile() based on lstat, so symlinks are skipped entirely. A symlink to /etc/shadow doesn't get parsed and doesn't leak into error messages.
  • Changeset present (.changeset/scoped-schema-compliance-bundles.md). CLI flag plumbing on bin/adcp.js + bin/adcp-fuzz.js is consistent; fix-command reproduction at comply.ts:1004-1010 preserves both --schema-root and --hosted-stable-line-alias and the test at storyboard-version-negotiation.test.js:541-566 verifies shell-quoted paths round-trip.

Follow-ups (non-blocking — file as issues)

  • states map never reclaims scoped entries. withExternalSchemaRoot at schema-loader.ts:418-430 writes states.set(stateCacheKey(key, root), state) via ensureInit on entry but doesn't delete on exit. Long-running hosted services running compliance jobs against per-tenant temp directories (/tmp/compliance-${nonce}/...) leak one AJV instance + compiled validators + file index per unique path. Bounded by distinct schema-root strings over process lifetime, not request count, so the CLI is unaffected — but the hosted runner wants either a try/finally deleting the entry on scope exit (cost: recompile per call) or a bounded LRU on states.
  • Hosted-alias semantics let beta-only response fields pass under a stable-line badge claim. When wireAdcpVersion is downgraded to '3.1' against a 3.1.0-beta.7 cache, the beta response validator accepts any beta-only fields the seller emits. Per ad-tech-protocol-expert's read, the wire adcp_version is normatively a requester pin (per core/version-envelope.json) — so emitting '3.1' while accepting beta-shape is coherent for the buyer. But for a badge asserting GA conformance, this masks spec drift. File an issue on adcontextprotocol/adcp to clarify whether the request envelope's adcp_version is a pin or a hint, and add a strictWireShape mode that pins to GA validators when the alias is engaged.
  • schemaRoot is operator-trusted, not tenant-trusted. The CLI flag accepts arbitrary paths and the validator walks every .json under them. No content leaks via error messages (parse failures are swallowed in hasSchemaRootShape / collectSchemaRootVersionKeys) but a hosted runner that pipes tenant input straight into ComplyOptions.schemaRoot lets a tenant probe filesystem reachability and trigger reads of any JSON the SDK process can see. Add a callout in docs/guides/BUILD-AN-AGENT.md or a new docs/development/SCHEMA-ROOT-SAFETY.md that operators must allow-list schemaRoot at the host boundary before plumbing it through.
  • Changeset wording undersells the fix. The prior process-global registration was a real soundness bug for concurrent hosted runs — tenant A's schemaRoot could shadow tenant B's validation, potentially flipping a compliance verdict. Recommend amending the changeset to mention "concurrent compliance runs could share schema bundles via process-global state" so hosted operators know to take the patch. Patch-level bump is also load-bearing here for a different reason: this PR adds new public exports (withExternalSchemaRoot, getExternalSchemaRootForCompliance, wireAdcpVersion on four interfaces, hostedStableLineAlias on ResolveOptions/ComplyOptions). Strictly minor-bump material; the bugfix framing carries it but a follow-up changeset that tightens prose would be honest.

Minor nits (non-blocking)

  1. detectSchemaVersion derives version from path basename. src/lib/conformance/schemaLoader.ts:122-128 strips a trailing /bundled then takes path.basename. If a caller passes /tmp/x/bundled, report.schemaVersion becomes x and the fuzzer echoes it into --schema-version for reproduceCommand at bin/adcp-fuzz.js:367. Cosmetic — options.version wins when both are set — but the reproduction line can look like nonsense. Prefer the version-from-basename only when the basename is version-shaped.
  2. unregisterExternalSchemaRoot still exported. It's no longer used by the internal compliance/storyboard paths (they went to withExternalSchemaRoot). Test code still calls it. Keep the export for adopters who registered globals before this PR — but a deprecation note in the JSDoc pointing at withExternalSchemaRoot would help guide them to the scoped API.

Ship it once CI validates the new schema-root tests.

Comment thread bin/adcp.js
async function handleStoryboardStepCmd(args) {
const { getComplianceStoryboardById, runStoryboardStep } = await import('../dist/lib/testing/storyboard/index.js');
const { authToken, authScheme, protocolFlag, jsonOutput, debug, positionalArgs } = parseAgentOptions(args);
const { authToken, authScheme, protocolFlag, jsonOutput, debug, positionalArgs, complianceVersion, schemaRoot } =
Copy link
Copy Markdown

@aao-ipr-bot aao-ipr-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving. Scoping schema roots via AsyncLocalStorage is the right shape — concurrent compliance runs against different bundles stop poisoning each other, and the per-call withExternalSchemaRoot keeps the global register path available for callers that still want it.

Things I checked

  • withExternalSchemaRoot precedence in src/lib/validation/schema-loader.ts:234-240 — scoped > global > dist > cache. Concurrent register-only and scoped callers can't poison each other because getStore() returns undefined outside the ALS chain.
  • states Map rekeyed to bundleKey\0root (stateCacheKey at L325) — two roots for the same bundle key now coexist; clearStatesForBundle walks correctly when register swaps a root.
  • ensureCoreLoaded now also pre-registers async response variants with relaxResponseRoot (L617-633). Closes the comply_test_controllercore/async-response-data.json → async-variant chain so request validators compile without prewarming the response side. getValidator correctly triggers ensureCoreLoaded for direction === 'request' (L674).
  • hostedStableLineAlias semantics in src/lib/testing/storyboard/compliance.ts:657-704. Regex pair handles "3" (no alias), "3.10", "v3.1.0-rc.1", "3.1.0+build" (excluded by isPrereleaseVersion — correct semver call). Bridge only fires for prerelease caches + matching stable line + alias listed in supported_versions.
  • wireAdcpVersion propagation: SingleAgentClientConfigTaskExecutor.configProtocolClient.callTool (wireAdcpVersion ?? adcpVersion at src/lib/protocols/index.ts:305-309). Schema validation stays pinned to adcpVersion; only the wire envelope shifts.
  • Changeset present, patch. Side-effect removal from loadComplianceIndex is bug-shaped — patch is defensible as restoring documented intent.
  • Test plan in PR body checks out — focused suites, node --check on both CLIs, prettier, commitlint, pre-push hook. Three new tests cover the hosted-alias bridge, external-root async-ref resolution, and fix-command preservation.
  • security-reviewer: sound. No multi-tenant contamination via compiledValidators or states. CLI flags are build/test-time only; no untrusted reachability.
  • ad-tech-protocol-expert: sound-with-caveats. Hosted stable-line bridge is defensible for the AAO 3.1 case where sellers advertise only the GA line until ratified. The witness-vs-translator surface is sharpened (beta-only invariants applied to "3.1" envelopes) but the certifier owns the cache choice.

Follow-ups (non-blocking — file as issues)

  • cachedAjv is process-global in src/lib/conformance/oracle.ts:58-65. compiledValidators is correctly keyed per (tool, schemaRoot, version), but every compile shares the one AJV instance. Two runConformance calls in the same process against different schemaRoots whose bundled schemas carry the same $id (the realistic case — $ids are spec-determined, not root-determined) will throw schema with key or id "..." already exists on the second compile. The new test at test/lib/conformance-integration.test.js:96 writes schemas without $id, so the regression path is uncovered. Fix shape: per-(root, version) AJV instances mirroring the states Map pattern in schema-loader.ts, or getAjv().removeSchema($id) before recompile.
  • InProcessAgentClientConfig omits wireAdcpVersion (src/lib/core/AgentClient.ts:167-181). Picks 'adcpVersion' and 'versionEnvelope' but not the new field — in-process callers can't set the wire override. Probably moot for the hosted-certifier use case but worth closing for API symmetry.
  • states.delete(stateCacheKey(...)) runs BEFORE scopedExternalSchemaRoots.run (schema-loader.ts:428). Two concurrent withExternalSchemaRoot(sameKey, sameRoot) calls evict each other's cached LoaderState. Closures hold the ref so in-flight validations are safe, but each parent re-buildFileIndex's and recompiles. Move the eviction inside run(), or skip when the existing cached state already matches the root.
  • assertSchemaRootMatchesVersion tightened to keys.size === 1 && keys.has(expectedKey) (L376-385). The old keys.size === 0 || keys.has(expectedKey) accepted mixed bundles and empty-$id roots; the new fence rejects both. Tighter is correct — a 3.0 schema sitting next to 3.1 in one root was the actual loophole. Worth a one-line note in the changeset body so adopters maintaining custom mirror layouts see it.

Minor nits (non-blocking)

  1. Help-text alias coverage. bin/adcp.js:1745-1754 documents --schema-root and --validator-source as aliases; bin/adcp-fuzz.js:22-28 documents both pairs. The handleStoryboardShow flag-stripping at line 2197 compares against the resolved value, not the alias spelling — works because parseComplianceSelection reads either. Worth a JSDoc on parseComplianceSelection to make the alias contract explicit.
  2. Unbounded growth on responseSchemaHasContext / compiledValidators. Caches keyed by (tool, root, version) with no eviction. Bounded by distinct root paths in practice — fine for short-lived processes, flag if a long-lived hosted service iterates many roots.

LGTM. Follow-ups noted below.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant