fix: scope external schema compliance bundles#2107
Conversation
There was a problem hiding this comment.
Approving on the strength of the AsyncLocalStorage swap plus the converging expert reads. The architectural move is right: process-global registerExternalSchemaRoot was the wrong shape for a service that runs concurrent compliance jobs, and scoping the schema root to the async chain via withExternalSchemaRoot fail-closes at the seam the way it should. wireAdcpVersion is plumbed cleanly through all five ProtocolClient.callTool sites in TaskExecutor.ts plus SingleAgentClient.ts:463, and buildVersionEnvelopeForMode(mode, wireAdcpVersion ?? adcpVersion, serverVersion) at src/lib/protocols/index.ts:305 keeps validation pinned to adcpVersion while the envelope advertises the stable line. Witness behavior intact.
Things I checked
- All callers of
loadComplianceIndexreach the validator inside awithExternalSchemaRootwrap.comply.ts:1081wrapscomplyImpl's body;runner.ts:950,3240wraprunStoryboardBody/runStoryboardStepBody; the CLI pathbin/adcp.js:2493computesfileSchemaRootbefore handing off torunStoryboard. The read-only callers (storyboard show, the four call sites incompliance.tsresolving bundles) don't compile validators, so they don't need the scope. inferWireAdcpVersionatcomply.ts:683-695is gated five ways — caller didn't pin, alias supplied, profile has supported_versions, compliance version isn't naturally supported, alias is supported. The hosted stable-line test atstoryboard-version-negotiation.test.js:284-303exercises the right path:adcpVersionstays atCURRENT_BETA_VERSION,wireAdcpVersionlands at'3.1',_serverAdcpVersionlands at'3.1'.assertSchemaRootMatchesVersiontightening (schema-loader.ts:376-385) flips the contract from "at least the expected bundle" to "exactly the expected bundle." Internal callers register single-bundle roots and the new tests catch the mixed-bundle case atstoryboard-version-negotiation.test.js:535-541. No internal site was relying on the priorsize === 0tolerance.ensureCoreLoadedchange atschema-loader.ts:603-647pre-registers async response variants withrelaxResponseRootapplied, andgetValidatornow triggersensureCoreLoadedondirection === 'request'(schema-loader.ts:677). That removes the prewarm requirement forcomply_test_controller'score/async-response-data.jsonref. The new test atstoryboard-version-negotiation.test.js:485-510is load-bearing — it would have failed before this change.stateCacheKey(bundleKey, root)uses\0separator. Node'sfs.*rejects paths with null bytes, andresolveBundleKey's regex disallows them in bundle keys, so the keyspace can't collide. Same treatment inoracle.ts(${tool}\0${schemaRoot}\0${version}) andconformance/schemaLoader.ts.- Symlink risk on
walkJsonFiles(schema-loader.ts:432):DirentfromreaddirSync({withFileTypes:true})reportsisDirectory()/isFile()based onlstat, so symlinks are skipped entirely. A symlink to/etc/shadowdoesn't get parsed and doesn't leak into error messages. - Changeset present (
.changeset/scoped-schema-compliance-bundles.md). CLI flag plumbing onbin/adcp.js+bin/adcp-fuzz.jsis consistent; fix-command reproduction atcomply.ts:1004-1010preserves both--schema-rootand--hosted-stable-line-aliasand the test atstoryboard-version-negotiation.test.js:541-566verifies shell-quoted paths round-trip.
Follow-ups (non-blocking — file as issues)
statesmap never reclaims scoped entries.withExternalSchemaRootatschema-loader.ts:418-430writesstates.set(stateCacheKey(key, root), state)viaensureIniton entry but doesn't delete on exit. Long-running hosted services running compliance jobs against per-tenant temp directories (/tmp/compliance-${nonce}/...) leak one AJV instance + compiled validators + file index per unique path. Bounded by distinct schema-root strings over process lifetime, not request count, so the CLI is unaffected — but the hosted runner wants either atry/finallydeleting the entry on scope exit (cost: recompile per call) or a bounded LRU onstates.- Hosted-alias semantics let beta-only response fields pass under a stable-line badge claim. When
wireAdcpVersionis downgraded to'3.1'against a3.1.0-beta.7cache, the beta response validator accepts any beta-only fields the seller emits. Perad-tech-protocol-expert's read, the wireadcp_versionis normatively a requester pin (percore/version-envelope.json) — so emitting'3.1'while accepting beta-shape is coherent for the buyer. But for a badge asserting GA conformance, this masks spec drift. File an issue onadcontextprotocol/adcpto clarify whether the request envelope'sadcp_versionis a pin or a hint, and add astrictWireShapemode that pins to GA validators when the alias is engaged. schemaRootis operator-trusted, not tenant-trusted. The CLI flag accepts arbitrary paths and the validator walks every.jsonunder them. No content leaks via error messages (parse failures are swallowed inhasSchemaRootShape/collectSchemaRootVersionKeys) but a hosted runner that pipes tenant input straight intoComplyOptions.schemaRootlets a tenant probe filesystem reachability and trigger reads of any JSON the SDK process can see. Add a callout indocs/guides/BUILD-AN-AGENT.mdor a newdocs/development/SCHEMA-ROOT-SAFETY.mdthat operators must allow-listschemaRootat the host boundary before plumbing it through.- Changeset wording undersells the fix. The prior process-global registration was a real soundness bug for concurrent hosted runs — tenant A's
schemaRootcould shadow tenant B's validation, potentially flipping a compliance verdict. Recommend amending the changeset to mention "concurrent compliance runs could share schema bundles via process-global state" so hosted operators know to take the patch. Patch-level bump is also load-bearing here for a different reason: this PR adds new public exports (withExternalSchemaRoot,getExternalSchemaRootForCompliance,wireAdcpVersionon four interfaces,hostedStableLineAliasonResolveOptions/ComplyOptions). Strictly minor-bump material; the bugfix framing carries it but a follow-up changeset that tightens prose would be honest.
Minor nits (non-blocking)
detectSchemaVersionderives version from path basename.src/lib/conformance/schemaLoader.ts:122-128strips a trailing/bundledthen takespath.basename. If a caller passes/tmp/x/bundled,report.schemaVersionbecomesxand the fuzzer echoes it into--schema-versionforreproduceCommandatbin/adcp-fuzz.js:367. Cosmetic —options.versionwins when both are set — but the reproduction line can look like nonsense. Prefer the version-from-basename only when the basename is version-shaped.unregisterExternalSchemaRootstill exported. It's no longer used by the internal compliance/storyboard paths (they went towithExternalSchemaRoot). Test code still calls it. Keep the export for adopters who registered globals before this PR — but a deprecation note in the JSDoc pointing atwithExternalSchemaRootwould help guide them to the scoped API.
Ship it once CI validates the new schema-root tests.
| async function handleStoryboardStepCmd(args) { | ||
| const { getComplianceStoryboardById, runStoryboardStep } = await import('../dist/lib/testing/storyboard/index.js'); | ||
| const { authToken, authScheme, protocolFlag, jsonOutput, debug, positionalArgs } = parseAgentOptions(args); | ||
| const { authToken, authScheme, protocolFlag, jsonOutput, debug, positionalArgs, complianceVersion, schemaRoot } = |
There was a problem hiding this comment.
Approving. Scoping schema roots via AsyncLocalStorage is the right shape — concurrent compliance runs against different bundles stop poisoning each other, and the per-call withExternalSchemaRoot keeps the global register path available for callers that still want it.
Things I checked
withExternalSchemaRootprecedence insrc/lib/validation/schema-loader.ts:234-240— scoped > global > dist > cache. Concurrentregister-only and scoped callers can't poison each other becausegetStore()returnsundefinedoutside the ALS chain.statesMap rekeyed tobundleKey\0root(stateCacheKeyat L325) — two roots for the same bundle key now coexist;clearStatesForBundlewalks correctly whenregisterswaps a root.ensureCoreLoadednow also pre-registers async response variants withrelaxResponseRoot(L617-633). Closes thecomply_test_controller→core/async-response-data.json→ async-variant chain so request validators compile without prewarming the response side.getValidatorcorrectly triggersensureCoreLoadedfordirection === 'request'(L674).hostedStableLineAliassemantics insrc/lib/testing/storyboard/compliance.ts:657-704. Regex pair handles"3"(no alias),"3.10","v3.1.0-rc.1","3.1.0+build"(excluded byisPrereleaseVersion— correct semver call). Bridge only fires for prerelease caches + matching stable line + alias listed insupported_versions.wireAdcpVersionpropagation:SingleAgentClientConfig→TaskExecutor.config→ProtocolClient.callTool(wireAdcpVersion ?? adcpVersionatsrc/lib/protocols/index.ts:305-309). Schema validation stays pinned toadcpVersion; only the wire envelope shifts.- Changeset present,
patch. Side-effect removal fromloadComplianceIndexis bug-shaped —patchis defensible as restoring documented intent. - Test plan in PR body checks out — focused suites,
node --checkon both CLIs, prettier, commitlint, pre-push hook. Three new tests cover the hosted-alias bridge, external-root async-ref resolution, and fix-command preservation. security-reviewer: sound. No multi-tenant contamination viacompiledValidatorsorstates. CLI flags are build/test-time only; no untrusted reachability.ad-tech-protocol-expert: sound-with-caveats. Hosted stable-line bridge is defensible for the AAO 3.1 case where sellers advertise only the GA line until ratified. The witness-vs-translator surface is sharpened (beta-only invariants applied to"3.1"envelopes) but the certifier owns the cache choice.
Follow-ups (non-blocking — file as issues)
cachedAjvis process-global insrc/lib/conformance/oracle.ts:58-65.compiledValidatorsis correctly keyed per(tool, schemaRoot, version), but every compile shares the one AJV instance. TworunConformancecalls in the same process against differentschemaRoots whose bundled schemas carry the same$id(the realistic case —$ids are spec-determined, not root-determined) will throwschema with key or id "..." already existson the second compile. The new test attest/lib/conformance-integration.test.js:96writes schemas without$id, so the regression path is uncovered. Fix shape: per-(root, version)AJV instances mirroring thestatesMap pattern inschema-loader.ts, orgetAjv().removeSchema($id)before recompile.InProcessAgentClientConfigomitswireAdcpVersion(src/lib/core/AgentClient.ts:167-181). Picks'adcpVersion'and'versionEnvelope'but not the new field — in-process callers can't set the wire override. Probably moot for the hosted-certifier use case but worth closing for API symmetry.states.delete(stateCacheKey(...))runs BEFOREscopedExternalSchemaRoots.run(schema-loader.ts:428). Two concurrentwithExternalSchemaRoot(sameKey, sameRoot)calls evict each other's cachedLoaderState. Closures hold the ref so in-flight validations are safe, but each parent re-buildFileIndex's and recompiles. Move the eviction insiderun(), or skip when the existing cached state already matches the root.assertSchemaRootMatchesVersiontightened tokeys.size === 1 && keys.has(expectedKey)(L376-385). The oldkeys.size === 0 || keys.has(expectedKey)accepted mixed bundles and empty-$idroots; the new fence rejects both. Tighter is correct — a3.0schema sitting next to3.1in one root was the actual loophole. Worth a one-line note in the changeset body so adopters maintaining custom mirror layouts see it.
Minor nits (non-blocking)
- Help-text alias coverage.
bin/adcp.js:1745-1754documents--schema-rootand--validator-sourceas aliases;bin/adcp-fuzz.js:22-28documents both pairs. ThehandleStoryboardShowflag-stripping at line 2197 compares against the resolved value, not the alias spelling — works becauseparseComplianceSelectionreads either. Worth a JSDoc onparseComplianceSelectionto make the alias contract explicit. - Unbounded growth on
responseSchemaHasContext/compiledValidators. Caches keyed by(tool, root, version)with no eviction. Bounded by distinct root paths in practice — fine for short-lived processes, flag if a long-lived hosted service iterates many roots.
LGTM. Follow-ups noted below.
Summary
Fixes external compliance/schema bundle handling by scoping schema roots per run, resolving async response schema refs without response-validator prewarm, and adding hosted stable-line alias support.
It exposes schema-root/version options through compliance, storyboard, and conformance-fuzzing APIs and CLIs, while preserving stable wire versions against prerelease validator caches.
It adds regression coverage for external schema roots, hosted aliases, async response refs, fuzzer schema overrides, and fix-command reproduction flags.
Validation
Ran
npm run build:lib, focused node test suites for storyboard/schema/conformance coverage,node --checkfor both CLIs, Prettier checks on touched TypeScript files, commitlint, and the pre-push typecheck/build hook.