feat(compliance): comply_controller_mode_gate storyboard + acme-outdoor-live kit (#4028) #4384
Merged
…-outdoor-live test kit

Exercises the live-account denial path for comply_test_controller: sellers that expose the controller must return FORBIDDEN when called by a live-mode (non-sandbox) principal. Optional phase accommodates two-deployment sellers whose sandbox endpoint has no per-account gate; required for single-endpoint sellers implementing the gate.

Closes #4028.

https://claude.ai/code/session_01WEcfSgq5grVGRgRdbmgLNW
bokelley added a commit that referenced this pull request on May 11, 2026
…s half of #4380) (#4387)

Three pages updated to reflect the (Sandbox) verdict from #4379:

aao-verified.mdx:
- Title / description / intro / TL;DR: (Sandbox) replaces (Live) as the higher tier
- "What each axis certifies" tables: (Sandbox) tested against registered production URL with account.sandbox: true; same storyboard suite as (Spec), zero real-world side effects
- "How agents earn each axis": (Sandbox) earn instructions describe the seller-side sandbox-account gate
- "Reading a badge": display rows updated
- "Lifecycle": (Sandbox) lifecycle replaces (Live), cross-mode leakage flagged as immediate-revoke trigger
- "Coverage gaps": universal storyboards run as standard suite, no observability carve-out
- JWT verification_modes: ["spec"] / ["spec", "sandbox"]
- Supporting specs: #3755/#4382 (schema gate), #4028/#4384 (mode-gate storyboard), #4226/#4228 (UNKNOWN grading)
- Warning banner above the deprecated canonical-campaign sections (eight checks, webhook ownership, two discovery paths, maintenance windows): full removal in follow-up sweep

conformance.mdx:
- Intro / "Two words, not three": (Sandbox) replaces (Live)
- "Storyboard conformance vs. AAO Verified": both qualifiers run the same storyboards; (Sandbox) is the stronger claim because it attests prod-stack sandbox tolerance

comply-test-controller.mdx:
- Prominent Note callout at the top: controller is dev/staging-only, AAO grading does NOT require or use it. Connects to (Sandbox) framing via #4379. Production-stack sandbox-flag honoring is what (Sandbox) attests, not controller exposure.

Push A item 4 of 4 in the compliance reporting fidelity initiative is now complete.

Follow-up: remove deprecated canonical-campaign sections from aao-verified.mdx (tracked in #4380).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Closes #4028.
- `universal/comply-controller-mode-gate.yaml`: new storyboard that verifies sellers refuse `comply_test_controller` dispatch with `FORBIDDEN` when the calling account is in live mode (non-sandbox).
- `test-kits/acme-outdoor-live.yaml`: live-mode twin of the sandbox `acme-outdoor` kit (`sandbox: false`, `auth.api_key: "demo-acme-outdoor-live-v1"`).
- Phase `live_account_denial` (`optional: true`) with one step (`deny_live_caller`): sends a well-formed `force_creative_status` request authenticated as the live-mode principal and asserts `FORBIDDEN`.

What changed
`static/compliance/source/universal/comply-controller-mode-gate.yaml` (new)
- `category: deterministic_testing`, `track: security`
- `requires: [controller]` gates the storyboard at load time: sellers without `comply_test_controller` skip cleanly (`requirement_unmet`)
- `live_account_denial` is `optional: true`: two-deployment sellers whose sandbox endpoint has no per-account gate will fail this phase (they return success, not `FORBIDDEN`) but the storyboard still passes; they correctly prevent live-mode misuse by not advertising the tool on their production endpoint. Single-endpoint sellers that implement the gate MUST pass.
- `auth: { type: api_key, from_test_kit: true }` (live-mode key) and `expect_error: true` / `negative_path: payload_well_formed` (well-formed request, runtime rejection).
- `check: field_value path: 'error' allowed_values: ['FORBIDDEN']` (ControllerError shape, not canonical error-code.json; matches the `deterministic-testing.yaml` convention for controller-surface codes).

`static/compliance/source/test-kits/acme-outdoor-live.yaml` (new)
- `sandbox: false` signals live/production mode for controller gating tests.
- `acme-outdoor` kit (same entity, different mode).
- `lint-storyboard-test-kits.cjs` (`auth.api_key` present).

Non-breaking justification
New universal storyboard + new test kit. No existing files modified. Per playbook: "additive scenarios, new universal storyboards are patch-eligible." The storyboard is gated by `requires: [controller]`, so sellers that don't expose `comply_test_controller` are unaffected.

Changeset
`patch`: new universal storyboard, additive only.

Pre-PR review
- `check: field_value path: 'error'` is the correct assertion for ControllerError (not `check: error_code`, which targets `errors[0].code` in the canonical envelope); `optional: true` without `skip_if` is permitted by schema and correct for Option A intent (advisory, not hard-enforced for two-deployment sellers).
- `check: error_code value: FORBIDDEN` blocker fixed; `optional: true` semantics are correct per design intent; redundancy of `success` + `error` field checks dissolves with `field_value` pattern (matches `deterministic-testing.yaml`).

Reviewer note: `requires: [controller]` is the first use of a root-level `requires` stanza across all universal storyboards. The schema supports it and all lints pass (`npm run build:compliance`, 11 checks green), but it is worth a manual scan if the review touches that section.

Milestone note: Milestone could not be confirmed via available tooling; please set to the active `3.0.x` patch target (currently `3.0.12` per the open Version Packages PR #4381) before merge.

https://claude.ai/code/session_01WEcfSgq5grVGRgRdbmgLNW
Generated by Claude Code
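
Illustrative sketch (not part of this PR): the behavior the storyboard exercises can be modeled in a few lines of Python. Everything here is an assumption made for illustration: the `Account` type, the handler and helper names, the request payload fields, and the `{"success": ..., "error": ...}` response shape (inferred from the `success` + `error` fields mentioned in the review notes). It is neither the real seller implementation nor the real storyboard runner.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Account:
    """Hypothetical caller identity; `sandbox` mirrors the test kit's `sandbox` flag."""
    name: str
    sandbox: bool


def comply_test_controller(account: Account, request: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical single-endpoint seller handler with a per-account mode gate."""
    if not account.sandbox:
        # Live-mode principal: refuse dispatch even though the payload is well-formed.
        return {"success": False, "error": "FORBIDDEN"}
    # Sandbox principal: the requested scenario would be dispatched here (elided).
    return {"success": True}


def check_field_value(response: dict[str, Any], path: str, allowed_values: list[Any]) -> bool:
    """Rough analogue of `check: field_value`, limited to a top-level field lookup."""
    return response.get(path) in allowed_values


def storyboard_passes(phase_results: list[tuple[bool, bool]]) -> bool:
    """Each entry is (passed, optional); only a failed required phase fails the storyboard."""
    return all(passed or optional for passed, optional in phase_results)


# Well-formed request; the field names inside it are assumed, not taken from the schema.
request = {
    "scenario": "force_creative_status",
    "params": {"creative_id": "cr_demo", "status": "approved"},
}

live = Account("acme-outdoor-live", sandbox=False)
sandboxed = Account("acme-outdoor", sandbox=True)

live_response = comply_test_controller(live, request)
sandbox_response = comply_test_controller(sandboxed, request)

# The deny_live_caller step would assert on the top-level `error` field.
live_denied = check_field_value(live_response, "error", ["FORBIDDEN"])
```

Under this model, a two-deployment seller without the gate would return success to the live-mode key, so `live_denied` would be false for it; because `live_account_denial` is optional, `storyboard_passes([(False, True)])` still evaluates to true, while the same failure in a required phase would not.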