Commit 16a56cc

bokelley and claude committed
docs(proposals): pre-register falsification signals (Step 0.7)
For each of the six learning questions (Q1, Q1.5, Q2, Q3, Q4, Q5), write down the specific finding that would falsify the prior, before the experiment runs. This is the enforcement mechanism for self-review's "one author wearing three hats" warning: if I don't commit upfront to what would tell me I'm wrong, I'll find what I'm looking for.

Concrete, observable falsifiers per question:

- Q1: glue >303 LOC, monkey-patching needed, identity impedance
- Q1.5: recipe schema requires proposal_id, variant Products need forged rows, hash-dedup state crosses sessions
- Q2: any extra: dict, type: ignore on recipe construction, lossy round-trip
- Q3: none of the three hydration models work, OR adopter-owned hydration turns out to be the right primitive
- Q4: salesagent pattern is N=1; experiment informs but doesn't settle the Protocol seam
- Q5: SDK→SDK signing parity already partially falsified; remaining falsifiers are auto-emit doesn't fire, retry doesn't behave

Step 0.7 marked ✅ in workstream. The experiment is now fully unblocked from a planning standpoint. Phase 1 can start; Phase 2 prereqs are mechanical (pin SHAs, local fork patch for two schedulers).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 23083ed commit 16a56cc

1 file changed

Lines changed: 184 additions & 3 deletions

docs/proposals/salesagent-sidecar-experiment.md

@@ -624,6 +624,187 @@ Phase 1 is the cheapest place to falsify this. If the wire shape
can't carry signal-driven variants without escape hatches, #502's
recipe model needs revision.

## Pre-registered falsification signals

Self-review's "one author wearing three hats" warning applies: if I
don't commit upfront to what would tell me each prior is wrong, I'll
find what I'm looking for. For each learning question, the specific
finding that would falsify the prior is named here, before the
experiment runs. **A finding that contradicts any of these priors is a
positive result; it's what the experiment is for.**

### Q1: Does `dynamic_products.py` factor onto `ProposalManager.get_products`?

Prior: salesagent's signal-driven assembly fits the
`ProposalManager.get_products` shape via a thin wrapping that calls
into the existing 505-LOC body without re-implementing it.

Falsified if any of:

* **LOC budget exceeded.** Glue exceeds 60% of source body
  (>303 LOC against 505). Hard threshold; pre-registered.
* **Wrap-as-port.** The wrapper has to re-execute logic from inside
  `dynamic_products.py` rather than calling it as-is, e.g.,
  re-running `signals_agent_registry` lookup, rebuilding variant
  products from intermediate state, or duplicating the de-dup hash
  logic.
* **Monkey-patching required.** The wrapper has to inject into
  `dynamic_products` module-level state, replace function references,
  or modify globals to make it work in a `ProposalManager` shape. If
  this happens, the abstraction is a leaky shim, not a clean factor.
* **Identity-shaped impedance.** `dynamic_products.py` requires
  `ResolvedIdentity` shaped exactly the way salesagent's MCP wrapper
  builds it; the SDK's projection from `BuyerAgent` + `Account` to
  the equivalent loses information the assembly logic depends on.

If any falsifier fires: #502's claim that proposal-side assembly is
a clean wrap-of-`_impl` shape is wrong. Adopters with non-trivial
proposal logic would have to choose between (a) restructuring their
assembly to fit the SDK shape, or (b) sticking with their existing
runtime. Either is a real finding that revises #502.

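
The LOC-budget falsifier is the one purely mechanical check among the Q1 signals, so it could be scripted rather than eyeballed. A minimal sketch, assuming LOC means non-blank lines that are not pure `#` comments (the proposal does not say how lines are counted); `glue_budget`, `count_loc`, and `loc_falsifier_fires` are hypothetical helper names, not part of salesagent or the SDK:

```python
SOURCE_LOC = 505      # size of the dynamic_products.py body, per the prior
BUDGET_PERCENT = 60   # pre-registered ceiling for glue, as a share of source


def glue_budget(source_loc: int = SOURCE_LOC, percent: int = BUDGET_PERCENT) -> int:
    """Largest glue size in LOC that stays within budget (integer math, no float rounding)."""
    return source_loc * percent // 100


def count_loc(text: str) -> int:
    """Count non-blank lines that are not pure '#' comments (an assumed counting rule)."""
    stripped = (line.strip() for line in text.splitlines())
    return sum(1 for line in stripped if line and not line.startswith("#"))


def loc_falsifier_fires(glue_loc: int) -> bool:
    """True when the wrapper is larger than the pre-registered budget."""
    return glue_loc > glue_budget()
```

With these defaults the budget works out to 303 LOC, so a 303-line wrapper passes and a 304-line wrapper fires the falsifier, matching the ">303 LOC against 505" threshold above.
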
### Q1.5: Does the recipe model allow proposal-time *assembly*?

Prior: #502's "framework session cache against `proposal_id`" model
accommodates dynamic products. Salesagent generates signal-driven
variant `Product` rows at brief time; the SDK's session-cache
abstraction can carry these.

Falsified if any of:

* **Recipe schema requires `proposal_id` lookup.** Signal-driven
  variants generated at brief time have no committed `proposal_id`
  yet; if the recipe schema requires one to validate or hydrate,
  the model is too late-bound.
* **Variant Products require new schema rows.** Salesagent's
  dynamic products land as new `Product` rows with TTL
  (`expires_at`); the SDK's session-cache model assumes recipes
  are looked up against pre-existing Products, not assembled
  alongside them. If we have to forge `Product` rows the framework
  doesn't know about to make this work, the abstraction is wrong:
  recipes must support proposal-time *assembly*, not just lookup.
* **Hash-dedup state crosses sessions.** `dynamic_products.py`
  hashes inputs to dedup variants; if the hash state can't fit
  the framework's session-scoped cache (because dedup is global
  across sessions), the session-scoped model is wrong.

If any falsifier fires: #502 needs a revision adding proposal-time
recipe assembly as a first-class concern. The session-cache model
becomes one shape among multiple.

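
The hash-dedup falsifier turns on where the "already seen" set lives. The sketch below contrasts the two scopes; it is an illustration only, not salesagent's actual hashing logic. `variant_key`, `SessionScopedDedup`, and `GlobalDedup` are hypothetical names, and hashing canonical JSON with SHA-256 is an assumption about how inputs might be keyed:

```python
import hashlib
import json


def variant_key(inputs: dict) -> str:
    """Stable hash of the variant inputs; key order must not matter."""
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class SessionScopedDedup:
    """Dedup state partitioned by session, as a session-scoped cache would hold it."""

    def __init__(self) -> None:
        self._seen: dict[str, set[str]] = {}

    def is_new(self, session_id: str, inputs: dict) -> bool:
        seen = self._seen.setdefault(session_id, set())
        key = variant_key(inputs)
        if key in seen:
            return False
        seen.add(key)
        return True


class GlobalDedup:
    """Dedup state shared by every session."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def is_new(self, inputs: dict) -> bool:
        key = variant_key(inputs)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

Feeding the same inputs through two different sessions makes the difference visible: the session-scoped version reports the variant as new in each session, while the global version reports it as new only once. If the existing assembly depends on the second behaviour, the falsifier fires.
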
### Q2: Does the recipe carry enough?

Prior: GAM's `implementation_config` (the most-evolved recipe shape
in salesagent) fits a typed Pydantic recipe without escape hatches.

Falsified if any of:

* **`extra: dict[str, Any]` field on the recipe.** Any typed escape
  hatch (including `vendor_specific: dict`, `__pydantic_extra__`
  carrying GAM data, or `Annotated[Any, Field(extra=True)]`) is
  a tell that the typed recipe doesn't actually carry GAM's full
  shape.
* **`# type: ignore` to make recipe construction work.** If we
  have to bypass mypy to build the recipe from salesagent's
  `Product.implementation_config` JSON, the typed shape isn't
  capturing what's there.
* **Lossy projection.** Round-trip from
  `Product.implementation_config: JSONType` (salesagent) →
  `GAMRecipe` (typed) → `dict` (passed to `_create_media_buy_impl`)
  loses any field. A literal dict comparison after round-trip
  must be equal.

If any falsifier fires: #502's typed-recipe model is wrong, or
incomplete, or needs an escape-hatch design (`unstructured: dict`
field with documented semantics, like Kubernetes annotations).
Worth surfacing in a Protocol RFC.

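
The lossy-projection check reduces to a dict comparison before and after the typed hop. A minimal sketch using a stdlib dataclass as a stand-in for the Pydantic recipe; `ToyRecipe` and its three fields are hypothetical and far smaller than GAM's real `implementation_config`, and the projection deliberately ignores unknown keys so that a loss shows up in the comparison rather than as an exception:

```python
from dataclasses import asdict, dataclass, fields
from typing import Any

_MISSING = object()  # marks a key that is absent on one side of the comparison


@dataclass(frozen=True)
class ToyRecipe:
    # Hypothetical fields, for illustration only.
    order_name_template: str
    line_item_type: str
    priority: int


def to_recipe(config: dict[str, Any]) -> ToyRecipe:
    """Project a JSON-style config onto the typed shape, ignoring unknown keys."""
    known = {f.name for f in fields(ToyRecipe)}
    return ToyRecipe(**{k: v for k, v in config.items() if k in known})


def round_trip_diff(config: dict[str, Any]) -> set[str]:
    """Keys whose value differs (or is missing) after config -> recipe -> dict."""
    after = asdict(to_recipe(config))
    keys = set(config) | set(after)
    return {k for k in keys if config.get(k, _MISSING) != after.get(k, _MISSING)}
```

An empty result from `round_trip_diff` corresponds to the "literal dict comparison must be equal" condition; any key it returns is a field the typed shape failed to carry. Running a check of this kind over every stored `implementation_config` value would be one way to pin the falsifier to concrete rows.
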
### Q3: What hydration model does `create_media_buy` need?

Prior: framework hydrates the recipe at `create_media_buy` time
from one of three sources (session cache, persisted DB row, fresh
lookup); the experiment forces a choice.

Falsified if:

* **None of the three work.** Hydration requires re-running the
  proposal-side assembly logic at `create_media_buy` time
  (because assembly depends on signal-time-of-day, signal agent
  state at brief moment, or other non-idempotent inputs).
* **Framework-owned hydration is the wrong primitive.** The right
  answer is "framework owns no hydration; adopter handles it
  inside `_create_media_buy_impl`", meaning the SDK's framework
  abstraction is incorrectly drawn.

If any falsifier fires: #502's framework-managed-recipe-state
model is wrong. The recipe is adopter-owned data the SDK doesn't
need to mediate; the SDK's job is just to type the contract.

### Q4: What is the right shape for the HITL resumption marker?

Prior: the experiment can answer "does the SDK seam accommodate
salesagent's setattr-sentinel pattern" with the SDK as it ships
today.

**Step 0 partially answered this:** the setattr pattern works as-is
(`compose_method` passes `req` through unchanged; setattr on a
Pydantic model with `extra='forbid'` survives Python-level
dispatch). So the prior holds for this experiment.

The deeper question ("what is the right Protocol seam for
resumption markers across multiple adopters?") is **N=1 from
this experiment**. Falsifiers for the broader claim:

* **Salesagent's pattern doesn't map cleanly to a paused-coroutine
  shape** another adopter might use. If a future adopter with
  TaskRegistry-style resumption can't reuse the experiment's
  marker shape, the typed seam needs to be different.
* **The setattr survives only because no transport boundary
  intervenes.** If the experiment's SDK runtime ever needs to
  re-validate, re-project, or serialize the request between gate
  and inner, the sentinel dies. (This isn't true today, as verified
  in Step 0.5, but it's a fragile invariant.)

If any falsifier fires: the Protocol RFC should propose a typed
`ctx.resumption_token: ResumptionToken | None` that's robust to
re-projection. **The experiment can't choose between shapes; it
just shows the untyped pattern works for one adopter.**

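
The fragility named in the second falsifier can be reproduced in a few lines. This sketch uses a plain dataclass as a stand-in for the request model, so it says nothing about how Pydantic itself treats `setattr`; `ToyRequest`, `gate`, `inner`, `reproject`, and the `_resumed` attribute name are all hypothetical:

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ToyRequest:
    buyer_ref: str
    budget: float


def gate(req: ToyRequest) -> ToyRequest:
    """Stash the resumption marker as an ad-hoc attribute, not a declared field."""
    setattr(req, "_resumed", True)
    return req


def inner(req: ToyRequest) -> bool:
    """Report whether the marker is still attached when the inner handler runs."""
    return getattr(req, "_resumed", False)


def reproject(req: ToyRequest) -> ToyRequest:
    """Simulate a transport boundary: serialize declared fields, then rebuild."""
    return ToyRequest(**json.loads(json.dumps(asdict(req))))
```

Passing the same object from `gate` to `inner` keeps the marker, while inserting `reproject` between them drops it, because only declared fields survive serialization. A typed field such as the `ctx.resumption_token` proposed above would not have this failure mode, since it would be part of the serialized shape.
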
### Q5: Does F12 webhook auto-emit hold up under real load?

Prior, original: `WebhookSender` configured on `serve(...)` fires
sync-completion webhooks automatically, signed correctly, retried
on transient failure, logged-and-swallowed on permanent failure,
without adapter code participating. This is §3.14's claim that
adopters delete their webhook plumbing wholesale.

**Step 0.6 already partially falsified this.** Salesagent's
`X-Webhook-Signature` scheme and SDK's `X-AdCP-Signature` scheme
are incompatible. §3.14 needs a correction. So the prior is
already known wrong; the question now is which of three cutover
paths the experiment recommends:

(a) Buyers migrate to SDK signing.
(b) SDK ships a salesagent-compatible signing mode alongside
    `from_adcp_legacy_hmac`.
(c) Side-car preserves salesagent's `webhook_authenticator.py`
    rather than using F12 auto-emit.

Falsifiers for the SDK→SDK signing path (the only one the
experiment validates):

* **`WebhookSender` → `WebhookReceiver` round-trip fails** with
  matching secrets (extremely unlikely, since this is well-tested in
  the conformance suite, but worth running once on day 1).
* **Auto-emit doesn't fire** after a successful mutating tool
  call (means F12 framework wiring is broken or our `serve(...)`
  config is wrong).
* **Retry / failure-swallow doesn't behave per spec.** Detecting
  this would require buyer-side observation of retried deliveries.

If any falsifier fires: F12 isn't ready as the default path even
for SDK→SDK signing.

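
The day-1 round-trip check and the Step 0.6 incompatibility can both be pictured with a toy signer and verifier. This is a sketch under loud assumptions: it does not use `WebhookSender` or `WebhookReceiver`, the signing base (HMAC-SHA256 over the raw body) is assumed rather than taken from either scheme, and the only things borrowed from the text above are the two header names. The real schemes may differ in more than the header:

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes, header: str) -> dict[str, str]:
    """Return a headers dict carrying an HMAC-SHA256 hex digest of the body."""
    digest = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return {header: digest}


def verify(secret: bytes, body: bytes, headers: dict[str, str], header: str) -> bool:
    """Check the digest under the header name this receiver expects."""
    received = headers.get(header)
    if received is None:
        return False  # sender used a different scheme, or sent no signature
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(received, expected)
```

A sender and receiver that agree on both header and secret verify cleanly, which is the SDK→SDK case the experiment validates. A receiver expecting `X-AdCP-Signature` rejects a payload signed under `X-Webhook-Signature` even when the secret matches, which is the shape of the mismatch that forces a choice among cutover paths (a) to (c).
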
## Risks (revised)

* **Wrap target drift.** Mitigated by Step 0 `_impl` identification
@@ -716,9 +897,9 @@ section above. Remaining items are concrete prereqs.
the experiment, use SDK→SDK signing only (test buyer is
`adcp.WebhookReceiver` with the same secret). Production
cutover requires buyer migration as separate work.
-0.7. Pre-register the candidate contradictions for each of the five
-learning questions (which finding would tell us each prior is
-wrong).
+0.7. ✅ Falsification signals pre-registered for each of the five
+(six, with Q1.5) learning questions. See "Pre-registered
+falsification signals" section above.

**Phase 1: `dynamic_products.py` recipe falsification (~1 day).**
