The SDK was built for two-cluster handoff (A records HTTP, B verifies via proofs). Real usage today — customer-support-sdk-demo, anything Claude-Code-shaped — is one agent doing everything, and that flow pays multi-agent ceremony on every tool call. This proposal removes the ceremony and speeds up multi-agent at the same time. Same code path, no mode flag, no detection logic.
3 tool calls, 3 claims (two of them hit the same intercept row):
NOW — every backend round-trip is sequential
agent ─► tool 1 ──► [ preprocess poll ]
agent ─► tool 2 ──► [ preprocess poll ]
agent ─► tool 3 ──► [ preprocess poll ]
build ─► QR claim 1
build ─► QR claim 2 (duplicates QR1 — same intercept row)
build ─► QR claim 3
eval ─► fetch claim 1
eval ─► fetch claim 2
eval ─► fetch claim 3
eval ─► verify QR1
eval ─► verify QR2
eval ─► verify QR3
──────────────────────────────────────────────────────────► time
PROPOSED — preprocess coalesced, claims deduped, the rest parallel
agent ─► tool 1, tool 2, tool 3 (no preprocess wait)
worker ─► [ 1 preprocess, debounced ]
build ─► [ QR1, QR2 in parallel ] (3 claims → 2 records)
eval ─► [ fetch1, fetch2, fetch3 in parallel ]
eval ─► [ verify1, verify2 in parallel ]
──────────────────────────────────────────────────────────► time
For K claims over M unique intercept rows:
| Stage | Now | Proposed |
|---|---|---|
| Tool-call latency | K × preprocess wait | ~0 (worker is async) |
| Preprocess runs | K (one per call) | 1 (debounced) |
| Query records made | K (one per claim) | M (deduped by row) |
| Evaluator fetches | K sequential | K parallel (≤8 at once) |
| Evaluator verifies | K sequential (one per query record) | M parallel (≤8 at once) |
K=10, M=5 → 10 preprocesses become 1, 10 query records become 5, 20 sequential evaluator round-trips become 2 parallel batches.
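The K=10, M=5 figures can be reproduced with a few lines of arithmetic. This is a sketch under two assumptions: today's evaluator verifies one query record per claim (which is what the 20 round-trip figure implies), and all tool calls land in a single debounce window. The function name `stage_counts` is made up for illustration.

```python
import math


def stage_counts(k: int, m: int, max_workers: int = 8):
    """Return (now, proposed) backend-call counts for k claims over m unique rows."""
    now = {
        "preprocess_runs": k,            # one blocking run per tool call
        "query_records": k,              # one record per claim
        "evaluator_round_trips": k + k,  # k fetches + k verifies, all sequential
    }
    proposed = {
        "preprocess_runs": 1,            # best case: one debounce window
        "query_records": m,              # deduped by intercept row
        "evaluator_batches": 2,          # one fetch batch + one verify batch
        # Each batch is drained in waves of at most max_workers requests.
        "evaluator_waves": math.ceil(k / max_workers) + math.ceil(m / max_workers),
    }
    return now, proposed


now, proposed = stage_counts(10, 5)
print(now)
print(proposed)
```

Note that with the 8-worker bound, the fetch batch for 10 claims drains in two waves, so the two batches amount to three waves of backend requests.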
What's slow today
Walking through the same 3-tool-call, 3-claim run:
- Each tool call blocks while preprocess runs end-to-end. _storage.py:170 synchronously kicks preprocess after every intercept and polls until done. 3 calls = 3 sequential preprocess runs blocking the agent.
- Each claim's query record is created serially. payload_builder.py:141-162 loops claims doing POST /query → POST /generate_proof → poll, one at a time. K claims = K sequential round-trips, even when claims duplicate.
- The evaluator does the same. evaluator.py:113-133 fetches each claim's record serially; :183-209 verifies each unique query_record_id serially.
- set_interceptor_context is mandatory and easy to get wrong. Interceptor default "unknown" (interceptor.py:238) doesn't match payload-builder default "fetch_and_claim" (payload_builder.py:30), so if you forget the wrap, the lookup silently misses and returns an empty payload.
- Bootstrap always runs preprocess. client.py:50 runs it even on padding-only tables.
- Polling floors are too high. _preprocess.py:102/117 poll every 0.3s / 0.1s, while most preprocesses finish faster than the floor.
The fix
One worker thread coalesces preprocess. Replace the synchronous per-intercept call in _storage.py:170 with a "dirty" flag picked up by a debounced background worker. Worker runs preprocess once per 50 ms window. _build_claims and evaluate_handoff block on a condition variable until the proof catches up to their snapshot (SELECT MAX(id) FROM provably_intercepts).
This is the whole single-vs-multi-agent story in one mechanism:
- 1 agent, 10 sequential calls → 1 preprocess (was 10)
- N agents interleaving → still 1 worker; each agent's evaluate waits for its own snapshot
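A minimal sketch of the dirty-flag worker, assuming a `run_preprocess` callable that proves everything committed so far and returns the highest intercept id it covered. Only `mark_dirty()` and the 50 ms window come from the proposal; the class name `PreprocessWorker` and the `wait_for` / `shutdown` methods are hypothetical, and the real lifecycle is still an open question.

```python
import threading
import time
from typing import Callable


class PreprocessWorker:
    """Coalesces mark_dirty() calls into debounced preprocess runs (illustrative only)."""

    def __init__(self, run_preprocess: Callable[[], int], debounce_s: float = 0.05) -> None:
        self._run_preprocess = run_preprocess  # assumed: returns highest proved id
        self._debounce_s = debounce_s
        self._cond = threading.Condition()
        self._dirty = False
        self._stopped = False
        self._proved_up_to = 0
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def mark_dirty(self) -> None:
        """Called after each intercept insert; never blocks on preprocess."""
        with self._cond:
            self._dirty = True
            self._cond.notify_all()

    def wait_for(self, snapshot_id: int, timeout: float = 30.0) -> bool:
        """Block until the proof covers snapshot_id; False on timeout."""
        with self._cond:
            return self._cond.wait_for(
                lambda: self._proved_up_to >= snapshot_id, timeout
            )

    def shutdown(self) -> None:
        with self._cond:
            self._stopped = True
            self._cond.notify_all()
        self._thread.join()

    def _loop(self) -> None:
        while True:
            with self._cond:
                self._cond.wait_for(lambda: self._dirty or self._stopped)
                if self._stopped:
                    return
            # Debounce window: let further intercepts pile up before running.
            time.sleep(self._debounce_s)
            with self._cond:
                # Clear before running so inserts during the run trigger another pass.
                self._dirty = False
            proved = self._run_preprocess()
            with self._cond:
                self._proved_up_to = max(self._proved_up_to, proved)
                self._cond.notify_all()
```

In this shape, a builder or evaluator would read its snapshot id first and then call `wait_for(snapshot_id)`. Ten intercepts arriving inside one window trigger a single preprocess run, and several agents sharing the worker each wait only for their own snapshot.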
Dedupe per-intercept query records. Today payload_builder.py:141-162 creates a query record per claim, even when several claims target the same intercept row. Group claims by SQL signature (row_id when the interceptor recorded one, else the fallback WHERE action_name = '...' at _query_records.py:83-88) before creating; share the resulting query_record_id across the group. K claims with M unique signatures → M query records instead of K. evaluator.py:183-209 already dedupes by query_record_id, so this falls out for free downstream.
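The grouping step might look like the following sketch. The `Claim` dataclass, `signature`, and `create_deduped` are hypothetical names, and `create_query_record` stands in for whatever performs the POST /query → POST /generate_proof → poll sequence; the real claim type and SQL signature may carry more fields.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List, Optional


@dataclass(frozen=True)
class Claim:
    # Hypothetical claim shape, for illustration only.
    claim_id: str
    action_name: str
    row_id: Optional[int] = None


def signature(claim: Claim) -> Hashable:
    """Prefer the recorded row id; otherwise fall back to the action name."""
    if claim.row_id is not None:
        return ("row", claim.row_id)
    return ("action", claim.action_name)


def create_deduped(
    claims: List[Claim],
    create_query_record: Callable[[Claim], str],
) -> Dict[str, str]:
    """Create one query record per unique signature; map claim_id -> query_record_id."""
    groups: Dict[Hashable, List[Claim]] = defaultdict(list)
    for claim in claims:
        groups[signature(claim)].append(claim)

    result: Dict[str, str] = {}
    for group in groups.values():
        # One backend round-trip per group, shared by every claim in it.
        query_record_id = create_query_record(group[0])
        for claim in group:
            result[claim.claim_id] = query_record_id
    return result
```

With three claims where two share a row, `create_query_record` is invoked twice and both duplicates receive the same `query_record_id`.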
Parallelize the per-claim loops. Three places, all bounded ThreadPoolExecutor(max_workers=8):
- Query-record creation (payload_builder.py:141), over the deduped set
- Evaluator fetch (evaluator.py:113)
- Evaluator verify (evaluator.py:183)
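All three loops could share one small helper along these lines. `map_bounded` is a hypothetical name, and `fn` stands in for the per-item backend call; this is a sketch of the bounded-pool pattern, not the SDK's code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def map_bounded(fn: Callable[[T], R], items: Sequence[T], max_workers: int = 8) -> List[R]:
    """Apply fn to every item with at most max_workers calls in flight.

    Results come back in input order; the first exception raised by fn
    propagates to the caller when the results are collected.
    """
    if not items:
        return []
    with ThreadPoolExecutor(max_workers=min(max_workers, len(items))) as pool:
        return list(pool.map(fn, items))
```

Because `Executor.map` preserves input order, callers can zip the results back onto their claims without extra bookkeeping.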
Make set_interceptor_context optional. Align interceptor + payload-builder defaults to "_default". Single-agent users skip the wrap entirely. Multi-agent users keep labeling agents the way they always have — no behavior change for them.
Two small wins. Skip startup preprocess on padding-only tables (client.py:50). Drop polling floors to 0.05s with exponential ramp.
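One way to read "0.05s floor with exponential ramp" is sketched below. Only the 0.05s floor comes from the proposal; the doubling factor, the 1.0s cap, and the names `poll_delays` and `poll_until` are assumptions for illustration.

```python
import time
from typing import Callable, Iterator


def poll_delays(floor_s: float = 0.05, factor: float = 2.0, cap_s: float = 1.0) -> Iterator[float]:
    """Yield sleep intervals that start at floor_s and grow geometrically up to cap_s."""
    delay = floor_s
    while True:
        yield delay
        delay = min(delay * factor, cap_s)


def poll_until(is_done: Callable[[], bool], timeout_s: float = 30.0, floor_s: float = 0.05) -> bool:
    """Poll is_done() with a ramping delay; return False if timeout_s elapses first."""
    deadline = time.monotonic() + timeout_s
    for delay in poll_delays(floor_s):
        if is_done():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(delay, remaining))
    return False  # not reached: poll_delays is infinite
```

Fast preprocesses are caught on the first or second poll, while slow ones back off instead of hammering the backend at the floor rate.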
One sugar. provably.verify(claims) — a one-call wrapper around build_handoff_payload + evaluate_handoff. Old two-step API stays.
Code: now vs proposed for a single-agent user
# Now
provably.configure_indexing(enable_indexing=True)
set_interceptor_context(agent_id="demo", action_name="get_weather") # mandatory
requests.get(...)
payload = provably.build_handoff_payload(claims)
verdict = provably.evaluate_handoff(
    payload, provably_base_url=..., postgres_url=..., org_id_fallback=...,
)
# Proposed
provably.configure_indexing(enable_indexing=True)
requests.get(...) # no wrap
verdict = provably.verify(claims)
Files touched
_preprocess.py — worker thread, cond-var sync, adaptive polling
_storage.py:170 — mark_dirty() instead of sync preprocess
payload_builder.py — snapshot fence; dedupe claims by intercept row; parallel query-record creation; default intercept_agent_id="_default"
evaluator.py — parallel Phase 1+2 fetch and Phase 3 verify
interceptor.py:238 — default agent_id "_default"
client.py:50 — skip bootstrap preprocess on padding-only table
__init__.py — export verify()
No deletions. No breaking imports. No new required public surface.
How we verify
- pytest tests/unit/, tests/e2e/test_interceptor_e2e.py, tests/e2e/test_post_handoff_e2e.py pass unchanged
- time python examples/openai_agents/agent_run.py before vs after
- Run customer-support-sdk-demo end-to-end: evaluate time should drop substantially on multi-claim runs with no code changes
- New concurrency test: two threads insert intercepts while a third calls _build_claims; verify the claims reflect the highest committed id
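The concurrency test could take roughly this shape. The sketch runs against an in-memory stand-in rather than the SDK: `FakeInterceptStore` and `build_claims` are hypothetical substitutes for the provably_intercepts table and `_build_claims`, so it shows the structure of the test, not the real assertions.

```python
import threading
from typing import List, Tuple


class FakeInterceptStore:
    """In-memory stand-in for the intercepts table (illustrative only)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._ids: List[int] = []

    def insert(self) -> int:
        with self._lock:
            new_id = len(self._ids) + 1  # ids are assigned in commit order
            self._ids.append(new_id)
            return new_id

    def max_id(self) -> int:
        with self._lock:
            return self._ids[-1] if self._ids else 0

    def rows_up_to(self, snapshot: int) -> List[int]:
        with self._lock:
            return [i for i in self._ids if i <= snapshot]


def build_claims(store: FakeInterceptStore) -> Tuple[int, List[int]]:
    """Take a snapshot fence, then read only rows at or below it."""
    snapshot = store.max_id()
    return snapshot, store.rows_up_to(snapshot)


def test_snapshot_is_consistent_under_concurrent_inserts() -> None:
    store = FakeInterceptStore()
    results: List[Tuple[int, List[int]]] = []

    def writer() -> None:
        for _ in range(500):
            store.insert()

    def reader() -> None:
        for _ in range(200):
            results.append(build_claims(store))

    threads = [threading.Thread(target=writer) for _ in range(2)]
    threads.append(threading.Thread(target=reader))
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Every build saw exactly the rows committed at or before its snapshot.
    for snapshot, rows in results:
        assert rows == list(range(1, snapshot + 1))
    assert store.max_id() == 1000
```

The real test would replace the stand-ins with actual intercept inserts and `_build_claims`, keeping the same invariant: a build never sees a row above its snapshot and never misses one below it.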
Open for discussion
- Is the 50 ms debounce the right default, or should it be tunable?
- Is max_workers=8 safe against Rust BE rate limits?
- Should verify() accept the same kwargs as evaluate_handoff (timeout, etc.) or stay minimal?
- Worker thread lifecycle: when does it start (first mark_dirty()? import-time?) and how does it stop (atexit? explicit shutdown()?). Needs to be nailed down in the PR.
- Polling floor of 0.05s: chosen without measuring the actual preprocess-completion distribution from the Rust BE. If most preprocesses finish in 80–150 ms, 0.05s costs ~3× more polls than 0.3s with little payoff. Worth benchmarking before locking in.