Scenario: an agent calls a tool, and the SDK records one intercept row containing the wire response. The LLM then makes several claims about that same row, e.g. six fields of the same `support_list_tickets` response.
Today: each claim creates its own query record, generates its own proof, and runs its own verification, all proving the same row. Six claims mean six identical `SELECT * FROM provably_intercepts WHERE id = N` proofs at ~3s each.
Fix (`payload_builder.py:141-162`): group claims by `row_id` and create one query record per group. Claims without a `row_id` skip dedup and keep one query record each. There is deliberately no name-based fallback, because silently collapsing different rows that share an `action_name` would be unsafe. `evaluator.py:183-209` already keys verification by `query_record_id`, so deduping at record-creation time flows through unchanged.
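A minimal Python sketch of the grouping rule, assuming a simplified claim shape. `Claim`, `QueryRecord`, `build_query_records`, `verify_claims`, and `prove_and_verify` are hypothetical names for illustration, not the actual code in `payload_builder.py` or `evaluator.py`. It shows claims that share a `row_id` collapsing into one query record, claims without a `row_id` each keeping their own record, and verification keyed by `query_record_id` running once per record before its verdict is fanned back out to every claim in the group.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Claim:
    # Hypothetical claim shape: only the fields this sketch needs.
    claim_id: str
    action_name: str
    row_id: Optional[int]  # provably_intercepts.id, or None if the claim has no row_id


@dataclass
class QueryRecord:
    # Hypothetical query record: one per proof that will be generated and verified.
    query_record_id: int
    row_id: Optional[int]
    claim_ids: List[str] = field(default_factory=list)


def build_query_records(claims: List[Claim]) -> List[QueryRecord]:
    """Group claims by row_id so each distinct row gets exactly one query record."""
    next_id = count(1)
    by_row: Dict[int, QueryRecord] = {}
    records: List[QueryRecord] = []
    for claim in claims:
        if claim.row_id is None:
            # No row_id: skip dedup entirely. Deliberately no fallback to
            # action_name, which could merge claims about different rows.
            records.append(QueryRecord(next(next_id), None, [claim.claim_id]))
            continue
        record = by_row.get(claim.row_id)
        if record is None:
            # First claim on this row: create the single shared record.
            record = QueryRecord(next(next_id), claim.row_id)
            by_row[claim.row_id] = record
            records.append(record)
        record.claim_ids.append(claim.claim_id)
    return records


def verify_claims(
    records: List[QueryRecord],
    prove_and_verify: Callable[[QueryRecord], bool],
) -> Dict[str, bool]:
    """Prove/verify once per query_record_id, then fan the verdict out to every claim."""
    verdict_by_record = {r.query_record_id: prove_and_verify(r) for r in records}
    return {
        claim_id: verdict_by_record[r.query_record_id]
        for r in records
        for claim_id in r.claim_ids
    }


if __name__ == "__main__":
    proof_calls: List[int] = []

    def fake_prove_and_verify(record: QueryRecord) -> bool:
        # Stand-in for the ~3s proof + verify; it only counts invocations.
        proof_calls.append(record.query_record_id)
        return True

    # Six claims about row 42 plus one claim with no row_id.
    claims = [Claim(f"c{i}", "support_list_tickets", 42) for i in range(6)]
    claims.append(Claim("c6", "support_list_tickets", None))

    records = build_query_records(claims)
    verdicts = verify_claims(records, fake_prove_and_verify)
    print(len(records), len(proof_calls), len(verdicts))  # 2 2 7
```

With six claims on row 42 and one claim lacking a `row_id`, the sketch creates two query records and calls `prove_and_verify` twice instead of seven times. Giving each no-`row_id` claim its own record, rather than grouping by `action_name`, gives up some possible dedup in exchange for never merging claims about distinct rows.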
Impact: 6 claims on one row → 1 proof instead of 6. SDK time on the turn drops from ~17s to ~3s.
Split out of #34.