fix(intercept): scoped intercept_context() — prevent ContextVar leak across async tool/LLM boundaries#11
Merged
Merged
Conversation
…ntextVar leak `set_interceptor_context` is fire-and-forget — once called it never resets the underlying ContextVars. Inside an async agent loop, the tool function and the subsequent LLM call run in the same `asyncio.Task`, so the tag set inside the tool keeps applying to LLM calls fired after the tool returns. Those LLM calls end up recorded in `provably_intercepts` with the tool's `action_name`. Two downstream symptoms in user code that calls `build_handoff_payload`: 1. `load_latest_intercept_payload(pg_url, action_name, agent_id)` does `ORDER BY created_at DESC LIMIT 1` and returns the most recent row matching the (agent_id, action_name) key — which, because of the leak, is the LLM POST that fired AFTER the tool, not the tool's own GET. The claim's `request_payload` then carries the LLM provider URL and trips the trust gate's "endpoints missing from trusted snapshot" check. 2. `get_intercept_row_id(agent_id, action_name)` returns the same wrong row id, so the Provably query record indexes the LLM completion JSON. The verbatim claim comparison in `evaluate_handoff` then always returns CAUGHT even when the agent's claim and the actual data agree. Fix: add `intercept_context()` — a contextlib context manager that calls `ContextVar.set` on enter, captures the tokens, and `reset` on exit. Same pattern as the existing `provably_self_egress()` manager. `set_interceptor_context` is kept for backward compatibility; its docstring now carries a warning recommending `intercept_context` for tool bodies. Includes a regression test that pins the leaky behavior of `set_interceptor_context` and a parallel test that proves `intercept_context` does not leak. 88/88 tests pass (was 82, +6 new).
The fire-and-forget setter was the leaky API; ship intercept_context as the single supported way to tag intercepts. Drop the deprecation shim and update the regression test to demonstrate the leak via raw ContextVar.set() instead of the removed public function. 88/88 tests still pass.
…shooting checklist Two doc-only additions surfaced by a debugging session where multiple unrelated misuses of the SDK all produced the same CAUGHT outcome: - intercept_context docstring: explicit "must be used with `with`" callout warning that a bare call is a no-op, plus a note that agent_id must match the intercept_agent_id passed to build_handoff_payload. - README `### eval`: short troubleshooting checklist for unexpected CAUGHT covering the four common causes (tool body never ran, bare context-manager call, agent_id mismatch, wrong row-id helper). No code changes. 88/88 tests still pass.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Adds
provably.intercept.intercept_context()— a context manager that scopes the agent_id / action_name / intercept_index tag to awithblock and resets the underlying ContextVars on exit.Removes
set_interceptor_context()entirely. The fire-and-forget setter was the leaky API — there's no useful callsite for it that wouldn't be better served byintercept_context. No backward-compat shim. Callers must migrate to the new context manager (one-line change).Why — the bug being fixed
set_interceptor_context()wrote to aContextVarand never reset it. Inside an async agent loop the tool function and the subsequent LLM turn run in the sameasyncio.Task, so a tag set inside the tool keeps applying to every HTTP call that fires after the tool returns. Those LLM calls get persisted intoprovably_interceptscarrying the tool'saction_name.Two downstream failure modes in user code that calls
build_handoff_payload:load_latest_intercept_payload(pg_url, action_name, agent_id)runsORDER BY created_at DESC LIMIT 1against(agent_id, action_name)request_payload[\"url\"]ends up being the LLM provider URL, which trips the trust gate's "endpoints missing from trusted snapshot" check.get_intercept_row_id(agent_id, action_name)looks up the in-memory_action_row_idsdict, which was overwritten with the LLM row's idevaluate_handoffthen returnsCAUGHTno matter what the agent claims.Both symptoms have one root cause; this PR fixes them both.
Migration
If you previously had:
Replace with:
That's the entire migration. There's no reason to call the setter outside a scope — the leak made the unscoped pattern unsafe in async contexts and offered no advantages in sync contexts.
Tests
tests/unit/test_intercept_context.py— 6 new tests, all green:intercept_contextsets values inside the block ✅test_naked_ctx_var_set_leaks_into_subsequent_calls: simulates an LLM → tool → LLM sequence in oneasyncio.Taskusing a rawContextVar.set()(which is whatset_interceptor_contextused to do internally), asserts the leak still happens with that pattern. Documents why the context manager is necessary and pins the asyncio behavior we're working around. ✅test_intercept_context_does_not_leak_into_subsequent_calls: same scenario withintercept_context, asserts the second LLM call goes back to(\"unknown\", \"unknown\"). ✅Full suite: 88/88 pass (was 82, +6 new). Ruff clean.
Out of scope (deliberately separate)
set_interceptor_context— those live on a different branch and will be migrated in their respective PRs (this PR only changes the SDK surface; downstream callers migrate independently).Suggested reviewer hops
set_interceptor_contextremoved,intercept_contextadded.test_naked_ctx_var_set_leaks_into_subsequent_calls) is the most important one to read; it documents the exact failure mode being prevented.