test: interaction-model end-to-end suite with a requirements manifest#2691
test: interaction-model end-to-end suite with a requirements manifest#2691maxisbey wants to merge 27 commits into
Conversation
New tests/interaction/ suite asserting client<->server round trips through the public API only. Tests are organised around a requirements manifest (_requirements.py) mapping each test to the spec or SDK behaviour it exercises, with known divergences from the spec recorded on the requirement; test_coverage.py enforces that every non-deferred requirement is exercised by at least one test. Covers tools, prompts, resources, and ping against the low-level Server, plus MCPServer tool-call behaviours. Removes two 'pragma: no cover' comments on the ping send/answer paths now that they are covered.
…action tests Extends the interaction suite with the initialize handshake (server identity, instructions, capability derivation, client identity and capabilities as seen by the server), completion round trips, logging notifications, and the MCPServer resource/prompt/structured-output behaviours. Records two more divergences on the requirements manifest: MCPServer reports unknown resources and prompts with error code 0 rather than the codes the spec documents. Removes the 'pragma: no cover' from the method-not-found fallback now that it is covered.
Covers the server-to-client half of the interaction model: sampling, form-mode elicitation, roots, progress in both directions, list_changed notifications, and request cancellation, all against the low-level Server through the public Client API. Records a further divergence: the server answers cancelled requests with an error response where the spec says no response should be sent. Removes five more 'pragma: no cover' comments on paths these tests now cover (server list_changed senders, the client roots send path, and the default elicitation callback).
…eta interaction tests Covers URL-mode elicitation (including the elicitation/complete lifecycle and the -32042 rejection flow), resource subscriptions and update notifications, cursor pagination across all four list methods, request and session read timeouts, _meta round trips, and the MCPServer Context convenience methods. Removes the 'pragma: no cover' from the resource-updated send path now that it is covered.
…ction tests Adds ClientSession-level tests for pre-initialization request rejection and protocol version negotiation, a proof that concurrent tool calls are dispatched simultaneously and answered independently, and tests pinning three behaviour gaps: tool-set mutations send no list_changed notification, logging/setLevel is not supported by MCPServer and no level filtering exists, and tool-enabled sampling is rejected because the high-level client cannot declare the sampling.tools capability.
A RecordingTransport wrapper tees every message crossing the client's transport boundary so the suite can assert properties that are invisible to API callers: request ids are unique and never null, notifications are never answered, and exactly one initialized notification is sent between the initialize response and the first feature request.
Drives the streamable HTTP Starlette app through httpx's ASGI transport so the full HTTP framing layer (session ids, SSE and JSON response encoding, stateful and stateless modes) runs with no sockets, threads, or subprocesses. Covers the handshake, tool calls and errors, mid-call notifications, the stateless rejection of server-initiated requests, and the routing of unrelated server messages to the standalone stream. Removes the 'pragma: no cover' comments these tests now cover (the session-manager accessors, the no-session-id validation path, and the related-request-id routing branch). The session-manager accessor's unreachable error guard keeps its pragma, moved onto the raise statement itself so the now-executed condition above it is measured normally.
…irements manifest Fixes the spec deep links that pointed at non-existent anchors, records the divergences for the client's default not-supported answers (the spec names -32601 for roots and -32602 for an undeclared elicitation mode; the default callbacks answer -32600), and adds a logging:capability requirement noting that MCPServer emits log message notifications without declaring the logging capability. Also tightens behaviour sentences and docstrings to match what the tests assert, and adds a test pinning that Context.report_progress is a silent no-op when the caller supplied no progress token, removing the pragma on that path.
…n-rejection interaction tests
… two-way test coverage
…tiated requests over streamable HTTP
… conformance tests
- raise pytest floor to 8.4.0 (RaisesGroup, used in the auth tests) - collect twice in connect_over_sse so the unclosed sse_stream_reader is finalized on 3.10 (PEP 442 cycle needs a second pass) - collapse stacked async-with statements into comma form where it reads cleanly, and apply # pragma: no branch on the remaining sync-with + async-with shapes coverage.py mis-traces on 3.11/3.14
SseServerTransport.connect_sse never closed sse_stream_reader after EventSourceResponse returned; the in-process bridge dropped its chunk reader when a request was cancelled before the response started. Close both at source so the interaction suite no longer needs a gc.collect() workaround, and pull one assert inside its async-with body to clear the last 3.11 coverage gap.
There was a problem hiding this comment.
I didn't find any bugs in this PR, but it's a very large addition (73 files, ~520 tests, a requirements-manifest framework, in-process OAuth/ASGI harnesses) plus src/ changes beyond pure pragma removal — including a behavioral change in sse.py (closing the SSE stream reader) and many narrowed/relocated coverage pragmas — so it warrants a maintainer's review of the design decisions and the pinned spec divergences.
Extended reasoning...
Overview
This PR adds an entire end-to-end interaction-model test suite under tests/interaction/ (524 tests, a requirements manifest, transport-parametrized connection factories, an in-process streaming ASGI bridge, and an in-process OAuth authorization-server harness), bumps the pytest minimum version, registers a custom requirement marker, and touches 18 src/ files. The src/ changes are predominantly removals or narrowings of # pragma: no cover / # pragma: lax no cover markers now that the new tests execute those lines, but they are not exclusively comment-only: src/mcp/server/sse.py gains an await sse_stream_reader.aclose() call in the SSE response wrapper, which is a real (if small) runtime behavior change.
Security risks
The new code is test-only, so there is no direct production security exposure from the suite itself. However, the suite pins the SDK's current OAuth and bearer-middleware behavior — including documented divergences from the spec (e.g. missing audience validation, missing scope in WWW-Authenticate challenges, lenient redirect-URI scheme registration) — as snapshot-asserted expected behavior. Encoding those divergences as passing tests is a deliberate design choice with security-adjacent implications that maintainers should explicitly endorse rather than have approved automatically.
Level of scrutiny
This is far outside the scope of an auto-approvable change. It is large, introduces a new testing framework and conventions (the manifest model, divergence lifecycle, coverage rules described in the README) that future contributors will be expected to follow, modifies coverage enforcement semantics across many src/ files, and includes one genuine library code change. Design-level decisions of this kind should be weighed in on by a human maintainer, and the PR is labeled P1, suggesting the maintainers intend to review it closely anyway.
Other factors
The bug hunting system reported no bugs, and the suite appears self-consistent (the coverage-contract test, the harness self-tests, and the documented conventions are coherent with the diff). There are no prior reviews or unresolved reviewer comments on the timeline. The deferral here is purely about size, the framework/convention decisions being introduced, and the non-trivial src/ touch surface — not about any specific defect found.
…tions deterministically Four manifest fixes from spec/SDK re-verification: - lifecycle:capability:* divergence notes used SHOULD; spec basic/lifecycle#operation has been MUST since 2025-06-18 - mcpserver:tool:naming-validation deferred reason claimed no naming check exists; Tool.from_function calls validate_and_warn_tool_name (warns, doesn't reject) - converted to a Divergence with a pinning test - client-auth:...issuer-validation divergence's second sentence is false (OAuthMetadata types the endpoints AnyHttpUrl, so scheme is validated) - resources:annotations now records that the SDK Annotations model lacks lastModified; the round-trip test sends it via model_validate so the snapshot pins the drop Twelve lowlevel tests sent notifications from inside a tool handler without related_request_id, so on the streamable-HTTP leg they routed to the standalone GET stream and the assertion relied on cross-stream ordering the suite documents as not guaranteed. Eight now pass related_request_id; four whose senders don't accept it use anyio.Event with the snapshot still proving the delivered set. The module docstrings that overstated the ordering guarantee are corrected. README §Coverage now documents the four lax-no-cover teardown markers and the sse.py aclose() fix that landed alongside this suite.
Adds
tests/interaction/: an end-to-end suite that enumerates the MCP interaction model — one test per behaviour, asserting full client↔server round trips through the public API only — backed by a requirements manifest that maps every behaviour the SDK must satisfy to the spec section that mandates it, the tests that prove it, and the gaps that remain.Motivation and Context
Two purposes:
What's covered
524 tests across 360 tracked requirements:
_meta, lifecycle and version negotiation, list-changed notifications, wire-level invariants.How Has This Been Tested?
uv run --frozen pytest tests/interaction/— fully in-process and event-driven (no sockets, threads, or sleeps, with one subprocess for stdio); runs in ~10 s. 100% line and branch coverage including the test code itself.src/changes are limited to removing# pragma: no covermarkers on lines the new tests now execute, plus a small number of narrowed markers documented in the suite README.Breaking Changes
None — tests and test-support only.
Types of changes
Checklist
tests/interaction/README.md)Additional context
Conventions, the manifest model, the divergence lifecycle, and the transport matrix are documented in
tests/interaction/README.md. Known SDK gaps surfaced while writing the suite will be filed as issues separately and linked from the manifest entries'issuefields.AI Disclaimer