Skip to content

Add richer OTel MCP span attributes#2387

Open
Kludex wants to merge 5 commits intomainfrom
codex/add-otel-mcp-attributes
Open

Add richer OTel MCP span attributes#2387
Kludex wants to merge 5 commits intomainfrom
codex/add-otel-mcp-attributes

Conversation

@Kludex
Copy link
Copy Markdown
Member

@Kludex Kludex commented Apr 2, 2026

Summary

  • add shared OTel attribute builders for MCP client and server request spans
  • attach rpc.system, rpc.method, rpc.service, mcp.resource.uri, and mcp.session.id where available
  • cover the new resource and streamable HTTP tracing attributes with focused tests

Testing

  • uv run pytest tests/shared/test_otel.py -q
  • uv run pytest tests/shared/test_streamable_http.py -q -k "resource_read_spans_include_session_id or streamable_http_client_resource_read"

Notes

  • this keeps the existing generic MCP send/handle span model and enriches it with per-request metadata
  • mcp.session.id is surfaced on the client side through the streamable HTTP write stream wrapper, and on the server side from HTTP request metadata

Copy link
Copy Markdown
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — clean, well-tested telemetry enrichment; the duplicate constant nit is already noted inline.

Extended reasoning...

Overview

This PR enriches MCP OpenTelemetry spans with standardized RPC attributes (rpc.system, rpc.method, rpc.service, mcp.resource.uri, mcp.session.id). Changes span six files: a new shared _otel.py helper, updates to session.py and server/lowlevel/server.py to use those helpers, a thin _SessionAwareWriteStream wrapper in the client transport, and two test files covering the new attributes.

Security risks

None. This is purely observability plumbing — no auth, crypto, or permission logic is touched. The session ID is read from existing HTTP request headers and placed into OTel span attributes, which is standard practice.

Level of scrutiny

Low-to-medium. The changes are additive and do not alter any existing request-handling, error-handling, or state-management logic. The refactor in _handle_request (moving metadata extraction before the span context manager) is a straightforward reordering with no behavioral difference. The _SessionAwareWriteStream wrapper correctly delegates all protocol methods to the inner stream.

Other factors

Tests cover the new rpc.* and mcp.resource.uri attributes for both in-process and streamable-HTTP transports. The one nit — a duplicate MCP_SESSION_ID_HEADER constant in server/lowlevel/server.py when the identical value already exists in streamable_http.py — is a minor maintenance concern flagged as an inline comment, not a functional issue.

Copy link
Copy Markdown
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


Additional findings (outside current diff — PR may have been updated during review):

  • 🟡 src/mcp/shared/_otel.py:58-84 — For completion/complete requests where the completion references a resource template, mcp.resource.uri is never set because the URI is nested at params.ref.uri, not params.uri. Both build_server_span_attributes (using getattr(params, 'uri', None)) and build_client_span_attributes (using params.get('uri')) silently return None for CompleteRequestParams, leaving resource completion spans without URI enrichment. The fix is to add a fallback: on the server side, getattr(getattr(params, 'ref', None), 'uri', None); on the client side, check params.get('ref', {}).get('uri') when the top-level uri key is absent.

    Extended reasoning...

    What the bug is and how it manifests

    The PR adds mcp.resource.uri enrichment to OTel spans for MCP requests that deal with resources. This works correctly for resources/read, resources/subscribe, and resources/unsubscribe because those params types (ReadResourceRequestParams, SubscribeRequestParams, UnsubscribeRequestParams) all expose a top-level uri field. However, the completion/complete method also references a resource URI when the completion target is a resource template — and that case is silently missed.

    The specific code path that triggers it

    In build_server_span_attributes (src/mcp/shared/_otel.py, line 80):

    resource_uri = getattr(params, 'uri', None)

    For a completion/complete request, params is a CompleteRequestParams object. Its ref field is typed ResourceTemplateReference | PromptReference. When the caller is completing a resource URI template, ref is a ResourceTemplateReference whose uri: str field holds the template URI. CompleteRequestParams has no top-level uri attribute, so getattr(params, 'uri', None) returns None and mcp.resource.uri is never added to the span attributes.

    On the client side (src/mcp/shared/_otel.py, line 55), build_client_span_attributes receives the serialized params dict and does:

    if params is not None and (resource_uri := params.get('uri')) is not None:

    The serialized CompleteRequestParams dict contains {'ref': {'uri': '...', 'type': 'ref/resource'}, 'argument': {...}}. There is no top-level 'uri' key, so params.get('uri') also returns None.

    Why existing code doesn't prevent it

    Python's getattr and dict.get silently return None for missing attributes/keys. There are no type guards checking the method name before the URI lookup, no test for completion/complete spans, and the PR description frames this as "where available" — giving the impression the omission is intentional, when in fact the URI is available, just one level deeper.

    What the impact would be

    Any observability pipeline that relies on mcp.resource.uri to understand which resource template a completion request is targeting will find that attribute absent for all completion/complete spans that reference resource templates. This makes it impossible to correlate completion activity with resource usage via spans alone.

    How to fix it

    Server side — add a fallback after the existing lookup:

    resource_uri = getattr(params, 'uri', None)
    if resource_uri is None:
        ref = getattr(params, 'ref', None)
        resource_uri = getattr(ref, 'uri', None)

    Client side — fall back to the nested ref.uri in the serialized dict:

    resource_uri = params.get('uri')
    if resource_uri is None:
        ref = params.get('ref') or {}
        resource_uri = ref.get('uri')

    Step-by-step proof

    1. A client calls client.complete(ResourceTemplateReference(uri='file:///src/{name}.py'), ...).
    2. send_request serializes params to {'ref': {'type': 'ref/resource', 'uri': 'file:///src/{name}.py'}, 'argument': {...}}.
    3. build_client_span_attributes receives this dict as params and evaluates params.get('uri')None. mcp.resource.uri is not set on the client span.
    4. The server deserializes the message into a CompleteRequestParams object where .ref is a ResourceTemplateReference(uri='file:///src/{name}.py') and there is no .uri attribute on the top-level object.
    5. build_server_span_attributes calls getattr(params, 'uri', None)None. mcp.resource.uri is not set on the server span either.
    6. Both spans are emitted with no mcp.resource.uri attribute despite the URI being present in the request.
  • 🟡 src/mcp/shared/_otel.py:38-55 — Client spans produced by build_client_span_attributes omit rpc.service, while the server-side counterpart always includes it; per OpenTelemetry RPC semantic conventions rpc.service is Recommended on client spans to identify the remote service. Without it, client spans cannot be filtered or grouped by target service in observability tooling. The fix is straightforward: add an optional service_name: str | None = None parameter to build_client_span_attributes and emit rpc.service when provided; ClientSession (which has access to initialize_result.server_info.name post-handshake) can then pass the value in.

    Extended reasoning...

    What the bug is and how it manifests

    build_client_span_attributes in src/mcp/shared/_otel.py (lines 38–55) builds the attribute dict for every client request span. It sets rpc.system, rpc.method, mcp.method.name, and jsonrpc.request.id, but has no service_name parameter and therefore never emits rpc.service. In contrast, build_server_span_attributes (lines 57–84) always includes "rpc.service": service_name. The OTel RPC semantic conventions (https://opentelemetry.io/docs/specs/semconv/rpc/rpc-spans/) mark rpc.service as Recommended for both client and server spans because it identifies the remote service being called.

    The specific code path that triggers it

    BaseSession.send_request in src/mcp/shared/session.py (lines 276–282) calls build_client_span_attributes(method=..., request_id=..., params=...). There is no service_name argument available at that layer because BaseSession is transport-agnostic and does not hold server identity state. As a result, every MCP client span is emitted without rpc.service.

    Why existing code doesn't prevent it

    The refuting verifier correctly identifies the architectural constraint: BaseSession has no access to server_info.name, which only becomes available in ClientSession._initialize_result.server_info.name after the initialize handshake. This is a real constraint, but it does not make the omission intentional in a final-design sense — it is an addressable gap. An optional service_name: str | None = None parameter added to build_client_span_attributes would be fully backward-compatible; BaseSession continues to pass nothing, while ClientSession can override send_request or supply the value through another mechanism after initialization.

    Impact

    The functional behavior of the SDK is unaffected — spans are created and trace context propagates correctly. The gap is purely in observability quality: client spans cannot be filtered, grouped, or aggregated by remote service name in any APM or log-analytics tool that relies on rpc.service. Given that the PR's stated purpose is to add richer OTel attributes, this asymmetry is a meaningful gap relative to its own goals.

    How to fix it

    1. Add service_name: str | None = None to build_client_span_attributes.
    2. Inside the function, emit attributes["rpc.service"] = service_name when service_name is not None.
    3. In ClientSession, after initialize completes, store server_info.name and pass it when calling build_client_span_attributes (e.g., by overriding send_request or injecting it via a stored instance variable).

    Step-by-step proof

    1. Open src/mcp/shared/_otel.py: build_client_span_attributes (line 38) has signature (*, method, request_id, params) — no service_name.
    2. Open src/mcp/shared/session.py, line 279: build_client_span_attributes(method=request.method, request_id=request_id, params=request_data.get("params"))rpc.service is therefore absent from every emitted client span.
    3. Open src/mcp/shared/_otel.py, line 63: build_server_span_attributes requires service_name and always sets "rpc.service": service_name.
    4. Check tests/shared/test_otel.py (lines 40–47): the test asserts client_span["attributes"]["rpc.system"] == "mcp" and rpc.method, but has no assertion for client_span["attributes"]["rpc.service"], confirming the attribute is absent and untested on the client side.

Comment on lines +40 to +57
def build_client_span_attributes(
*,
method: str,
request_id: str | int,
params: dict[str, Any] | None = None,
) -> dict[str, Any]:
"""Build OTel attributes for an MCP client request span."""
attributes: dict[str, Any] = {
"rpc.system": MCP_RPC_SYSTEM,
"rpc.method": method,
"mcp.method.name": method,
"jsonrpc.request.id": request_id,
}

resource_uri = None
if params is not None:
resource_uri = params.get("uri")
if resource_uri is None:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 build_client_span_attributes has no session_id parameter, so client spans can never carry the mcp.session.id attribute while server spans for the same session can — creating an observability asymmetry. The PR description's claim that 'mcp.session.id is surfaced on the client side through the streamable HTTP write stream wrapper' is factually incorrect: commit 67ca7fe in this same branch explicitly removed that wrapper without adding a replacement. To fix this, add a session_id parameter to build_client_span_attributes and thread the session ID from ClientStreamableHTTPTransport.session_id through BaseSession.send_request.

Extended reasoning...

What the bug is and how it manifests

build_client_span_attributes (src/mcp/shared/_otel.py lines 40-57) accepts only method, request_id, and params — there is no session_id parameter. This means that no matter what the calling code does, client OTel spans can never carry the mcp.session.id attribute. By contrast, build_server_span_attributes (lines 60-84) accepts an explicit session_id parameter and conditionally sets attributes["mcp.session.id"] when it is non-None. The two functions are structurally asymmetric for session ID support.

The specific code path that triggers it

The streamable HTTP client transport (src/mcp/client/streamable_http.py, ClientStreamableHTTPTransport) maintains self.session_id and includes it in every outgoing request via the mcp-session-id HTTP header. However, this value is never passed to BaseSession.send_request (src/mcp/shared/session.py:276), which calls build_client_span_attributes with only method, request_id, and the request params dict. There is no pathway to supply a session_id to the client span builder.

Why existing code does not prevent it

Commit 67ca7fe ('Remove client OTel session-id wrapper') in this same PR branch deliberately removed the _SessionAwareWriteStream wrapper class and the _get_transport_session_id() mechanism that previously solved this problem. Commit 2c4b51a ('Restore OTel span attribute helpers') then re-added build_client_span_attributes without the session_id parameter. The TODO comment left by the author in client/streamable_http.py acknowledges the removal was intentional ('I have dropped the get_session_id callback because it breaks the Transport protocol') but offers no replacement. The PR description was not updated to reflect this change, and instead continues to claim the old mechanism still works.

What the impact would be

In distributed tracing across real deployments, client and server processes are separate. An operator querying their tracing backend (e.g. Jaeger, Honeycomb) by mcp.session.id to correlate all activity for a given session would find server spans but miss every client span, making the trace incomplete and the session timeline unreconstructable. This is the core use case for session ID on spans.

How to fix it

Add a session_id: str | None = None parameter to build_client_span_attributes with the same conditional-set logic as build_server_span_attributes. Then thread the session ID from the transport into BaseSession.send_request — either by passing it as a parameter to send_request, or by making BaseSession aware of the transport's session ID via a callback or property on the write stream (the original approach, now removed).

Step-by-step proof

  1. src/mcp/shared/_otel.py lines 40-57: build_client_span_attributes signature is (*, method: str, request_id: str | int, params: dict[str, Any] | None = None) — no session_id parameter.
  2. src/mcp/shared/_otel.py lines 60-84: build_server_span_attributes has session_id: str | None = None and sets attributes["mcp.session.id"] = session_id when non-None.
  3. src/mcp/shared/session.py line 276: send_request calls build_client_span_attributes(method=..., request_id=..., params=...) with no session_id.
  4. Commit 67ca7fe removed _SessionAwareWriteStream (the previous client-side session ID mechanism). Commit 2c4b51a restored build_client_span_attributes without restoring the session_id parameter.
  5. The PR description's Testing section references 'resource_read_spans_include_session_id or streamable_http_client_resource_read', but no such test exists in tests/shared/test_streamable_http.py — confirming the client-side session ID surfacing was never validated after the wrapper removal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant