feat: add OpenTelemetry support across all SDKs #785

Draft
stephentoub wants to merge 2 commits into main from stoub/otel-support

@stephentoub
Collaborator

Summary

Adds OpenTelemetry integration to all four language SDKs (Node.js, Python, Go, .NET), enabling distributed tracing between SDK consumers and the Copilot CLI.

What's included

TelemetryConfig type (all SDKs)

New configuration object on CopilotClientOptions that maps to CLI environment variables:

  • otlpEndpoint → OTEL_EXPORTER_OTLP_ENDPOINT
  • filePath → COPILOT_OTEL_FILE_EXPORTER_PATH
  • exporterType → COPILOT_OTEL_EXPORTER_TYPE
  • sourceName → COPILOT_OTEL_SOURCE_NAME
  • captureContent → OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT

When a TelemetryConfig is provided, COPILOT_OTEL_ENABLED=true is also set on the spawned CLI process.
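
As a rough illustration of the mapping above, here is a minimal Go sketch. The struct field names follow the Go naming used in this PR, but the field types, the `telemetryEnv` helper, and the "true"/"false" encoding of `CaptureContent` are assumptions made for the example, not the SDK's actual implementation.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// TelemetryConfig mirrors the Go field names listed in this PR.
// The field types are assumptions; the real SDK type may differ.
type TelemetryConfig struct {
	OTLPEndpoint   string
	FilePath       string
	ExporterType   string
	SourceName     string
	CaptureContent *bool // nil leaves the CLI default untouched (assumption)
}

// telemetryEnv is a hypothetical helper returning the environment variables
// to set on the spawned CLI process. A nil config yields no variables, so
// telemetry stays off unless it is explicitly configured.
func telemetryEnv(cfg *TelemetryConfig) map[string]string {
	env := map[string]string{}
	if cfg == nil {
		return env
	}
	env["COPILOT_OTEL_ENABLED"] = "true"

	// Only export fields the caller actually set.
	setIfNotEmpty := func(key, value string) {
		if value != "" {
			env[key] = value
		}
	}
	setIfNotEmpty("OTEL_EXPORTER_OTLP_ENDPOINT", cfg.OTLPEndpoint)
	setIfNotEmpty("COPILOT_OTEL_FILE_EXPORTER_PATH", cfg.FilePath)
	setIfNotEmpty("COPILOT_OTEL_EXPORTER_TYPE", cfg.ExporterType)
	setIfNotEmpty("COPILOT_OTEL_SOURCE_NAME", cfg.SourceName)
	if cfg.CaptureContent != nil {
		// Encoding the flag as "true"/"false" is an assumption.
		env["OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT"] =
			strconv.FormatBool(*cfg.CaptureContent)
	}
	return env
}

func main() {
	capture := false
	env := telemetryEnv(&TelemetryConfig{
		OTLPEndpoint:   "http://localhost:4318",
		CaptureContent: &capture,
	})

	// Print in a stable order, since Go map iteration order is random.
	keys := make([]string, 0, len(env))
	for k := range env {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, env[k])
	}
}
```

Unset fields are omitted rather than exported as empty strings, so any value already present in the parent environment would not be overwritten by an empty one.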

W3C Trace Context propagation

traceparent/tracestate fields are now sent on:

  • session.create
  • session.resume
  • session.send
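
For readers unfamiliar with the wire format, the sketch below builds and validates a version-00 W3C `traceparent` value (`version-traceid-parentid-flags`) using only the Go standard library. The `TraceParent` type and `ParseTraceParent` function are hypothetical names for illustration; the SDKs in this PR presumably rely on OpenTelemetry's own propagators instead.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// TraceParent holds the fields of a version-00 W3C traceparent value.
type TraceParent struct {
	TraceID string // 32 lowercase hex characters, not all zero
	SpanID  string // 16 lowercase hex characters, not all zero
	Sampled bool   // lowest bit of the trace-flags field
}

// String renders the value as "00-<trace-id>-<parent-id>-<flags>".
func (tp TraceParent) String() string {
	flags := "00"
	if tp.Sampled {
		flags = "01"
	}
	return "00-" + tp.TraceID + "-" + tp.SpanID + "-" + flags
}

// isLowerHex reports whether s is exactly n lowercase hex characters.
func isLowerHex(s string, n int) bool {
	if len(s) != n {
		return false
	}
	for i := 0; i < len(s); i++ {
		c := s[i]
		isDigit := c >= '0' && c <= '9'
		isHexLetter := c >= 'a' && c <= 'f'
		if !isDigit && !isHexLetter {
			return false
		}
	}
	return true
}

// ParseTraceParent validates a version-00 traceparent string.
func ParseTraceParent(s string) (TraceParent, error) {
	parts := strings.Split(s, "-")
	if len(parts) != 4 || parts[0] != "00" {
		return TraceParent{}, errors.New("traceparent: expected version 00 with four fields")
	}
	traceID, spanID := parts[1], parts[2]
	if !isLowerHex(traceID, 32) || traceID == strings.Repeat("0", 32) {
		return TraceParent{}, errors.New("traceparent: invalid trace-id")
	}
	if !isLowerHex(spanID, 16) || spanID == strings.Repeat("0", 16) {
		return TraceParent{}, errors.New("traceparent: invalid parent-id")
	}
	if !isLowerHex(parts[3], 2) {
		return TraceParent{}, errors.New("traceparent: invalid trace-flags")
	}
	flags, err := strconv.ParseUint(parts[3], 16, 8)
	if err != nil {
		return TraceParent{}, err
	}
	return TraceParent{TraceID: traceID, SpanID: spanID, Sampled: flags&0x01 == 1}, nil
}

func main() {
	// Example value taken from the W3C Trace Context specification.
	tp, err := ParseTraceParent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	if err != nil {
		fmt.Println("invalid:", err)
		return
	}
	// The value would travel as the "traceparent" field of the RPC params.
	params := map[string]string{"traceparent": tp.String()}
	fmt.Println(params["traceparent"], "sampled:", tp.Sampled)
}
```

The optional `tracestate` field carries vendor-specific key/value pairs and is passed through as an opaque string alongside `traceparent`.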

Trace context restoration in tool handlers

Both v2 RPC (tool.call) and v3 broadcast (tool.call.requested) tool call paths restore the inbound trace context before invoking user tool handlers, so tool execution is linked to the originating trace.

Telemetry helper modules

Each SDK has a new telemetry module (telemetry.ts, telemetry.py, telemetry.go, Telemetry.cs) with unit tests.

Updated generated types

Regenerated RPC and session-event types from the latest schema to include traceparent/tracestate fields.

Documentation

Added OpenTelemetry configuration docs and per-language README sections.

⚠️ Blocked: requires next CLI version

This PR depends on the Copilot CLI supporting the traceparent/tracestate fields in the RPC protocol. The SDKs will need to be updated to the next version of the CLI before this can move forward.

Known limitation

The Go ToolHandler type does not accept a context.Context parameter, so while trace context is restored around the handler call (for the HandlePendingToolCall RPC), it cannot be passed directly into user tool code. A comment has been added noting this; a future breaking change to the handler signature would fully resolve it.

stephentoub and others added 2 commits March 10, 2026 17:58
… documentation

Add telemetry documentation across all SDK docs:

- getting-started.md: New 'Telemetry & Observability' section with
  per-language examples, TelemetryConfig options table, file export
  example, and trace context propagation explanation
- Per-SDK READMEs (Node.js, Python, Go, .NET): Add telemetry option
  to constructor/options lists and new Telemetry sections with
  language-specific examples and dependency notes
- observability/opentelemetry.md: Add 'Built-in Telemetry Support'
  section at top with multi-language examples, options table,
  propagation details, and dependency matrix
- docs/index.md: Update Observability description

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add TelemetryConfig to all four SDKs (Node, Python, Go, .NET) to configure
OpenTelemetry instrumentation on the Copilot CLI process. This includes:

- TelemetryConfig type with OTLP endpoint, file exporter, source name, and
  capture-content options, mapped to CLI environment variables
- W3C Trace Context propagation (traceparent/tracestate) on session.create,
  session.resume, and session.send RPC calls
- Trace context restoration in tool call handlers (v2 RPC and v3 broadcast)
  so user tool code executes within the correct distributed trace
- Telemetry helper modules (telemetry.ts, telemetry.py, telemetry.go,
  Telemetry.cs) with unit tests
- Updated generated types from latest schema

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Contributor

Cross-SDK Consistency Review: OpenTelemetry Support

I've reviewed this PR for cross-SDK consistency across all four language implementations (Node.js, Python, Go, .NET). Overall, the implementation is very well done with excellent feature parity! 🎉

✅ Consistent Across All SDKs

The following features are implemented consistently across all languages:

  1. TelemetryConfig type - All SDKs define equivalent configuration with the same fields (accounting for naming conventions):

    • otlpEndpoint / otlp_endpoint / OTLPEndpoint / OtlpEndpoint
    • filePath / file_path / FilePath / FilePath
    • exporterType / exporter_type / ExporterType / ExporterType
    • sourceName / source_name / SourceName / SourceName
    • captureContent / capture_content / CaptureContent / CaptureContent
  2. Environment variable mapping - All SDKs correctly map config fields to CLI environment variables:

    • COPILOT_OTEL_ENABLED=true (when telemetry config is present)
    • OTEL_EXPORTER_OTLP_ENDPOINT
    • COPILOT_OTEL_FILE_EXPORTER_PATH
    • COPILOT_OTEL_EXPORTER_TYPE
    • COPILOT_OTEL_SOURCE_NAME
    • OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT
  3. W3C Trace Context propagation - All SDKs send traceparent/tracestate on:

    • session.create
    • session.resume
    • session.send
  4. Telemetry helper modules - Each SDK has appropriate helper functions:

    • Node.js: telemetry.ts with getTraceContext() / withTraceContext()
    • Python: telemetry.py with get_trace_context() / trace_context() context manager
    • Go: telemetry.go with getTraceContext() / contextWithTraceParent()
    • .NET: Telemetry.cs with GetTraceContext() / RestoreTraceContext()
  5. Documentation - All four SDK READMEs have telemetry sections with consistent examples

  6. V3 broadcast event handling - All SDKs restore trace context before invoking tool handlers for tool.call.requested events

⚠️ Inconsistency Found: Go V2 Tool.Call Handler

There is one inconsistency in the Go SDK's v2 RPC handler (handleToolCallRequestV2):

Issue: The Go SDK receives traceparent and tracestate in the v2 tool.call request struct but does not restore the trace context before calling the tool handler.

Other SDKs for comparison:

  • Node.js (client.ts:1588-1592): Uses withTraceContext(traceparent, tracestate, () => handler(...))
  • Python (client.py:1646-1653): Uses with trace_context(tp, ts): result = handler(...)
  • .NET (Client.cs:1346): Uses using var _ = TelemetryHelpers.RestoreTraceContext(traceparent, tracestate);
  • Go (client.go:1519-1526): Calls handler(invocation) directly without context restoration

Why this matters: While the Go ToolHandler signature doesn't accept context.Context (which is documented as a known limitation), the other SDKs still restore the ambient trace context even when they can't pass it directly to the handler. This allows any OpenTelemetry instrumentation that reads from the ambient context to work correctly.

Suggested fix: In go/client.go around line 1519, before calling the handler:

invocation := ToolInvocation{
    SessionID:  req.SessionID,
    ToolCallID: req.ToolCallID,
    ToolName:   req.ToolName,
    Arguments:  req.Arguments,
}

// Build a context carrying the inbound trace context. ToolHandler does not
// accept a context.Context and Go has no goroutine-local storage, so the
// handler cannot observe ctx; it is created here only to document intent.
ctx := contextWithTraceParent(context.Background(), req.Traceparent, req.Tracestate)
_ = ctx // avoids Go's "declared and not used" compile error until ToolHandler accepts a context
result, err := handler(invocation)

However, since Go doesn't have goroutine-local storage and the ToolHandler signature doesn't accept context.Context, there may be limited benefit to this change. The current approach is acceptable given the documented limitation, but it would be more consistent with the other SDKs to at least create the context (even if unused) to show the intent.

Summary

This is an excellent, consistent implementation across all SDKs! The only minor inconsistency is in the Go v2 handler's trace context restoration, which is somewhat mitigated by the documented limitation around Go's ToolHandler signature. The feature parity, API design, and documentation are all outstanding. Great work! 🚀

Generated by SDK Consistency Review Agent for issue #785

Contributor

@github-actions github-actions bot left a comment

Generated by SDK Consistency Review Agent for issue #785

Arguments: req.Arguments,
}

result, err := handler(invocation)

Consistency suggestion: For consistency with the other SDK implementations, consider restoring the trace context before calling the handler here:

// Build the trace context (even though ToolHandler can't receive it).
ctx := contextWithTraceParent(context.Background(), req.Traceparent, req.Tracestate)
_ = ctx // avoids Go's "declared and not used" compile error
// Note: ToolHandler doesn't accept context.Context, so spans created by the
// handler won't be parented to this trace unless the handler propagates context itself.
result, err := handler(invocation)

The other SDKs all restore trace context in their v2 handlers:

  • Node.js uses withTraceContext(traceparent, tracestate, () => handler(...))
  • Python uses with trace_context(tp, ts): result = handler(...)
  • .NET uses using var _ = RestoreTraceContext(traceparent, tracestate);

While Go's limitation is well-documented in the README, adding the context restoration (even if unused) would make the intent clearer and align with the other implementations.
