feat(ARCH-341): add OpenTelemetry tracing and reduce INFO logging#41
Merged
feat(ARCH-341): add OpenTelemetry tracing and reduce INFO logging#41
Conversation
- Add internal/otel/tracing.go: SetupSDK initialises OTLP trace/metric/log providers from standard OTEL_* env vars; installs no-op provider when OTEL_TRACES_EXPORTER is unset (zero cost in stdio/local-dev). - Wire slog-otel OtelHandler around slog.NewJSONHandler in main() so every log record emitted within an active span carries trace_id and span_id. Datadog log-to-trace correlation reads these standard OTEL hex fields directly (no dd.trace_id/dd.span_id conversion needed). - Initialise OTel SDK in main() with graceful shutdown defer. - Wrap inbound HTTP mux with otelhttp.NewHandler for root span per request with W3C TraceContext propagation. - Wrap outbound LFX API and token-exchange HTTP clients with otelhttp.NewTransport so upstream calls appear as child spans. - Demote mcpLoggingMiddleware success completions from INFO to DEBUG; these are observable via APM spans and INFO violates ADR-0002. - Convert all logger.Warn partial-result calls to in-band TextContent warnings prepended to the tool response Content slice (get_project, get_committee, get_mailing_list_service, get_mailing_list, and all search-page truncation warnings). Server-side Warn/Error retained for operator observability. - Add HTTPClient field to all tool config structs and thread the shared otelhttp-wrapped client through to lfxv2.ClientConfig.HTTPClient. - Helm chart: add podAnnotations pass-through, app.otel stanza (endpoint/insecure/tracesExporter/tracesSampleRatio), and app.extraEnv pass-through to deployment.yaml and values.yaml. - Bump go.mod: add slog-otel, otelhttp, otel SDK/exporters/propagators; bump lfx-v2-committee-service to v0.2.23. 🤖 Generated with [GitHub Copilot](https://github.com/features/copilot) (via Zed) Signed-off-by: Eric Searcy <eric@linuxfoundation.org>
There was a problem hiding this comment.
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review.
Tip: disable this comment in your organization's Code Review settings.
Contributor
There was a problem hiding this comment.
Pull request overview
Adds OpenTelemetry (OTLP) tracing/export setup and log-to-trace correlation, wires OTel HTTP instrumentation into inbound/outbound requests, and reduces INFO-level noise by moving partial-result warnings into tool responses and demoting routine logs.
Changes:
- Add
internal/otelSDK bootstrap (env-driven OTLP exporters, propagators, resource config) and initialize it frommain. - Instrument HTTP server and shared outbound
http.Clienttransports withotelhttp, and wrap JSON logging withslog-otelfortrace_id/span_id. - Helm chart: add pod annotations passthrough and OTEL env var stanzas; tools now accept an injected
HTTPClient.
Reviewed changes
Copilot reviewed 12 out of 13 changed files in this pull request and generated 16 comments.
Show a summary per file
| File | Description |
|---|---|
internal/otel/tracing.go |
New OTEL SDK bootstrap from env vars (trace/metric/log providers, propagators, endpoint normalization). |
cmd/lfx-mcp-server/main.go |
Wrap slog handler for trace correlation, initialize/shutdown OTEL, instrument inbound/outbound HTTP, demote success logging to DEBUG. |
internal/tools/project.go |
Inject shared HTTP client into LFX client config; return partial-result warning in tool response content. |
internal/tools/committee.go |
Inject shared HTTP client; prepend in-band warnings for missing settings and page truncation cases. |
internal/tools/committee_write.go |
Pass shared HTTP client into LFX client construction for write paths. |
internal/tools/mailing_list.go |
Inject shared HTTP client; prepend in-band warnings for missing settings and page truncation cases. |
internal/tools/member.go |
Add HTTP client to tool config and pass through to LFX clients. |
internal/tools/member_write.go |
Pass shared HTTP client into LFX client construction for write paths. |
internal/tools/meeting.go |
Inject shared HTTP client; prepend in-band page truncation warnings for search tools. |
charts/lfx-mcp/values.yaml |
Add podAnnotations, app.otel, and app.extraEnv values. |
charts/lfx-mcp/templates/deployment.yaml |
Render pod annotations; add OTEL-related env vars and extra env passthrough. |
go.mod |
Add OTEL + slog-otel dependencies; bump committee service SDK. |
go.sum |
Add checksums for new dependencies and bumped modules. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
The previous defer otelShutdown(context.Background()) had two problems: - os.Exit(1) calls inside runHTTPServer would skip the defer entirely, losing any buffered spans. - It used an unbounded context.Background() rather than the same 10-second shutdown context already in scope. Pass otelShutdown into runHTTPServer and runStdioServer. In both functions call it explicitly after httpServer.Shutdown / server.Run returns, using the existing bounded context so the flush honours the same timeout as HTTP draining. 🤖 Generated with [GitHub Copilot](https://github.com/features/copilot) (via Zed) Signed-off-by: Eric Searcy <eric@linuxfoundation.org>
Settings unavailability is a real error (network failure, decode failure, unexpected server error) not merely a permissions notice. Operator logs should reflect that. The in-band WARNING TextContent in the tool response already informs the caller; the server-side log level should be Error for operator alerting. 🤖 Generated with [GitHub Copilot](https://github.com/features/copilot) (via Zed) Signed-off-by: Eric Searcy <eric@linuxfoundation.org>
…ogs and spans Four fixes in this commit: 1. Context logger propagation: store a child logger with mcp.session.id and mcp.method.name pre-bound into ctx via tools.WithLogger so every newToolLogger(ctx, req) call inherits both fields on all log records, not just the two completion lines in mcpLoggingMiddleware. 2. Span attributes: use MCP semconv per the OTel MCP spec (https://opentelemetry.io/docs/specs/semconv/gen-ai/mcp/) instead of RPC semconv — semconv.McpMethodNameKey.String(method) for mcp.method.name and semconv.McpSessionID(sessionID) for mcp.session.id. Drop the attribute and RPCSystem imports that are no longer needed. 3. Log field names: align with semconv — "mcp.session.id" and "mcp.method.name" instead of "session_id"/"rpc.method". 4. OTel context injection: convert every logger.Xxx call in every tool file to logger.XxxContext(ctx, ...) so OtelHandler.Handle receives the span context and injects trace_id/span_id into each record. Also name hello_world's previously-blank context parameter. 🤖 Generated with [GitHub Copilot](https://github.com/features/copilot) (via Zed) Signed-off-by: Eric Searcy <eric@linuxfoundation.org>
Rename mcpLoggingMiddleware → mcpOTelMiddleware to reflect that the function now handles both OTel instrumentation and structured logging. Add the following span attributes per the OpenTelemetry MCP semantic conventions (https://opentelemetry.io/docs/specs/semconv/gen-ai/mcp/): - network.transport: "pipe" for stdio, "tcp" for HTTP (detected via presence of HTTP headers in RequestExtra). - mcp.protocol.version: negotiated version from the client's initialize params (Recommended). - gen_ai.tool.name + gen_ai.operation.name = "execute_tool": set on tools/call spans (Conditionally Required / Recommended). - error.type: set to the error string on JSON-RPC-level failures, or to "tool_error" when a tools/call result carries IsError = true (Conditionally Required). 🤖 Generated with [GitHub Copilot](https://github.com/features/copilot) (via Zed) Signed-off-by: Eric Searcy <eric@linuxfoundation.org>
The SDK wraps regular tool errors into IsError=true results; only *jsonrpc.Error (protocol-level failures such as unknown tool name) surfaces as a non-nil err from the MethodHandler. Update the comment to reflect this accurately. 🤖 Generated with [GitHub Copilot](https://github.com/features/copilot) (via Zed) Signed-off-by: Eric Searcy <eric@linuxfoundation.org>
bramwelt
requested changes
Mar 25, 2026
- Use spec-correct OTEL_EXPORTER_OTLP_PROTOCOL values: replace the
non-spec OTelProtocolHTTP ("http") constant with OTelProtocolHTTPProtobuf
("http/protobuf") and OTelProtocolHTTPJSON ("http/json"). Accept
"http/json" with a warning and fall back to "http/protobuf" since the
Go SDK does not implement it. Reject other unknown values.
- Wrap TraceIDRatioBased sampler with trace.ParentBased so incoming
W3C TraceContext sampling decisions are respected across service
boundaries.
- Drop OTEL_EXPORTER_OTLP_INSECURE entirely: it is not part of the
standard OTEL SDK configuration spec. WithEndpointURL already derives
TLS vs. plaintext from the URL scheme (Insecure = scheme != "https"),
so passing http:// or https:// in the endpoint is sufficient.
endpointURL() now always prepends http:// to bare host:port values;
users requiring TLS supply an explicit https:// scheme. Remove the
Insecure field from Config, all WithInsecure() calls from provider
constructors, the OTEL_EXPORTER_OTLP_INSECURE env block from the Helm
chart, and the insecure key from values.yaml.
- Validate OTEL_TRACES_EXPORTER, OTEL_METRICS_EXPORTER,
OTEL_LOGS_EXPORTER: only "otlp" and "none" are supported; any other
value (e.g. "jaeger", "zipkin") emits a warning and defaults to
"none". Extracted into normaliseExporter() helper.
- Add tools.ParseEnvBool to internal/tools/helpers.go: accepts "true",
"t", "1", "y", "yes" (case-insensitive), consistent with other boolean
env var parsing in the codebase. Remove the now-unused parseEnvBool
from tracing.go.
🤖 Generated with [GitHub Copilot](https://github.com/features/copilot) (via Zed)
Signed-off-by: Eric Searcy <eric@linuxfoundation.org>
ParseEnvBool was added in anticipation of use in tracing.go, but the OTEL_EXPORTER_OTLP_INSECURE env var was removed entirely rather than parsed. No callers exist; remove the function and the now-unused strings import. 🤖 Generated with [GitHub Copilot](https://github.com/features/copilot) (via Zed) Signed-off-by: Eric Searcy <eric@linuxfoundation.org>
- Move extraEnv block before OTEL env vars in deployment.yaml so that HOST_IP (injected via extraEnv from pod status.hostIP) is declared before OTEL_EXPORTER_OTLP_ENDPOINT, which references $(HOST_IP). Kubernetes resolves $(VAR_NAME) in env declaration order; placing extraEnv after the OTEL block left $(HOST_IP) as a literal string. - Replace non-standard OTEL_TRACES_SAMPLE_RATIO with the spec-correct OTEL_TRACES_SAMPLER (enum) and OTEL_TRACES_SAMPLER_ARG (ratio). Add newSampler() which constructs the appropriate trace.Sampler from the configured OTEL_TRACES_SAMPLER value, supporting all standard sampler names: always_on, always_off, traceidratio, parentbased_always_on, parentbased_always_off, parentbased_traceidratio (default). Unrecognised values warn and fall back to parentbased_traceidratio. Rename TracesSampleRatio -> TracesSamplerArg in Config. Update Helm chart values keys and deployment template accordingly. 🤖 Generated with [GitHub Copilot](https://github.com/features/copilot) (via Zed) Signed-off-by: Eric Searcy <eric@linuxfoundation.org>
bramwelt
approved these changes
Mar 25, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Implements ARCH-341: OTel instrumentation for the LFX MCP server.
Changes
internal/otel/tracing.go(new)SetupSDKbootstraps OTLP trace/metric/log providers from standardOTEL_*env vars.OTEL_TRACES_EXPORTERis unset — zero overhead in stdio/local-dev.endpointURLhelper normalises barehost:portvalues tohttp://host:portbefore passing to the SDK.slog.Error.cmd/lfx-mcp-server/main.goslog.NewJSONHandlerwithslogotel.OtelHandlerso every log record within an active span carriestrace_idandspan_id(standard OTel hex fields; Datadog log-to-trace correlation uses a remapper rule on these directly — nodd.*conversion needed).defershutdown.otelhttp.NewHandlerfor a root span + W3C TraceContext propagation per request.http.Clienttransports withotelhttp.NewTransportso upstream calls appear as child spans.mcpLoggingMiddleware→mcpOTelMiddlewareto reflect its expanded role.mcpOTelMiddleware:mcp.method.name,mcp.session.id— on every call.network.transport—"pipe"for stdio,"tcp"for HTTP (detected viaRequestExtra.Header).mcp.protocol.version— from the client'sinitializeparams.gen_ai.tool.name+gen_ai.operation.name = "execute_tool"— ontools/callspans.error.type— error string on JSON-RPC-level failures;"tool_error"when a tool returnsIsError = true.mcpOTelMiddlewaresuccessful completions fromInfo→Debug(reads are observable via APM spans, not log lines).internal/tools/*.goHTTPClient *http.Clientfield to all tool config structs (ProjectConfig,CommitteeConfig,MailingListConfig,MemberConfig,MeetingConfig) and thread the shared otelhttp-wrapped client through tolfxv2.ClientConfig.HTTPClient.logger.Warncalls to in-bandTextContentwarnings prepended to the tool responseContentslice:get_project: settings unavailable warning.get_committee: settings unavailable warning.get_mailing_list_service/get_mailing_list: settings unavailable warnings.search_committees,search_committee_members,search_mailing_lists,search_mailing_list_members,search_meetings,search_meeting_registrants, search past meeting resources): page-truncation warnings.Warn/Errorretained for operator observability.Helm chart (
charts/lfx-mcp/)deployment.yaml: addpodAnnotationspass-through,app.otelenv stanza (OTEL_EXPORTER_OTLP_ENDPOINT,OTEL_EXPORTER_OTLP_INSECURE,OTEL_TRACES_EXPORTER,OTEL_TRACES_SAMPLE_RATIO), andapp.extraEnvpass-through.values.yaml: addpodAnnotations: {},app.otelstanza with empty defaults, andapp.extraEnv: [].go.modslog-otel,otelhttp, full OTel SDK + OTLP exporters, jaeger propagator.lfx-v2-committee-service→ v0.2.23.ArgoCD values
See companion draft PR in lfx-v2-argocd (ARCH-341) for
HOST_IPextraEnv,otelendpoint, and Datadog pod log annotations.Jira
ARCH-341