
Support MCP observability for Envoy AI Gateway #13791

Open
wu-sheng wants to merge 9 commits into master from feature/mcp-observability


wu-sheng commented Apr 4, 2026

Support MCP (Model Context Protocol) observability for Envoy AI Gateway

  • If this is a non-trivial feature, paste the links/URLs to the design doc.
  • Update the documentation to include this new feature.
  • Tests (including UT, IT, E2E) are added to verify the new feature.
  • If it's UI related, attach the screenshots below.

Changes

MAL rules (new files):

  • gateway-mcp-service.yaml — 13 MCP service-level metrics (request CPM/latency/percentile, method CPM, error CPM, initialization latency, capabilities, per-backend breakdown)
  • gateway-mcp-instance.yaml — 13 MCP instance-level metrics
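
To make the metric names above concrete, here is a minimal Python sketch of what "CPM", "latency avg" and "percentile" mean when derived from counter and histogram samples. It is an illustration only, not the MAL expressions in the new rule files; the function names and the `(upper_bound, cumulative_count)` bucket layout are assumptions chosen for the example.

```python
from math import ceil


def calls_per_minute(count_delta: int, window_seconds: float) -> float:
    """Requests seen in the window, normalised to a one-minute rate."""
    return count_delta * 60.0 / window_seconds


def average_latency(sum_delta: float, count_delta: int) -> float:
    """Mean latency: total observed duration divided by request count."""
    return sum_delta / count_delta if count_delta else 0.0


def bucket_percentile(buckets: list[tuple[float, int]], p: float) -> float:
    """Estimate the p-th percentile from cumulative histogram buckets.

    buckets: (upper_bound, cumulative_count) pairs sorted by upper_bound.
    Returns the upper bound of the first bucket that covers the target rank
    (hypothetical layout, chosen for this sketch).
    """
    if not buckets or buckets[-1][1] == 0:
        return 0.0
    total = buckets[-1][1]
    rank = ceil(total * p / 100.0)
    for upper_bound, cumulative in buckets:
        if cumulative >= rank:
            return upper_bound
    return buckets[-1][0]
```

The percentile here is bucket-granular: it can only return one of the configured upper bounds, which is the usual trade-off of histogram-based percentiles.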

LAL rules (modified envoy-ai-gateway.yaml):

  • Split into two rules: envoy-ai-gateway-llm-access-log and envoy-ai-gateway-mcp-access-log
  • LLM logs: persist error responses (>= 400) and upstream failures only
  • MCP logs: persist error responses (>= 400) only
  • Both rules tag ai_route_type (llm or mcp) for searchable filtering
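
The persistence policy of the two rules can be sketched in Python as follows. This is not the LAL DSL from envoy-ai-gateway.yaml; the `AccessLog` type, its field names and the `upstream_failed` flag are hypothetical stand-ins for whatever the real access log exposes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AccessLog:
    route_type: str                 # "llm" or "mcp"
    response_code: int
    upstream_failed: bool = False   # hypothetical flag for upstream failures
    tags: dict = field(default_factory=dict)


def should_persist(log: AccessLog) -> bool:
    """LLM logs: keep errors (>= 400) and upstream failures.
    MCP logs: keep errors (>= 400) only."""
    is_error = log.response_code >= 400
    if log.route_type == "llm":
        return is_error or log.upstream_failed
    if log.route_type == "mcp":
        return is_error
    return False


def process(log: AccessLog) -> Optional[AccessLog]:
    """Tag the log with ai_route_type, then drop it unless the policy keeps it."""
    log.tags["ai_route_type"] = log.route_type
    return log if should_persist(log) else None
```

Under this sketch, a 200 response on an MCP route is dropped even when the hypothetical `upstream_failed` flag is set, while the same log on an LLM route is kept.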

Dashboard (modified service + instance JSON):

  • Added MCP tab with 9 widgets (service) / 6 widgets (instance): request CPM, latency avg/percentile, error CPM, method CPM, initialization latency, backend breakdown

E2E test (modified):

  • Added mcp-server service (tzolov/mcp-everything-server:v3 — MCP reference server with StreamableHttp)
  • Added MCP request steps (initialize + tools/list + tools/call)
  • Added MCP metric verification cases
  • Log query uses ai_route_type=llm tag filter

Config:

  • Added ai_route_type to searchableLogsTags in application.yml

  • Fixed aigw healthcheck binary path (/app instead of aigw)

  • If this pull request closes/resolves/fixes an existing issue, replace the issue number. Closes #.

  • Update the CHANGES log.

wu-sheng added the backend (OAP backend related.) and enhancement (Enhancement on performance or codes) labels Apr 4, 2026
wu-sheng added this to the 10.5.0 milestone Apr 4, 2026
wu-sheng requested review from Copilot and wankai123 April 4, 2026 23:51

Copilot AI left a comment


Pull request overview

Adds MCP (Model Context Protocol) observability support for Envoy AI Gateway in SkyWalking, extending the existing Envoy AI Gateway (SWIP-10) integration with MCP metrics, dashboards, log tagging/sampling, and E2E verification.

Changes:

  • Add new MAL rules to derive MCP service/instance metrics (aggregate, per-method, per-backend, init latency, capabilities).
  • Split LAL into LLM vs MCP log rules and tag logs with ai_route_type for filtering; update default searchable log tags.
  • Extend dashboards + docs + E2E docker-compose scenario to include an MCP backend and MCP metric verification.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.

Summary per file:

| File | Description |
| --- | --- |
| test/e2e-v2/cases/envoy-ai-gateway/expected/logs.yml | Updates expected log tags to include ai_route_type=llm. |
| test/e2e-v2/cases/envoy-ai-gateway/envoy-ai-gateway-cases.yaml | Adds MCP metric query cases; filters log query by ai_route_type=llm. |
| test/e2e-v2/cases/envoy-ai-gateway/e2e.yaml | Adds MCP request steps (initialize/tools) for metric verification. |
| test/e2e-v2/cases/envoy-ai-gateway/docker-compose.yml | Adds mcp-server service and configures ai-gateway-cli MCP routing; fixes healthcheck path. |
| oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-service.json | Adds “MCP” tab/widgets to the service dashboard template. |
| oap-server/server-starter/src/main/resources/ui-initialized-templates/envoy_ai_gateway/envoy-ai-gateway-instance.json | Adds “MCP” tab/widgets to the instance dashboard template. |
| oap-server/server-starter/src/main/resources/otel-rules/envoy-ai-gateway/gateway-mcp-service.yaml | New MAL rules for MCP service-level metrics. |
| oap-server/server-starter/src/main/resources/otel-rules/envoy-ai-gateway/gateway-mcp-instance.yaml | New MAL rules for MCP instance-level metrics. |
| oap-server/server-starter/src/main/resources/lal/envoy-ai-gateway.yaml | Splits access-log processing into LLM vs MCP rules; adds ai_route_type. |
| oap-server/server-starter/src/main/resources/application.yml | Adds ai_route_type to default searchableLogsTags. |
| docs/en/setup/backend/backend-envoy-ai-gateway-monitoring.md | Documents MCP metrics, dashboards, and log filtering/sampling behavior. |
| docs/en/changes/changes.md | Adds a CHANGES entry for MCP observability support. |



Copilot AI left a comment


Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.



Comment on lines +205 to +207
"type": "Line",
"showXAxis": true,
"showYAxis": true
Member Author


@peachisai I updated this part, since a Bar chart does not seem to be the right visualization for percentiles.
