@andywgarcia
Contributor

This PR adds a script that sets up tracing and metrics telemetry through an OTEL collector. The collector exports metrics for Prometheus to scrape, and a Grafana dashboard displays them.

The client UI, router, coprocessor, and subgraphs should all submit trace information. The router, coprocessor, and subgraphs should also submit metric information, which can be viewed on the Grafana dashboard.

The dashboard is an early version of the Grafana dashboard that I will submit as the updated Grafana template in the apm-templates repo. It is based on the Datadog dashboard that was announced at Summit.

Here is a visual of that dashboard:

[Screenshot: GraphOS runtime dashboard template in Grafana at localhost:3002, with job_name=otel-collector and otel_scope_name=apollo/router]

… and error handling. Ensure script is run from repository root and provide detailed instructions for accessing Zipkin UI.
- Integrated OpenTelemetry SDK for browser tracing in the client application.
- Added necessary dependencies for OpenTelemetry in package.json and package-lock.json.
- Updated Dockerfile to include OTEL_COLLECTOR_URL as an environment variable.
- Enhanced deployment scripts to set up port-forwarding for the OpenTelemetry collector.
- Updated collector image version in values.yaml to 0.141.0.
- Improved CORS configuration in the collector's configmap for local development.
- Modified expressions in the dashboard to improve metric aggregation for subgraphs and HTTP requests.
- Updated titles and legend formats for clarity in visualizations.
- Enhanced README with dashboard requirements, usage instructions, and known limitations regarding HTTP status codes in subgraph metrics.
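
The collector wiring described in this PR (OTLP in, Prometheus scrape endpoint and Zipkin out, CORS for the browser client) can be sketched roughly as below. This is a minimal illustration, not the configmap from this PR: the function name `build_collector_config`, the origin `http://localhost:3000`, the Prometheus port `8889`, and the `zipkin` hostname are all assumptions. The key names follow the OpenTelemetry Collector configuration layout, and because JSON is also valid YAML the printed output could in principle be used as a collector config.

```python
import json


def build_collector_config(allowed_origins, prometheus_port=8889):
    """Assemble a minimal OpenTelemetry Collector config as a plain dict.

    Hypothetical helper for illustration only; all values are assumptions.
    """
    return {
        "receivers": {
            "otlp": {
                "protocols": {
                    # Default OTLP ports: 4317 for gRPC, 4318 for HTTP.
                    "grpc": {"endpoint": "0.0.0.0:4317"},
                    "http": {
                        "endpoint": "0.0.0.0:4318",
                        # CORS lets the browser client post spans directly.
                        "cors": {"allowed_origins": list(allowed_origins)},
                    },
                }
            }
        },
        "exporters": {
            # Prometheus scrapes this endpoint on the collector.
            "prometheus": {"endpoint": f"0.0.0.0:{prometheus_port}"},
            # Trace backend; the hostname is an assumption.
            "zipkin": {"endpoint": "http://zipkin:9411/api/v2/spans"},
        },
        "service": {
            "pipelines": {
                "traces": {"receivers": ["otlp"], "exporters": ["zipkin"]},
                "metrics": {"receivers": ["otlp"], "exporters": ["prometheus"]},
            }
        },
    }


if __name__ == "__main__":
    # Assumed local dev origin for the client UI.
    config = build_collector_config(["http://localhost:3000"])
    print(json.dumps(config, indent=2))
```

Keeping traces and metrics in separate pipelines mirrors the split in the description: every service sends traces, while only the router, coprocessor, and subgraphs feed the metrics that Prometheus scrapes.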


Do we want to build our own dashboard, or reference the APM project and link it here?


As I commented earlier, I would much rather look at updating the APM platform and pulling the dashboard from there.


Just a general comment: we could look at substituting these values with labels and deployments.

@andywgarcia andywgarcia merged commit 2ab18ea into main Jan 7, 2026
1 check passed
@andywgarcia andywgarcia deleted the garcia/operator-with-telemetry branch January 7, 2026 17:39

3 participants