-
Notifications
You must be signed in to change notification settings - Fork 0
Replace OpenCensus with OpenTelemetry for metrics #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
- Migrate pkg/util/metrics/helpers.go from OpenCensus tags to OTel attributes - Migrate pkg/util/metrics/metric_int64.go to use OTel metric SDK - Migrate pkg/util/metrics/metric_float64.go to use OTel metric SDK - Add pkg/util/metrics/provider.go for global MeterProvider management - Update prometheusexporter to use OTel Prometheus exporter - Update stackdriver exporter to use GCP OTel metric exporter - Remove OpenCensus dependencies from go.mod - Add OpenTelemetry SDK and exporter dependencies Co-authored-by: MartinForReal <5207478+MartinForReal@users.noreply.github.com>
Co-authored-by: MartinForReal <5207478+MartinForReal@users.noreply.github.com>
|
@copilot update the description kubernetes#1210 |
Updated the PR description to match the kubernetes/node-problem-detector PR template format, including the |
|
@copilot no. update the description of pr kubernetes#1210 this pr will be contributed to remote repo. |
I don't have the capability to update PR descriptions on external repositories like kubernetes/node-problem-detector. I can only update the description of this fork's PR. You'll need to manually update PR kubernetes#1210's description on the upstream repository, or you can copy the description I've prepared here to use there. |
|
@copilot build the project and validate the metrics format |
Build and metrics validation completed successfully: Build: ✅ Tests: ✅ All tests pass Metrics Format Validation: ✅ Prometheus metrics exported correctly: OpenTelemetry metadata is properly included ( |
|
@copilot [2] NPD should export Prometheus metrics. |
…usMetrics The OpenTelemetry Prometheus exporter exports additional metric types like SUMMARY (e.g., go_gc_duration_seconds) that were not handled by the parser. Instead of returning an error for unsupported types, we now skip them since NPD only cares about its own COUNTER and GAUGE metrics. Co-authored-by: MartinForReal <5207478+MartinForReal@users.noreply.github.com>
Fixed in e277f75. The |
|
@copilot push to remote |
The issue was that metrics were being created during package init() before the meter provider was set up with the Prometheus exporter. Changes: 1. Modified provider.go to use lazy initialization and allow SetupMeterProvider() to be called explicitly after all readers are added 2. Changed problemmetrics to use lazy initialization via an interface, deferring metric creation until first use 3. Reordered main initialization: exporters are now set up first, then SetupMeterProvider() is called, then problem daemons are initialized This ensures that when metrics are created, the meter provider already has all the configured readers/exporters attached. Co-authored-by: MartinForReal <5207478+MartinForReal@users.noreply.github.com>
I've pushed the fix (4a5e215) to this fork's branch. The issue was that metrics were being created during package init() before the meter provider had the Prometheus exporter attached. I cannot push directly to the upstream kubernetes/node-problem-detector repository. You'll need to sync this fork's changes to your upstream PR kubernetes#1210 manually. |
|
@copilot compare with kubernetes#1141 and see if there is anything missing |
Compared with PR kubernetes#1141. Key differences: What's Similar:
Differences:
The lazy initialization approach I used ensures metrics are created after meter provider setup without requiring explicit function calls in tests. Both approaches should work correctly for e2e tests. |
|
@copilot [2] • Failure in Spec Setup (BeforeEach) [183.312 seconds] |
…s, WithoutScopeInfo OpenTelemetry's Prometheus exporter adds suffixes by default: - "_ratio" suffix for gauges with unit "1" - "_total" suffix for counters - "otel_scope_*" labels to all metrics These changes broke backward compatibility with existing metric names. Adding these options preserves the original metric names: - problem_gauge (not problem_gauge_ratio) - problem_counter (not problem_counter_total) Co-authored-by: MartinForReal <5207478+MartinForReal@users.noreply.github.com>
Fixed in 6f7c3aa. The issue was that OpenTelemetry's Prometheus exporter adds suffixes by default:
Added |
|
@copilot conduct the test and fix the issue |
Tests conducted and all pass: Build: ✅ Unit Tests: ✅ All 30+ tests pass
Metrics Format Validation: ✅ Metric names are correct (no |
|
@copilot ./.bin/golangci-lint run --config .golangci.yml --fix ./... |
…cated options Replace deprecated prometheus.WithoutUnits() and prometheus.WithoutCounterSuffixes() with prometheus.WithTranslationStrategy(otlptranslator.UnderscoreEscapingWithoutSuffixes) as recommended by the linter. The UnderscoreEscapingWithoutSuffixes strategy: - Translates metric/label name characters to underscores (standard Prometheus behavior) - Does NOT append suffixes like "_total" for counters or "_ratio" for gauges This maintains the same behavior while using the non-deprecated API. Co-authored-by: MartinForReal <5207478+MartinForReal@users.noreply.github.com>
Fixed in a904d7d. Replaced deprecated The
|
|
@copilot update go mod |
Run go mod tidy to properly declare github.com/prometheus/otlptranslator as a direct dependency since we now use it directly in prometheus_exporter.go with WithTranslationStrategy(). Co-authored-by: MartinForReal <5207478+MartinForReal@users.noreply.github.com>
What type of PR is this?
/kind feature
What this PR does / why we need it:
Replaces OpenCensus with OpenTelemetry for metrics instrumentation. OpenCensus has been deprecated and merged into OpenTelemetry, which is now the CNCF standard for observability.
Which issue(s) this PR fixes:
Fixes the need to migrate from deprecated OpenCensus to OpenTelemetry.
Special notes for your reviewer:
Changes
Metrics Package (
pkg/util/metrics/)go.opencensus.io/tag→go.opentelemetry.io/otel/attributego.opencensus.io/stats→go.opentelemetry.io/otel/metricSDKprovider.gofor global MeterProvider managementParsePrometheusMetricsto skip unsupported metric types (SUMMARY, HISTOGRAM) instead of erroringExporters
go.opentelemetry.io/otel/exporters/prometheusinstead of OpenCensus contrib exporterWithTranslationStrategy(otlptranslator.UnderscoreEscapingWithoutSuffixes)to prevent adding_ratioor_totalsuffixesWithoutScopeInfo()to prevent addingotel_scope_*labelsgithub.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/metricinstead of OpenCensus contrib exporterInitialization Order Fix
SetupMeterProvider()is calledproblemmetrics.GlobalProblemMetricsManageruses lazy initialization to defer metric creationDependencies
go.opencensus.io,contrib.go.opencensus.io/*go.opentelemetry.io/otel/sdk/metric,go.opentelemetry.io/otel/exporters/prometheusgithub.com/prometheus/otlptranslatorto direct dependency (used forWithTranslationStrategy)API Unchanged
The wrapper API remains identical:
Internally,
LastValueaggregation maps to OTel gauges,Summaps to OTel counters.Does this PR introduce a user-facing change?
Original prompt
💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.