Conversation

@MartinForReal
Contributor

@MartinForReal MartinForReal commented Dec 25, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

Replaces OpenCensus with OpenTelemetry for metrics instrumentation. OpenCensus has been deprecated and merged into OpenTelemetry, which is now the CNCF standard for observability.

Which issue(s) this PR fixes:

fixes: #1008
Fixes the need to migrate from deprecated OpenCensus to OpenTelemetry.

Special notes for your reviewer:

Changes

Metrics Package (pkg/util/metrics/)

  • Replace go.opencensus.io/tag with go.opentelemetry.io/otel/attribute
  • Replace go.opencensus.io/stats with the go.opentelemetry.io/otel/metric SDK
  • Add provider.go for global MeterProvider management

Exporters

  • Prometheus: Use go.opentelemetry.io/otel/exporters/prometheus instead of OpenCensus contrib exporter
  • Stackdriver: Use github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/metric instead of OpenCensus contrib exporter

Dependencies

  • Remove: go.opencensus.io, contrib.go.opencensus.io/*
  • Add: go.opentelemetry.io/otel/sdk/metric, go.opentelemetry.io/otel/exporters/prometheus

API Unchanged

The wrapper API remains identical:

// Same API, different implementation
metric, _ := metrics.NewInt64Metric(id, "name", "desc", "unit", metrics.Sum, []string{"tag1"})
metric.Record(map[string]string{"tag1": "value"}, 42)

Internally, LastValue aggregation maps to OTel gauges and Sum aggregation maps to OTel counters.
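To make the difference between the two aggregation kinds concrete, here is a minimal standard-library sketch. It does not use the OTel SDK or the real wrapper; `int64Sketch`, `newInt64Sketch`, and `tagKey` are hypothetical names chosen for illustration only. It assumes that Sum accumulates recorded values (counter-like) and LastValue keeps only the most recent one (gauge-like).

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Aggregation mirrors the two aggregation kinds named in the PR description.
type Aggregation int

const (
	LastValue Aggregation = iota // gauge-like: the latest value wins
	Sum                          // counter-like: values accumulate
)

// int64Sketch is a hypothetical in-memory stand-in for the wrapper type.
type int64Sketch struct {
	agg    Aggregation
	values map[string]int64 // keyed by a canonical rendering of the tags
}

func newInt64Sketch(agg Aggregation) *int64Sketch {
	return &int64Sketch{agg: agg, values: make(map[string]int64)}
}

// tagKey renders tags in sorted order so equal tag sets share one series.
func tagKey(tags map[string]string) string {
	keys := make([]string, 0, len(tags))
	for k := range tags {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+tags[k])
	}
	return strings.Join(parts, ",")
}

// Record applies the aggregation: add for Sum, overwrite for LastValue.
func (m *int64Sketch) Record(tags map[string]string, v int64) {
	key := tagKey(tags)
	switch m.agg {
	case Sum:
		m.values[key] += v
	case LastValue:
		m.values[key] = v
	}
}

// Value returns the current aggregated value for the given tag set.
func (m *int64Sketch) Value(tags map[string]string) int64 {
	return m.values[tagKey(tags)]
}

func main() {
	tags := map[string]string{"tag1": "value"}

	counter := newInt64Sketch(Sum)
	counter.Record(tags, 42)
	counter.Record(tags, 8)

	gauge := newInt64Sketch(LastValue)
	gauge.Record(tags, 42)
	gauge.Record(tags, 8)

	fmt.Println(counter.Value(tags), gauge.Value(tags)) // prints "50 8"
}
```

In the actual migration the same split would presumably be expressed with an OTel counter instrument for Sum and a gauge instrument for LastValue; the sketch only shows the semantics callers can rely on.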

Does this PR introduce a user-facing change?

Replace OpenCensus with OpenTelemetry for metrics collection and export. No changes to the metrics API.

Copilot AI and others added 3 commits December 25, 2025 03:27
- Migrate pkg/util/metrics/helpers.go from OpenCensus tags to OTel attributes
- Migrate pkg/util/metrics/metric_int64.go to use OTel metric SDK
- Migrate pkg/util/metrics/metric_float64.go to use OTel metric SDK
- Add pkg/util/metrics/provider.go for global MeterProvider management
- Update prometheusexporter to use OTel Prometheus exporter
- Update stackdriver exporter to use GCP OTel metric exporter
- Remove OpenCensus dependencies from go.mod
- Add OpenTelemetry SDK and exporter dependencies

Co-authored-by: MartinForReal <5207478+MartinForReal@users.noreply.github.com>
@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 25, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: MartinForReal
Once this PR has been reviewed and has the lgtm label, please assign yujuhong for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 25, 2025
@MartinForReal MartinForReal changed the title Adopt opentelemetry Feat: Adopt opentelemetry Dec 25, 2025
@MartinForReal MartinForReal changed the title Feat: Adopt opentelemetry feat(exporter): Adopt opentelemetry Dec 25, 2025
@MartinForReal MartinForReal marked this pull request as ready for review December 25, 2025 05:19
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 25, 2025
@k8s-ci-robot k8s-ci-robot requested a review from hakman December 25, 2025 05:19
Copilot AI and others added 5 commits December 25, 2025 10:42
…usMetrics

The OpenTelemetry Prometheus exporter exports additional metric types like
SUMMARY (e.g., go_gc_duration_seconds) that were not handled by the parser.
Instead of returning an error for unsupported types, we now skip them since
NPD only cares about its own COUNTER and GAUGE metrics.
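The skip-instead-of-fail behavior can be sketched with the standard library alone. This is not the parser from the PR (which may well work on decoded metric families rather than raw text); `filterSamples` and `supported` are hypothetical names, and the label names in the sample input are illustrative. The sketch assumes the Prometheus text format, where each family's samples follow its `# TYPE` line.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// supported lists the metric types the sketch keeps; all others are skipped.
var supported = map[string]bool{"counter": true, "gauge": true}

// filterSamples returns the sample lines that belong to counter or gauge
// families. Families of any other type (summary, histogram, untyped) are
// skipped silently instead of producing an error.
// Simplification: a sample with no preceding "# TYPE" line inherits the
// decision made for the previous family.
func filterSamples(exposition string) []string {
	var kept []string
	keep := false
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		if strings.HasPrefix(line, "# TYPE ") {
			fields := strings.Fields(line) // "#", "TYPE", <name>, <type>
			keep = len(fields) == 4 && supported[fields[3]]
			continue
		}
		if strings.HasPrefix(line, "#") {
			continue // HELP lines and other comments
		}
		if keep {
			kept = append(kept, line)
		}
	}
	return kept
}

const sample = `
# HELP go_gc_duration_seconds GC pause durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0.5"} 0.0001
go_gc_duration_seconds_sum 0.002
go_gc_duration_seconds_count 12
# TYPE problem_counter counter
problem_counter{reason="OOMKilling"} 3
# TYPE problem_gauge gauge
problem_gauge{type="KernelDeadlock"} 0
`

func main() {
	for _, line := range filterSamples(sample) {
		fmt.Println(line)
	}
}
```

With the sample input, only the `problem_counter` and `problem_gauge` lines survive; the three `go_gc_duration_seconds` samples are dropped without an error.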

Co-authored-by: MartinForReal <5207478+MartinForReal@users.noreply.github.com>
The issue was that metrics were being created during package init() before
the meter provider was set up with the Prometheus exporter.

Changes:
1. Modified provider.go to use lazy initialization and allow SetupMeterProvider()
   to be called explicitly after all readers are added
2. Changed problemmetrics to use lazy initialization via an interface, deferring
   metric creation until first use
3. Reordered main initialization: exporters are now set up first, then
   SetupMeterProvider() is called, then problem daemons are initialized

This ensures that when metrics are created, the meter provider already has
all the configured readers/exporters attached.
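The ordering fix can be illustrated with a small standard-library sketch. Only the name `SetupMeterProvider` comes from the commit message; `provider`, `AddReader`, and `lazyMetric` are hypothetical stand-ins, and no OTel types are used. The point is that a metric declared at package level does not touch the provider until its first `Record`, by which time the readers are attached.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// provider is a hypothetical stand-in for a meter provider: it only
// remembers which readers were attached when it was built.
type provider struct{ readers []string }

var (
	mu      sync.Mutex
	pending []string // readers queued before setup
	once    sync.Once
	global  *provider
)

// AddReader queues an exporter reader; it has an effect only before setup.
func AddReader(name string) {
	mu.Lock()
	defer mu.Unlock()
	pending = append(pending, name)
}

// SetupMeterProvider builds the provider exactly once from the queued readers.
func SetupMeterProvider() *provider {
	once.Do(func() {
		mu.Lock()
		defer mu.Unlock()
		global = &provider{readers: append([]string(nil), pending...)}
	})
	return global
}

// lazyMetric defers instrument creation until the first Record call.
type lazyMetric struct {
	once    sync.Once
	readers []string // readers visible to the instrument when it was created
	count   atomic.Int64
}

// Record binds the metric to the provider on first use, then records v.
func (m *lazyMetric) Record(v int64) {
	m.once.Do(func() { m.readers = SetupMeterProvider().readers })
	m.count.Add(v)
}

// problemCounter is declared at package level, as a metric created during
// init() would be, but it binds to the provider only on first use.
var problemCounter lazyMetric

func main() {
	// 1. Exporters register their readers first.
	AddReader("prometheus")
	AddReader("stackdriver")
	// 2. The provider is built once all readers are known.
	SetupMeterProvider()
	// 3. Problem daemons start recording afterwards.
	problemCounter.Record(1)
	fmt.Println(problemCounter.readers, problemCounter.count.Load())
	// prints "[prometheus stackdriver] 1"
}
```

If `Record` ran before any `AddReader` call, the provider would be frozen with an empty reader list, which is the failure mode the commit describes.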

Co-authored-by: MartinForReal <5207478+MartinForReal@users.noreply.github.com>
…s, WithoutScopeInfo

By default, OpenTelemetry's Prometheus exporter adds suffixes and labels to exported metrics:
- "_ratio" suffix for gauges with unit "1"
- "_total" suffix for counters
- "otel_scope_*" labels to all metrics

These changes broke backward compatibility with existing metric names.
Adding these options preserves the original metric names:
- problem_gauge (not problem_gauge_ratio)
- problem_counter (not problem_counter_total)

Co-authored-by: MartinForReal <5207478+MartinForReal@users.noreply.github.com>
…cated options

Replace deprecated prometheus.WithoutUnits() and prometheus.WithoutCounterSuffixes()
with prometheus.WithTranslationStrategy(otlptranslator.UnderscoreEscapingWithoutSuffixes)
as recommended by the linter.

The UnderscoreEscapingWithoutSuffixes strategy:
- Translates metric/label name characters to underscores (standard Prometheus behavior)
- Does NOT append suffixes like "_total" for counters or "_ratio" for gauges

This maintains the same behavior while using the non-deprecated API.
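The naming difference described in these commit messages can be modeled in a few lines. This is a simplified model of only the two suffix rules quoted above, not the exporter's or otlptranslator's actual code; `exportedName`, `kind`, and the `withSuffixes` flag are hypothetical names, and other behavior (unit suffixes such as seconds or bytes, character escaping, scope labels) is deliberately left out.

```go
package main

import (
	"fmt"
	"strings"
)

// kind distinguishes the two instrument kinds NPD exports.
type kind int

const (
	counter kind = iota
	gauge
)

// exportedName models the two default suffix rules quoted in the commit
// message: "_ratio" for gauges with unit "1" and "_total" for counters.
// With withSuffixes set to false the name is returned unchanged, which is
// the outcome the PR wants for backward compatibility.
func exportedName(name string, k kind, unit string, withSuffixes bool) string {
	if !withSuffixes {
		return name
	}
	if k == gauge && unit == "1" && !strings.HasSuffix(name, "_ratio") {
		name += "_ratio"
	}
	if k == counter && !strings.HasSuffix(name, "_total") {
		name += "_total"
	}
	return name
}

func main() {
	// Default behavior: suffixes are appended.
	fmt.Println(exportedName("problem_gauge", gauge, "1", true))     // problem_gauge_ratio
	fmt.Println(exportedName("problem_counter", counter, "1", true)) // problem_counter_total
	// Suffixes suppressed: original names are preserved.
	fmt.Println(exportedName("problem_gauge", gauge, "1", false))     // problem_gauge
	fmt.Println(exportedName("problem_counter", counter, "1", false)) // problem_counter
}
```

Dashboards and alerts that query `problem_gauge` or `problem_counter` keep working only in the second case, which is why the exporter is configured without suffixes.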

Co-authored-by: MartinForReal <5207478+MartinForReal@users.noreply.github.com>
Run go mod tidy to properly declare github.com/prometheus/otlptranslator
as a direct dependency since we now use it directly in prometheus_exporter.go
with WithTranslationStrategy().

Co-authored-by: MartinForReal <5207478+MartinForReal@users.noreply.github.com>
@MartinForReal
Contributor Author

/assign @yujuhong

Development

Successfully merging this pull request may close these issues.

Migrate OpenCensus to OpenTelemetry

3 participants