OTA-1892: pkg/cvo/metrics: Serve cluster_version_available_updates when channel is set #1351

Open
wking wants to merge 1 commit into openshift:main from wking:more-cluster_version_available_updates

wking (Member) commented Mar 16, 2026

Since 8b91189 (#45), cluster_version_available_updates has only been served when there are unconditionally recommended updates and retrieval is succeeding. This makes it hard to tell, at the fleet level, whether:

cluster_operator_conditions{name="version", condition="RetrievedUpdates", reason="VersionNotFound"} == 0

is because of a misconfigured channel, a misbehaving Update Service, or something else. We should export cluster_version_available_updates whenever a channel is set. That makes it easier to separate the case where the cluster-admin has somehow selected a channel not compatible with their current version (which we can’t do much about other than keep serving our existing CannotRetrieveUpdates alert) from the other possibilities (which we might be able to do something about).

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 16, 2026
openshift-ci-robot (Contributor) commented Mar 16, 2026

@wking: This pull request references OTA-1892 which is a valid jira issue.


coderabbitai bot commented Mar 16, 2026

Walkthrough

Gates emission of available-updates metrics on whether cv.Spec.Channel is non-empty; upstream resolution logic unchanged. Test expectations updated to use "test-channel" for channel labeling.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Metrics Condition Update (`pkg/cvo/metrics.go`) | Replaced previous gating (upstream presence, available updates, RetrievedUpdates) with a single check: emit available-updates metrics only when cv.Spec.Channel is non-empty. Upstream resolution still prefers optr.updateService, then cv.Spec.Upstream, defaulting to "<default>". |
| Tests Updated for Channel Labeling (`pkg/cvo/metrics_test.go`) | Updated tests to set ClusterVersionSpec.Channel = "test-channel" across test clusters and to assert that metrics carry the channel="test-channel" label instead of an empty channel. |
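
The gating described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code in `pkg/cvo/metrics.go`: the `clusterVersion` and `sample` types and both function names are hypothetical stand-ins, and using the number of available updates as the metric value is an assumption.

```go
package main

import "fmt"

// clusterVersion is a simplified, hypothetical stand-in for the real
// ClusterVersion object; only the fields this sketch reads are modeled.
type clusterVersion struct {
	Channel          string   // stands in for cv.Spec.Channel
	Upstream         string   // stands in for cv.Spec.Upstream
	AvailableUpdates []string // stands in for cv.Status.AvailableUpdates
}

// sample is a hypothetical, flattened view of one exported metric sample.
type sample struct {
	Name   string
	Labels map[string]string
	Value  float64
}

// resolveUpstream prefers the operator-level update service, then the
// upstream from the spec, and otherwise falls back to "<default>".
func resolveUpstream(updateService string, cv clusterVersion) string {
	switch {
	case updateService != "":
		return updateService
	case cv.Upstream != "":
		return cv.Upstream
	default:
		return "<default>"
	}
}

// availableUpdatesSamples emits the metric whenever a channel is set,
// even if there are no available updates, and emits nothing otherwise.
func availableUpdatesSamples(updateService string, cv clusterVersion) []sample {
	if cv.Channel == "" {
		return nil
	}
	return []sample{{
		Name: "cluster_version_available_updates",
		Labels: map[string]string{
			"upstream": resolveUpstream(updateService, cv),
			"channel":  cv.Channel,
		},
		// Assumption: the value is the count of available updates.
		Value: float64(len(cv.AvailableUpdates)),
	}}
}

func main() {
	cv := clusterVersion{Channel: "test-channel"}
	for _, s := range availableUpdatesSamples("", cv) {
		fmt.Printf("%s%v %v\n", s.Name, s.Labels, s.Value)
	}
}
```

With a gate like this, a cluster with a channel but no recommended updates would still report a sample, while a cluster with no channel would report none.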

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

openshift-ci bot (Contributor) commented Mar 16, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wking

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 16, 2026
… is set

Since 8b91189 (cvo: Add prometheus metrics to the CVO for current
update state, 2018-11-05, openshift#45), cluster_version_available_updates is
only served when there are unconditionally-recommended updates and
retrieval is succeeding.  This makes it hard to understand at the
fleet level when:

  cluster_operator_conditions{name="version", condition="RetrievedUpdates", reason="VersionNotFound"} == 0

is because of a misconfigured channel, or a misbehaving Update
Service, or otherwise.  We should always export
cluster_version_available_updates whenever a channel is set, to make
it easier to isolate the “because the cluster-admin has somehow
selected a channel not compatible with their current version” (which
we can’t do much about other than keep serving our existing
CannotRetrieveUpdates alert) from the other possibilities (which we
might be able to do something about).
@wking wking force-pushed the more-cluster_version_available_updates branch from 688bf45 to 4702f81 Compare March 17, 2026 15:41

coderabbitai bot left a comment

🧹 Nitpick comments (1)
pkg/cvo/metrics_test.go (1)

359-427: Add one negative regression case for empty channel.

Please add a test where Status.AvailableUpdates is non-empty but Spec.Channel is empty, and assert cluster_version_available_updates is not emitted. That will lock in the new gate behavior from both sides.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/cvo/metrics_test.go` around lines 359 - 427, Add a new table-driven test
case in pkg/cvo/metrics_test.go similar to the "collects available updates" case
but set the ClusterVersion.Spec.Channel to "" while Status.AvailableUpdates
contains entries; use the same Operator construct (optr with cvLister and a
ClusterVersion object) and in the wants closure assert that the prometheus
metric "cluster_version_available_updates" (the one previously checked via
expectMetric with labels {"upstream": "<default>", "channel": "test-channel"})
is not emitted — i.e., scan the metrics slice and fail if any metric has labels
"upstream" and "channel" populated (or specifically ensure no metric exists with
channel == ""), keeping the rest of expected metrics the same as the
non-empty-updates path; reference Operator, cvLister,
ClusterVersion.Spec.Channel, and ClusterVersion.Status.AvailableUpdates to
locate where to change/add the test case.
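
The shape of the requested negative case can be sketched as a small table-driven check. This is only an illustration under stated assumptions: `emitsAvailableUpdates`, `gateCase`, `gateCases`, and `checkGate` are hypothetical names, the `clusterVersion` type is a simplified stand-in, and the real test in `pkg/cvo/metrics_test.go` would exercise the Operator and its collected metrics rather than a boolean gate.

```go
package main

import "fmt"

// clusterVersion is a simplified, hypothetical stand-in for ClusterVersion.
type clusterVersion struct {
	Channel          string   // stands in for Spec.Channel
	AvailableUpdates []string // stands in for Status.AvailableUpdates
}

// emitsAvailableUpdates is a hypothetical stand-in for the collector's gate:
// the metric is emitted only when a channel is set.
func emitsAvailableUpdates(cv clusterVersion) bool {
	return cv.Channel != ""
}

// gateCase is one row of the table: an input object and whether the
// metric is expected to be emitted for it.
type gateCase struct {
	name        string
	cv          clusterVersion
	wantEmitted bool
}

// gateCases covers the gate from both sides, including the negative case
// where updates are available but no channel is set.
func gateCases() []gateCase {
	return []gateCase{
		{
			name:        "channel set, updates available",
			cv:          clusterVersion{Channel: "test-channel", AvailableUpdates: []string{"4.0.2"}},
			wantEmitted: true,
		},
		{
			name:        "channel set, no updates",
			cv:          clusterVersion{Channel: "test-channel"},
			wantEmitted: true,
		},
		{
			name:        "empty channel, updates available",
			cv:          clusterVersion{AvailableUpdates: []string{"4.0.2"}},
			wantEmitted: false,
		},
	}
}

// checkGate runs every case and returns a description of each mismatch.
func checkGate(cases []gateCase) []string {
	var failures []string
	for _, tc := range cases {
		if got := emitsAvailableUpdates(tc.cv); got != tc.wantEmitted {
			failures = append(failures,
				fmt.Sprintf("%s: emitted=%t, want %t", tc.name, got, tc.wantEmitted))
		}
	}
	return failures
}

func main() {
	failures := checkGate(gateCases())
	for _, f := range failures {
		fmt.Println("FAIL:", f)
	}
	if len(failures) == 0 {
		fmt.Println("all gate cases passed")
	}
}
```

The third row is the regression case the review asks for: it would fail if the gate ever went back to depending on available updates instead of the channel.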
📥 Commits

Reviewing files that changed from the base of the PR and between 688bf45 and 4702f81.

📒 Files selected for processing (2)
  • pkg/cvo/metrics.go
  • pkg/cvo/metrics_test.go

openshift-ci bot (Contributor) commented Mar 17, 2026

@wking: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/e2e-hypershift | 4702f81 | link | true | /test e2e-hypershift |
| ci/prow/e2e-agnostic-ovn-techpreview-serial-2of3 | 4702f81 | link | true | /test e2e-agnostic-ovn-techpreview-serial-2of3 |
| ci/prow/e2e-aws-ovn-techpreview | 4702f81 | link | true | /test e2e-aws-ovn-techpreview |
