OTA-1892: pkg/cvo/metrics: Serve cluster_version_available_updates when channel is set#1351
wking wants to merge 1 commit into openshift:main
@wking: This pull request references OTA-1892, which is a valid jira issue.
Walkthrough: Gates emission of available-updates metrics on whether […]
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: wking
… is set

Since 8b91189 (cvo: Add prometheus metrics to the CVO for current update state, 2018-11-05, openshift#45), cluster_version_available_updates is only served when there are unconditionally-recommended updates and retrieval is succeeding. This makes it hard to understand at the fleet level when:

  cluster_operator_conditions{name="version", condition="RetrievedUpdates", reason="VersionNotFound"} == 0

is because of a misconfigured channel, or a misbehaving Update Service, or otherwise. We should always export cluster_version_available_updates whenever a channel is set, to make it easier to isolate the “because the cluster-admin has somehow selected a channel not compatible with their current version” (which we can’t do much about other than keep serving our existing CannotRetrieveUpdates alert) from the other possibilities (which we might be able to do something about).
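The gate described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code in pkg/cvo/metrics.go: the `ClusterVersion` and `Sample` types and the `availableUpdatesSamples` function are hypothetical stand-ins, and only the metric name, the `upstream` and `channel` labels, and the `<default>` upstream value are taken from this PR's discussion.

```go
package main

import "fmt"

// ClusterVersion is a hypothetical, trimmed-down stand-in for the real API type.
type ClusterVersion struct {
	Channel          string   // stand-in for Spec.Channel
	Upstream         string   // stand-in for Spec.Upstream
	AvailableUpdates []string // stand-in for Status.AvailableUpdates
}

// Sample is a hypothetical stand-in for one exported metric sample.
type Sample struct {
	Name   string
	Labels map[string]string
	Value  float64
}

// availableUpdatesSamples returns a cluster_version_available_updates sample
// whenever a channel is set, even when no updates are available. With an
// empty channel it returns nothing, regardless of the update list.
func availableUpdatesSamples(cv ClusterVersion) []Sample {
	if cv.Channel == "" {
		return nil
	}
	upstream := cv.Upstream
	if upstream == "" {
		upstream = "<default>"
	}
	return []Sample{{
		Name:   "cluster_version_available_updates",
		Labels: map[string]string{"upstream": upstream, "channel": cv.Channel},
		Value:  float64(len(cv.AvailableUpdates)),
	}}
}

func main() {
	// Channel set, no updates: the sample is still emitted with value 0.
	fmt.Println(availableUpdatesSamples(ClusterVersion{Channel: "test-channel"}))
	// Channel empty: nothing is emitted, even though an update is listed.
	fmt.Println(availableUpdatesSamples(ClusterVersion{AvailableUpdates: []string{"update-a"}}))
}
```

Under this sketch, a cluster with a channel set but a failing or empty update retrieval still reports a zero-valued sample carrying its channel label, which is what makes the fleet-level breakdown possible.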
688bf45 to 4702f81
🧹 Nitpick comments (1)
pkg/cvo/metrics_test.go (1)
359-427: Add one negative regression case for empty channel.
Please add a test where `Status.AvailableUpdates` is non-empty but `Spec.Channel` is empty, and assert `cluster_version_available_updates` is not emitted. That will lock in the new gate behavior from both sides.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/cvo/metrics_test.go` around lines 359 - 427, Add a new table-driven test case in pkg/cvo/metrics_test.go similar to the "collects available updates" case but set the ClusterVersion.Spec.Channel to "" while Status.AvailableUpdates contains entries; use the same Operator construct (optr with cvLister and a ClusterVersion object) and in the wants closure assert that the prometheus metric "cluster_version_available_updates" (the one previously checked via expectMetric with labels {"upstream": "<default>", "channel": "test-channel"}) is not emitted — i.e., scan the metrics slice and fail if any metric has labels "upstream" and "channel" populated (or specifically ensure no metric exists with channel == ""), keeping the rest of expected metrics the same as the non-empty-updates path; reference Operator, cvLister, ClusterVersion.Spec.Channel, and ClusterVersion.Status.AvailableUpdates to locate where to change/add the test case.
Files selected for processing (2): pkg/cvo/metrics.go, pkg/cvo/metrics_test.go
Since 8b91189 (#45), `cluster_version_available_updates` is only served when there are unconditionally-recommended updates and retrieval is succeeding. This makes it hard to understand at the fleet level when:

  cluster_operator_conditions{name="version", condition="RetrievedUpdates", reason="VersionNotFound"} == 0

is because of a misconfigured `channel`, or a misbehaving Update Service, or otherwise. We should always export `cluster_version_available_updates` whenever a `channel` is set, to make it easier to isolate the “because the cluster-admin has somehow selected a channel not compatible with their current version” (which we can’t do much about other than keep serving our existing `CannotRetrieveUpdates` alert) from the other possibilities (which we might be able to do something about).