Skip to content

OCPBUGS-78617: Fix PinnedImages test should respect node taints#30913

Draft
bshaw7 wants to merge 2 commits intoopenshift:mainfrom
bshaw7:fix/pinnedimages-respect-node-taints
Draft

OCPBUGS-78617: Fix PinnedImages test should respect node taints#30913
bshaw7 wants to merge 2 commits intoopenshift:mainfrom
bshaw7:fix/pinnedimages-respect-node-taints

Conversation

@bshaw7
Copy link

@bshaw7 bshaw7 commented Mar 20, 2026

The PinnedImages conformance test was selecting worker nodes by label only, ignoring node taints. This caused issues in environments with dedicated/tainted nodes (e.g., OPCT test infrastructure, edge zones).

Problem:

  • Test created custom MachineConfigPool targeting any worker node
  • Could select nodes with NoSchedule/NoExecute taints
  • In OPCT, selecting the dedicated node caused Sonobuoy pod eviction and test failure.

Solution:

  • Use e2enode.GetReadySchedulableNodes() to filter nodes
  • This function excludes nodes with NoSchedule/NoExecute taints
  • Follows the same pattern used by other OpenShift conformance tests

Changes:

  • Added import: e2enode "k8s.io/kubernetes/test/e2e/framework/node"
  • Modified addWorkerNodesToCustomPool() to:
    1. Get schedulable nodes using e2enode.GetReadySchedulableNodes()
    2. Filter for worker nodes only
    3. Select from schedulable workers (excluding tainted nodes)

Testing:

  • Verified node filtering logic correctly excludes tainted nodes
  • Tested on cluster with dedicated OPCT node (node-role.kubernetes.io/tests:NoSchedule)
  • Dedicated node correctly filtered out, test selects from 3 schedulable workers

Fixes: https://redhat.atlassian.net/browse/OCPBUGS-78617

@openshift-ci-robot
Copy link

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: automatic mode

@coderabbitai
Copy link

coderabbitai bot commented Mar 20, 2026

Important

Review skipped

Auto reviews are limited based on label configuration.

🚫 Excluded labels (none allowed) (1)
  • do-not-merge/work-in-progress

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fa650b3e-5b3a-4df1-aac2-fbe4e89c1a6e

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review

Walkthrough

The addWorkerNodesToCustomPool function was refactored to use e2enode.GetReadySchedulableNodes() for node discovery instead of label selectors, filters results to retain only worker-role nodes, and updates error handling and node assignment logic accordingly.

Changes

Cohort / File(s) Summary
Node Discovery Refactoring
test/extended/machine_config/pinnedimages.go
Modified addWorkerNodesToCustomPool to discover nodes via GetReadySchedulableNodes() instead of label selector, added filtering for worker-role labels, updated node comparison logic, and revised error messaging to report schedulable worker-node counts.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@bshaw7 bshaw7 marked this pull request as draft March 20, 2026 13:52
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 20, 2026
Comment on lines +363 to +370
// Filter for worker nodes only
var workerNodes []corev1.Node
for _, node := range nodes.Items {
if _, hasWorkerLabel := node.Labels["node-role.kubernetes.io/worker"]; hasWorkerLabel {
workerNodes = append(workerNodes, node)
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bshaw7 I think we could create a exported function in the package test/extended/node, maybe in node_utils.go, to wrap this logic with e2enode.GetReadySchedulableNodes so that other tests can reuse that logic (retrieve all scheduable worker nodes).

WDYT?

Copy link
Author

@bshaw7 bshaw7 Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mtulio That's good idea to make it re-usable. So far i updated OCPBUGS-78617 with test result without exported function. Next I'll create exporter function and add test result of that too.

@bshaw7 bshaw7 changed the title Fix PinnedImages test should respect node taints OCPBUGS-78617: Fix PinnedImages test should respect node taints Mar 20, 2026
@openshift-ci-robot openshift-ci-robot added jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels Mar 20, 2026
@openshift-ci-robot
Copy link

@bshaw7: This pull request references Jira Issue OCPBUGS-78617, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

The PinnedImages conformance test was selecting worker nodes by label only, ignoring node taints. This caused issues in environments with dedicated/tainted nodes (e.g., OPCT test infrastructure, edge zones).

Problem:

  • Test created custom MachineConfigPool targeting any worker node
  • Could select nodes with NoSchedule/NoExecute taints
  • In OPCT, selecting the dedicated node caused Sonobuoy pod eviction and test failure.

Solution:

  • Use e2enode.GetReadySchedulableNodes() to filter nodes
  • This function excludes nodes with NoSchedule/NoExecute taints
  • Follows the same pattern used by other OpenShift conformance tests

Changes:

  • Added import: e2enode "k8s.io/kubernetes/test/e2e/framework/node"
  • Modified addWorkerNodesToCustomPool() to:
  1. Get schedulable nodes using e2enode.GetReadySchedulableNodes()
  2. Filter for worker nodes only
  3. Select from schedulable workers (excluding tainted nodes)

Testing:

  • Verified node filtering logic correctly excludes tainted nodes
  • Tested on cluster with dedicated OPCT node (node-role.kubernetes.io/tests:NoSchedule)
  • Dedicated node correctly filtered out, test selects from 3 schedulable workers

Fixes: https://redhat.atlassian.net/browse/OCPBUGS-78617

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@mtulio
Copy link
Contributor

mtulio commented Mar 20, 2026

/payload-job periodic-ci-openshift-release-main-nightly-4.19-opct-external-aws-ccm

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 20, 2026

@mtulio: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-nightly-4.19-opct-external-aws-ccm

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/cb1fbc90-247e-11f1-8e80-2eb60f5762cd-0

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 24, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: bshaw7
Once this PR has been reviewed and has the lgtm label, please assign cheesesashimi, cpmeadors for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mtulio
Copy link
Contributor

mtulio commented Mar 24, 2026

/payload-job periodic-ci-openshift-release-main-nightly-4.19-opct-external-aws-ccm

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 24, 2026

@mtulio: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-nightly-4.19-opct-external-aws-ccm

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/30cc6790-2782-11f1-810c-f7710a795c2b-0

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 24, 2026
bshaw7 and others added 2 commits March 24, 2026 18:41
The PinnedImages conformance test was selecting worker nodes by label
only, ignoring node taints. This caused issues in environments with
dedicated/tainted nodes (e.g., OPCT test infrastructure, edge zones).

Problem:
- Test created custom MachineConfigPool targeting any worker node
- Could select nodes with NoSchedule/NoExecute taints
- In OPCT, selecting the dedicated node caused pod eviction and test failure

Solution:
- Use e2enode.GetReadySchedulableNodes() to filter nodes
- This function excludes nodes with NoSchedule/NoExecute taints
- Follows the same pattern used by other OpenShift conformance tests

Changes:
- Added import: e2enode "k8s.io/kubernetes/test/e2e/framework/node"
- Modified addWorkerNodesToCustomPool() to:
  1. Get schedulable nodes using e2enode.GetReadySchedulableNodes()
  2. Filter for worker nodes only
  3. Select from schedulable workers (excluding tainted nodes)

Testing:
- Verified node filtering logic correctly excludes tainted nodes
- Tested on cluster with dedicated OPCT node (node-role.kubernetes.io/tests:NoSchedule)
- Dedicated node correctly filtered out, test selects from 3 schedulable workers

Fixes: https://github.com/redhat-openshift-ecosystem/opct/issues/[TBD]

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…function

As suggested in code review, created a shared utility function
GetReadySchedulableWorkerNodes() in test/extended/node/node_utils.go
to encapsulate the logic of retrieving schedulable worker nodes
(excluding nodes with NoSchedule/NoExecute taints).

This makes the node selection logic reusable across multiple tests
and improves code maintainability.

Changes:
- Added GetReadySchedulableWorkerNodes() in test/extended/node/node_utils.go
- Refactored PinnedImages test to use the new utility function
- Simplified addWorkerNodesToCustomPool() by removing manual filtering logic
@bshaw7 bshaw7 force-pushed the fix/pinnedimages-respect-node-taints branch from af1538d to 9b9fb62 Compare March 24, 2026 13:17
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 24, 2026
Copy link
Contributor

@mtulio mtulio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me. We need to make sure the payload job will run as expected, then trigger the required jobs for the pipeline (/pipeline required)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants