Add upgrade workflow for kueue operator testing#74857

Open
sohankunkerkar wants to merge 1 commit into openshift:main from sohankunkerkar:kueue-upgrade-job-test

Conversation

@sohankunkerkar
Member

No description provided.

Copilot AI review requested due to automatic review settings February 13, 2026 04:22
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 13, 2026
@openshift-ci
Contributor

openshift-ci bot commented Feb 13, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sohankunkerkar

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 13, 2026
@openshift-ci-robot openshift-ci-robot added the rehearsals-ack Signifies that rehearsal jobs have been acknowledged label Feb 13, 2026

Copilot AI left a comment

Pull request overview

Adds a new CI upgrade test workflow for the kueue-operator that installs the operator after an OCP upgrade and runs a small workload-based smoke test, and wires that workflow into presubmit/periodic jobs for multiple upgrade paths.

Changes:

  • Added a new step-registry chain for upgrade testing, plus install + workload smoke-test refs.
  • Added ci-operator config variants for upgrades 4.18→4.19, 4.19→4.20, and 4.20→4.21 (each with two component flavors).
  • Added new presubmit and periodic Prow jobs to execute the upgrade suites and variant images jobs.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| ci-operator/step-registry/kueue-operator/test/upgrade/workload/kueue-operator-test-upgrade-workload-ref.yaml | New step ref definition for the post-upgrade workload smoke test. |
| ci-operator/step-registry/kueue-operator/test/upgrade/workload/kueue-operator-test-upgrade-workload-ref.metadata.json | Metadata for the workload smoke-test step. |
| ci-operator/step-registry/kueue-operator/test/upgrade/workload/kueue-operator-test-upgrade-workload-commands.sh | Implements the workload smoke test (creates queues/flavor + submits Job + checks admission/completion). |
| ci-operator/step-registry/kueue-operator/test/upgrade/kueue-operator-test-upgrade-chain.yaml | New upgrade test chain combining env setup, cert-manager, operator install, and workload smoke test. |
| ci-operator/step-registry/kueue-operator/test/upgrade/kueue-operator-test-upgrade-chain.metadata.json | Metadata for the upgrade chain. |
| ci-operator/step-registry/kueue-operator/test/upgrade/install/kueue-operator-test-upgrade-install-ref.yaml | New step ref to install the operator bundle via operator-sdk. |
| ci-operator/step-registry/kueue-operator/test/upgrade/install/kueue-operator-test-upgrade-install-ref.metadata.json | Metadata for the install step. |
| ci-operator/step-registry/kueue-operator/test/upgrade/install/kueue-operator-test-upgrade-install-commands.sh | Implements the bundle install using operator-sdk run bundle. |
| ci-operator/jobs/openshift/kueue-operator/openshift-kueue-operator-main-presubmits.yaml | Adds presubmit upgrade and images jobs for the new variants/targets. |
| ci-operator/jobs/openshift/kueue-operator/openshift-kueue-operator-main-periodics.yaml | Adds periodic upgrade jobs for the new variants/targets. |
| ci-operator/config/openshift/kueue-operator/openshift-kueue-operator-main__upgrade-from-4.18.yaml | New ci-operator variant defining 4.18→4.19 upgrade tests (kueue 1.1/1.2). |
| ci-operator/config/openshift/kueue-operator/openshift-kueue-operator-main__upgrade-from-4.19.yaml | New ci-operator variant defining 4.19→4.20 upgrade tests (kueue 1.1/1.2). |
| ci-operator/config/openshift/kueue-operator/openshift-kueue-operator-main__upgrade-from-4.20.yaml | New ci-operator variant defining 4.20→4.21 upgrade tests (kueue 1.1/1.2). |

Comment on lines +91 to +97
echo "Verifying workload finished status..."
FINISHED=$(oc get workloads -n kueue-upgrade-test -o jsonpath='{.items[0].status.conditions[?(@.type=="Finished")].status}' 2>/dev/null || true)
if [ "$FINISHED" = "True" ]; then
  echo "Kueue workload completed and finished successfully on upgraded cluster!"
else
  echo "WARNING: Workload Finished condition not set, but job completed."
  oc get workloads -n kueue-upgrade-test -o yaml

Copilot AI Feb 13, 2026

The Finished condition check also uses .items[0], which can end up checking a different Workload than the one admitted/created for this Job. Reuse the same resolved Workload name from the admission phase to verify Finished on the correct object.
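One way to act on this suggestion, as a minimal sketch: query the Finished condition on a single named Workload instead of `.items[0]`. The function name `workload_finished` and the `WORKLOAD_NAME` variable are hypothetical and not part of the PR; the sketch assumes the name was captured earlier, during the admission phase.

```shell
#!/usr/bin/env bash
# Sketch only: workload_finished and WORKLOAD_NAME are hypothetical names,
# not taken from the PR.

# Return 0 when the named Workload reports Finished=True, non-zero otherwise.
workload_finished() {
  local wl="$1" ns="$2" finished
  finished="$(oc get workload "${wl}" -n "${ns}" \
    -o jsonpath='{.status.conditions[?(@.type=="Finished")].status}' 2>/dev/null || true)"
  [ "${finished}" = "True" ]
}

# Example usage, assuming WORKLOAD_NAME was resolved during the admission phase:
#   if workload_finished "${WORKLOAD_NAME}" kueue-upgrade-test; then
#     echo "Kueue workload completed and finished successfully on upgraded cluster!"
#   else
#     echo "WARNING: Workload Finished condition not set, but job completed."
#     oc get workload "${WORKLOAD_NAME}" -n kueue-upgrade-test -o yaml
#   fi
```

Querying by name keeps the admission and completion checks pointed at the same object, even if other Workloads appear in the namespace.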

Copilot uses AI. Check for mistakes.
Comment on lines +88 to +90
echo "Waiting for Job to complete..."
oc wait --for=condition=complete job/kueue-smoke-test-job -n kueue-upgrade-test --timeout=300s

Copilot AI Feb 13, 2026

oc wait ... --timeout=300s may be too short for an upgraded cluster where image pulls and scheduling can be slower, leading to intermittent failures even when the system is healthy. Consider increasing the timeout (and/or making it configurable) to reduce flakiness.
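A configurable timeout could look roughly like the sketch below. `KUEUE_JOB_TIMEOUT` and both helper names are hypothetical (they do not appear in the PR), and the 600s default is only an illustrative value, not a measured recommendation.

```shell
#!/usr/bin/env bash
# Sketch only: KUEUE_JOB_TIMEOUT and the helper names are hypothetical.

# Resolve the timeout, falling back to an illustrative default of 600s.
job_wait_timeout() {
  echo "${KUEUE_JOB_TIMEOUT:-600s}"
}

# Wait for the Job's Complete condition using the resolved timeout.
wait_for_job_complete() {
  local job="$1" ns="$2"
  oc wait --for=condition=complete "job/${job}" -n "${ns}" \
    --timeout="$(job_wait_timeout)"
}

# Example usage:
#   wait_for_job_complete kueue-smoke-test-job kueue-upgrade-test
```

If the variable were declared in the step ref's environment, individual jobs could raise it for slower upgrade paths without editing the script.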

{
"path": "ci-operator/step-registry/kueue-operator/test/upgrade",
"owners": "openshift/kueue-operator",
"description": "Chain that installs kueue operator with dependencies on an upgraded cluster, runs e2e tests and a workload smoke test."

Copilot AI Feb 13, 2026

The metadata description says this chain "runs e2e tests", but the chain YAML only installs dependencies/operator and runs the workload smoke test. Please update the description to match what the chain actually does (or move the e2e reference to the CI config/workflow description where openshift-e2e-test is invoked).

Suggested change
- "description": "Chain that installs kueue operator with dependencies on an upgraded cluster, runs e2e tests and a workload smoke test."
+ "description": "Chain that installs kueue operator with dependencies on an upgraded cluster and runs a workload smoke test."

Comment on lines +71 to +75
echo "Waiting for workload to be admitted by kueue..."
for i in $(seq 1 30); do
  ADMITTED=$(oc get workloads -n kueue-upgrade-test -o jsonpath='{.items[0].status.conditions[?(@.type=="Admitted")].status}' 2>/dev/null || true)
  if [ "$ADMITTED" = "True" ]; then
    echo "Workload admitted by kueue successfully!"

Copilot AI Feb 13, 2026

The admission check reads only the first Workload in the namespace (.items[0]). If the workload list is empty initially or multiple Workloads exist, this can cause false negatives/positives and flaky behavior. Consider first determining the specific Workload created for this Job (e.g., wait until exactly one exists and capture its name, or select by a label/owner reference), then query conditions on that named Workload.
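The owner-reference approach mentioned here could be sketched as follows. The helper names, the retry count, and the polling delay are hypothetical; the sketch also assumes that Kueue sets an owner reference on the Workload pointing at the Job, which should be verified against the operator version under test.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and polling defaults are hypothetical, not from the PR.

# Print the name of the Workload owned by the given Job, or nothing if none exists yet.
workload_for_job() {
  local job="$1" ns="$2"
  oc get workloads -n "${ns}" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.ownerReferences[*].name}{"\n"}{end}' \
    2>/dev/null |
    awk -v job="${job}" '{ for (i = 2; i <= NF; i++) if ($i == job) { print $1; exit } }'
}

# Poll until the Job's own Workload reports Admitted=True; print its name on success.
wait_for_admission() {
  local job="$1" ns="$2" attempts="${3:-30}" delay="${4:-10}"
  local wl admitted
  for _ in $(seq 1 "${attempts}"); do
    wl="$(workload_for_job "${job}" "${ns}")" || true
    if [ -n "${wl}" ]; then
      admitted="$(oc get workload "${wl}" -n "${ns}" \
        -o jsonpath='{.status.conditions[?(@.type=="Admitted")].status}' 2>/dev/null || true)"
      if [ "${admitted}" = "True" ]; then
        echo "${wl}"
        return 0
      fi
    fi
    sleep "${delay}"
  done
  return 1
}

# Example usage:
#   WORKLOAD_NAME="$(wait_for_admission kueue-smoke-test-job kueue-upgrade-test)"
```

Because the function prints the resolved name, later checks can reuse it rather than listing Workloads again. An empty list is handled by retrying instead of reading a missing `.items[0]`.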

@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 13, 2026
@openshift-ci-robot openshift-ci-robot removed the rehearsals-ack Signifies that rehearsal jobs have been acknowledged label Feb 13, 2026
@sohankunkerkar
Member Author

/pj-rehearse

@openshift-ci-robot
Contributor

@sohankunkerkar: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@sohankunkerkar sohankunkerkar force-pushed the kueue-upgrade-job-test branch 2 times, most recently from 2b03054 to ce5f097 Compare February 13, 2026 05:18
@sohankunkerkar
Member Author

/pj-rehearse

@openshift-ci-robot
Contributor

@sohankunkerkar: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@sohankunkerkar
Member Author

/pj-rehearse

@openshift-ci-robot
Contributor

@sohankunkerkar: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

Signed-off-by: Sohan Kunkerkar <sohank2602@gmail.com>
@sohankunkerkar
Member Author

/pj-rehearse

@openshift-ci-robot
Contributor

@sohankunkerkar: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci-robot
Contributor

[REHEARSALNOTIFIER]
@sohankunkerkar: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

| Test name | Repo | Type | Reason |
| --- | --- | --- | --- |
| pull-ci-openshift-kueue-operator-main-upgrade-from-4.18-e2e-upgrade-4-18-to-4-19-kueue-1-1 | openshift/kueue-operator | presubmit | Presubmit changed |
| pull-ci-openshift-kueue-operator-main-upgrade-from-4.18-e2e-upgrade-4-18-to-4-19-kueue-1-2 | openshift/kueue-operator | presubmit | Presubmit changed |
| pull-ci-openshift-kueue-operator-main-upgrade-from-4.18-images | openshift/kueue-operator | presubmit | Presubmit changed |
| pull-ci-openshift-kueue-operator-main-upgrade-from-4.19-e2e-upgrade-4-19-to-4-20-kueue-1-1 | openshift/kueue-operator | presubmit | Presubmit changed |
| pull-ci-openshift-kueue-operator-main-upgrade-from-4.19-e2e-upgrade-4-19-to-4-20-kueue-1-2 | openshift/kueue-operator | presubmit | Presubmit changed |
| pull-ci-openshift-kueue-operator-main-upgrade-from-4.19-images | openshift/kueue-operator | presubmit | Presubmit changed |
| pull-ci-openshift-kueue-operator-main-upgrade-from-4.20-e2e-upgrade-4-20-to-4-21-kueue-1-1 | openshift/kueue-operator | presubmit | Presubmit changed |
| pull-ci-openshift-kueue-operator-main-upgrade-from-4.20-e2e-upgrade-4-20-to-4-21-kueue-1-2 | openshift/kueue-operator | presubmit | Presubmit changed |
| pull-ci-openshift-kueue-operator-main-upgrade-from-4.20-images | openshift/kueue-operator | presubmit | Presubmit changed |

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

@openshift-ci
Contributor

openshift-ci bot commented Feb 13, 2026

@sohankunkerkar: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/rehearse/openshift/kueue-operator/main/upgrade-from-4.19-e2e-upgrade-4-19-to-4-20-kueue-1-2 | 8584b38 | link | unknown | /pj-rehearse pull-ci-openshift-kueue-operator-main-upgrade-from-4.19-e2e-upgrade-4-19-to-4-20-kueue-1-2 |
| ci/rehearse/openshift/kueue-operator/main/upgrade-from-4.19-e2e-upgrade-4-19-to-4-20-kueue-1-1 | 8584b38 | link | unknown | /pj-rehearse pull-ci-openshift-kueue-operator-main-upgrade-from-4.19-e2e-upgrade-4-19-to-4-20-kueue-1-1 |
| ci/rehearse/openshift/kueue-operator/main/upgrade-from-4.18-e2e-upgrade-4-18-to-4-19-kueue-1-2 | 8584b38 | link | unknown | /pj-rehearse pull-ci-openshift-kueue-operator-main-upgrade-from-4.18-e2e-upgrade-4-18-to-4-19-kueue-1-2 |
| ci/rehearse/openshift/kueue-operator/main/upgrade-from-4.18-e2e-upgrade-4-18-to-4-19-kueue-1-1 | 8584b38 | link | unknown | /pj-rehearse pull-ci-openshift-kueue-operator-main-upgrade-from-4.18-e2e-upgrade-4-18-to-4-19-kueue-1-1 |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
