NO-JIRA: Harden MCN condition transition tests and adjust tests to run on SNO#5833
NO-JIRA: Harden MCN condition transition tests and adjust tests to run on SNO#5833isabella-janssen wants to merge 1 commit intoopenshift:mainfrom
Conversation
|
@isabella-janssen: This pull request explicitly references no jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughRemoved a logging side-effect in Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: isabella-janssen The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/payload-aggregate periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-aws-mco-disruptive-techpreview-1of3 10 |
|
@isabella-janssen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/450ee830-2dc8-11f1-9a80-df20096ca503-0 |
|
/payload-aggregate periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-aws-mco-disruptive-techpreview-2of3 10 |
|
@isabella-janssen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/4d041ba0-2dc8-11f1-813c-9c7e53a7ae59-0 |
|
Caution Failed to replace (edit) comment. This is likely due to insufficient permissions or the comment being deleted. Error details |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/extended/machineconfignode.go`:
- Around line 182-196: The branch in the MachineConfigNode check has inverted
logic and a malformed error format: compute isUpdated by checking
mcfgv1.MachineConfigNodeUpdated == metav1.ConditionTrue (use
checkMCNConditionStatus(workerNodeMCN, mcfgv1.MachineConfigNodeUpdated,
metav1.ConditionTrue)), then swap the two branches so that if conditionIsFalse
&& isUpdated you log/return success (logger.Infof(...)) and if conditionIsFalse
&& !isUpdated you return an error; also fix the fmt.Errorf call to include the
missing format arguments (e.g., mcnName, conditionType, metav1.ConditionTrue) so
the error message matches the placeholders and logs correct values for
workerNodeMCN, checkMCNConditionStatus, mcfgv1.MachineConfigNodeUpdated,
logger.Infof and fmt.Errorf.
- Around line 335-345: The assertion failure messages are copy-pasted and
reference the wrong condition names; update the o.Expect(...).To(o.BeTrue(),
"...") messages to match the actual condition you're polling via
waitForMCNConditionStatus for the given call site: for the call that checks the
Resumed condition (using waitForMCNConditionStatus with the Resumed constant),
change the message to "Resumed=True"; for the call that checks
mcfgv1.MachineConfigNodeUpdateComplete, change the message to
"UpdateComplete=True"; and for the call that checks
mcfgv1.MachineConfigNodeUpdateUncordoned, change the message to
"Uncordoned=True" (referencing machineConfigClient, updatingNodeName and the
waitForMCNConditionStatus invocations to locate each assertion).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0517b08b-0f9e-4a49-aea3-ae81952ed0cc
📒 Files selected for processing (1)
test/extended/machineconfignode.go
ea4aa71 to
b9b4cb1
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (1)
test/extended/machineconfignode.go (1)
182-195:⚠️ Potential issue | 🔴 CriticalOnly treat
condition=Falseas a missed transition afterUpdated=True.Line 188 checks
Updated=False, but the completed state described in Lines 175-181 isUpdated=True. Also,condition=FalsewhileUpdatedis still notTrueis the normal in-progress state, so Line 192 should keep polling instead of terminating. Finally, the early-success path at Line 189 needs to setconditionMet = true; otherwise this helper still returnsfalseto the callers below.Suggested fix
if status == metav1.ConditionTrue && (conditionType == mcfgv1.MachineConfigNodeUpdateRebooted || conditionType == mcfgv1.MachineConfigNodeResumed || conditionType == mcfgv1.MachineConfigNodeUpdateComplete || conditionType == mcfgv1.MachineConfigNodeUpdateUncordoned) { conditionIsFalse := checkMCNConditionStatus(workerNodeMCN, conditionType, metav1.ConditionFalse) - isUpdated := checkMCNConditionStatus(workerNodeMCN, mcfgv1.MachineConfigNodeUpdated, metav1.ConditionFalse) + isUpdated := checkMCNConditionStatus(workerNodeMCN, mcfgv1.MachineConfigNodeUpdated, metav1.ConditionTrue) if conditionIsFalse && isUpdated { + conditionMet = true logger.Infof("MCN '%v' %v condition was %v, missed transition through %v. Update completed before transition was caught.", mcnName, conditionType, metav1.ConditionTrue, status) return true, nil - } else if conditionIsFalse && !isUpdated { - logger.Infof("MCN '%v' %v condition was %v, missed transition through %v and update is incomplete.", mcnName, conditionType, metav1.ConditionTrue, status) - return true, fmt.Errorf("MCN '%v' %v condition was %v, but the update is not complete.") } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/extended/machineconfignode.go` around lines 182 - 195, The helper currently treats a transition as "missed" when the condition becomes False while it checks Updated==False and even returns an error; fix it so that you only consider a missed transition when the condition becomes False after Updated==True: change the isUpdated check to call checkMCNConditionStatus(workerNodeMCN, mcfgv1.MachineConfigNodeUpdated, metav1.ConditionTrue), and when conditionIsFalse && isUpdated set conditionMet = true before returning (or return true,nil) so callers see success; remove/replace the branch that returns an error on conditionIsFalse && !isUpdated so it does not terminate polling (i.e., keep polling/continue) because condition=False while Updated!=True is a normal in-progress state.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@test/extended/machineconfignode.go`:
- Around line 182-195: The helper currently treats a transition as "missed" when
the condition becomes False while it checks Updated==False and even returns an
error; fix it so that you only consider a missed transition when the condition
becomes False after Updated==True: change the isUpdated check to call
checkMCNConditionStatus(workerNodeMCN, mcfgv1.MachineConfigNodeUpdated,
metav1.ConditionTrue), and when conditionIsFalse && isUpdated set conditionMet =
true before returning (or return true,nil) so callers see success;
remove/replace the branch that returns an error on conditionIsFalse &&
!isUpdated so it does not terminate polling (i.e., keep polling/continue)
because condition=False while Updated!=True is a normal in-progress state.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f9ad6e70-4a5e-41f0-8f28-0e509022f240
📒 Files selected for processing (1)
test/extended/machineconfignode.go
d2b5a47 to
85eccb8
Compare
|
/payload-aggregate periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-aws-mco-disruptive-techpreview-2of3 10 |
|
@isabella-janssen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/4ff80210-2dda-11f1-8a20-214e9058b4ba-0 |
|
/payload-aggregate periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-aws-mco-disruptive-techpreview-1of3 10 |
|
@isabella-janssen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/92745ce0-2deb-11f1-9e2b-b959deb1da55-0 |
|
/test unit |
85eccb8 to
8c4e14e
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
test/extended/machineconfignode.go (1)
181-204:⚠️ Potential issue | 🟠 MajorPropagate callback errors instead of silently returning
nilwhenGet()succeeds.Line 40 returns an explicit error for incomplete updates, but line 55 discards it when
conditionErr == nil. The error is logged but not returned, breaking error propagation for this specific failure case and reducing test diagnostics.The fix is to ensure the callback's error reaches the caller:
Suggested fix
} else if conditionIsFalse && !isUpdated { logger.Infof("MCN '%v' %v condition was %v, missed transition through %v and update is incomplete.", mcnName, conditionType, metav1.ConditionTrue, status) - return false, fmt.Errorf("mcn '%v' %v condition was %v, but the update is not complete", mcnName, conditionType, metav1.ConditionFalse) + conditionErr = fmt.Errorf("mcn '%v' %v condition was %v, but the update is not complete", mcnName, conditionType, metav1.ConditionFalse) + return false, conditionErr } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/extended/machineconfignode.go` around lines 181 - 204, The callback sometimes returns an explicit error (from the branch that checks conditionIsFalse && !isUpdated) but that error is not propagated back to the caller because when the wait loop returns err, the code checks conditionErr == nil and only logs the callback error instead of returning it; update the error handling so when the wait loop returns an error and conditionErr is non-nil you return that callback error (e.g., propagate conditionErr) rather than returning nil—look for symbols checkMCNConditionStatus, workerNodeMCN, mcnName, conditionType, status, conditionErr, conditionMet and change the block that currently logs "An error occurred waiting for MCN..." to return the callback error so the incomplete-update error bubbles up.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@test/extended/machineconfignode.go`:
- Around line 181-204: The callback sometimes returns an explicit error (from
the branch that checks conditionIsFalse && !isUpdated) but that error is not
propagated back to the caller because when the wait loop returns err, the code
checks conditionErr == nil and only logs the callback error instead of returning
it; update the error handling so when the wait loop returns an error and
conditionErr is non-nil you return that callback error (e.g., propagate
conditionErr) rather than returning nil—look for symbols
checkMCNConditionStatus, workerNodeMCN, mcnName, conditionType, status,
conditionErr, conditionMet and change the block that currently logs "An error
occurred waiting for MCN..." to return the callback error so the
incomplete-update error bubbles up.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b33f8c41-ca66-4d91-9857-f03f1d90355c
📒 Files selected for processing (1)
test/extended/machineconfignode.go
8c4e14e to
0b0269d
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/extended/machineconfignode.go`:
- Around line 177-182: In waitForMCNConditionStatus, when detecting the missed
transition (inside the block checking status == metav1.ConditionTrue &&
conditionType == mcfgv1.MachineConfigNodeUpdateRebooted), set the outer variable
conditionMet = true before returning so the short-circuit success propagates;
i.e., when conditionIsFalse && isUpdated is true, assign conditionMet = true,
log as-is, and then return conditionMet, nil (or simply return true, nil if
using the same local variable) so callers (e.g., the code that asserts
RebootedNode=True) receive the correct success value; use the existing helpers
checkMCNConditionStatus and the local names workerNodeMCN, mcnName,
conditionType to locate the block.
- Around line 183-185: The current branch treats a false condition plus
isUpdated==false as a terminal failure by returning an error; change this so we
do not treat "RebootedNode" as a terminal miss when Updated is still false.
Update the block that checks conditionIsFalse && !isUpdated (the code using
symbols conditionIsFalse, isUpdated, logger.Infof, mcnName, conditionType,
metav1.ConditionTrue, status) to only return an error for other condition types,
and for the "RebootedNode" condition log the info and return (false, nil) so the
waiter keeps polling; keep existing error behavior for non-RebootedNode
conditions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c0d0602a-544e-443a-9834-5e314bf9e012
📒 Files selected for processing (1)
test/extended/machineconfignode.go
0b0269d to
d282236
Compare
|
@isabella-janssen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/24c944e0-2e44-11f1-91c0-3c260a913236-0 |
|
/payload-aggregate periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-azure-mco-disruptive-techpreview-1of3 10 |
|
@isabella-janssen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/73b1b310-2e46-11f1-971f-d7fbf2934c91-0 |
|
/payload-aggregate periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-azure-mco-disruptive-techpreview-2of3 10 |
|
@isabella-janssen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/7641acc0-2e46-11f1-8f4f-e780b3d7d95e-0 |
bbe63ed to
ba4f95b
Compare
ba4f95b to
1a33003
Compare
|
/payload-aggregate periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-gcp-mco-disruptive-techpreview-1of3 10 |
|
/payload-aggregate periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-gcp-mco-disruptive-techpreview-2of3 10 |
|
@isabella-janssen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/8b3c3e00-2e92-11f1-8b9f-e67cd8130828-0 |
|
/payload-aggregate periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-aws-mco-disruptive-techpreview-1of3 10 |
|
@isabella-janssen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/8db6b3e0-2e92-11f1-8356-e52cbadbb1c5-0 |
|
/payload-aggregate periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-aws-mco-disruptive-techpreview-2of3 10 |
|
@isabella-janssen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/8fef3ec0-2e92-11f1-8a57-e57d7abd1cd0-0 |
|
/payload-aggregate periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-azure-mco-disruptive-techpreview-1of3 10 |
|
@isabella-janssen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/9252d140-2e92-11f1-958c-56a966bb0c4f-0 |
|
/payload-aggregate periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-azure-mco-disruptive-techpreview-2of3 10 |
|
@isabella-janssen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/9437bde0-2e92-11f1-8ab4-e73c6e3cd52e-0 |
|
@isabella-janssen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/962cb010-2e92-11f1-8a9e-74b394aa603f-0 |
1a33003 to
07f3e12
Compare
|
/payload-aggregate periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-aws-mco-disruptive-techpreview-1of3 10 |
|
/payload-aggregate periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-aws-mco-disruptive-techpreview-2of3 10 |
|
@isabella-janssen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/5d372ff0-2ea7-11f1-8289-cef98da6ca32-0 |
|
@isabella-janssen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/5f7b7aa0-2ea7-11f1-8c78-c23ee798c533-0 |
07f3e12 to
054aac3
Compare
…ransition to 'Updated' & to work for SNO
054aac3 to
f51daf5
Compare
- What I did
Part 1 - Test hardening:
This hardens the MCN condition transition tests for ImageModeStatusReporting. On rare occasions, such as this example payload run, we would see the test fail because we never detected the
RebootedNodecondition in a state ofTrue. The reboot condition transitions from False --> Unknown --> True --> False, and some stages of the transition can occur quickly. If the reboot were to fail, the condition would stay inUnknown, as that is the error condition status, so warning instead of erroring when we miss theTruecondition will not hide true functional errors.Part 2 - Modifications for SNO:
This updates the three ImageModeStatusReporting tests that were designed to run in custom MCPs to run against the master MCP when the worker pool has no nodes. This allows the tests to run in the SNO topology with openshift/release#77273.
- How to verify it
Part 1 - Test hardening:
I have not been able to reproduce the error and the flake is quite rare, so the best way to verify the fix is likely to run lots of jobs and make sure the flake never appears.
Part 2 - Modifications for SNO:
These tests can be verified locally by running them against a SNO cluster or by rehearsing them with openshift/release#77273 once this work merges.
- Description for the changelog
NO-JIRA: Harden MCN condition transition tests and adjust tests to run on SNO