(WIP) OCPBUGS-65645#6010
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
Skipping CI for Draft Pull Request. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughWhen the daemon finds the node already in the desired config (inDesiredConfig/resumed path), it now invokes CoreOSDaemon.verifyExtensionPackages(state.currentConfig); verification failures cause updateConfigAndState to return an error and abort the resumed completion flow. ChangesExtension Package Verification
Sequence DiagramsequenceDiagram
participant UpdateFlow as Update Flow
participant CoreOS as CoreOSDaemon.verifyExtensionPackages
participant Config as MachineConfig
participant RPM as RPM DB (chrooted)
UpdateFlow->>UpdateFlow: Enter inDesiredConfig / resumed path
UpdateFlow->>CoreOS: verifyExtensionPackages(currentConfig)
CoreOS->>Config: Read Spec.Extensions
loop for each expected package
CoreOS->>RPM: rpm -q <pkg>
RPM-->>CoreOS: exit code / output
end
alt All packages present
CoreOS-->>UpdateFlow: success
UpdateFlow->>UpdateFlow: continue resumed completion
else Missing packages (exit code 1)
CoreOS-->>UpdateFlow: error (aggregated missing pkgs)
UpdateFlow->>UpdateFlow: abort completion (return error)
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 11 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (11 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: isabella-janssen The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/daemon/daemon.go`:
- Line 2374: The nil comparison on dn.os is invalid because dn.os is declared as
a struct (osrelease.OperatingSystem) not a pointer; remove the dn.os != nil
check and directly call dn.os.IsCoreOSVariant() in the conditional (i.e.,
replace the if condition that references dn.os != nil && dn.os.IsCoreOSVariant()
with a single call to dn.os.IsCoreOSVariant()). Ensure no other code assumes
dn.os can be nil and adjust any related logic accordingly.
In `@pkg/daemon/update.go`:
- Around line 1876-1881: Remove the unnecessary inner chroot and differentiate
rpm execution failures from a genuine "package missing" exit; replace
exec.Command("chroot","/rootfs","rpm","-q",pkg) with
exec.Command("rpm","-q",pkg), run the command, and if it returns an
exec.ExitError inspect the exit code (use the ProcessState.ExitCode or
type-assert to *exec.ExitError) — treat exit code 1 as "package missing" (append
to missingPackages and klog.Warningf as before) but for any other non-zero exit
code or non-ExitError log an error via klog.Errorf including the actual error
details (do not classify those as missingPackages); keep existing behavior for
successful runs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: f31a60a2-cf6d-43a3-afc0-b8aa2da75f9b
📒 Files selected for processing (2)
pkg/daemon/daemon.gopkg/daemon/update.go
029ff9c to
5626213
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (1)
pkg/daemon/daemon.go (1)
2374-2374:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winRemove invalid nil check on
dn.osbefore extension verification.Line 2374 compares
dn.ostonil, butdn.osis a struct (osrelease.OperatingSystem), so this does not compile. Usedn.os.IsCoreOSVariant()directly.Minimal fix
- if dn.os != nil && dn.os.IsCoreOSVariant() { + if dn.os.IsCoreOSVariant() { coreOSDaemon := CoreOSDaemon{dn} if err := coreOSDaemon.verifyExtensionPackages(state.currentConfig); err != nil { return missingODC, inDesiredConfig, fmt.Errorf("extension package verification failed: %w", err) } }#!/bin/bash set -euo pipefail # Verify field type and invalid nil-comparison site. rg -n -C2 'os osrelease\.OperatingSystem|dn\.os != nil|IsCoreOSVariant\(\)' pkg/daemon/daemon.go🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/daemon/daemon.go` at line 2374, Remove the invalid nil check against dn.os (which is an osrelease.OperatingSystem value) and call the method directly; replace the conditional using "dn.os != nil && dn.os.IsCoreOSVariant()" with just "dn.os.IsCoreOSVariant()" (references: dn.os, IsCoreOSVariant, osrelease.OperatingSystem) so the code compiles and uses the struct receiver correctly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@pkg/daemon/daemon.go`:
- Line 2374: Remove the invalid nil check against dn.os (which is an
osrelease.OperatingSystem value) and call the method directly; replace the
conditional using "dn.os != nil && dn.os.IsCoreOSVariant()" with just
"dn.os.IsCoreOSVariant()" (references: dn.os, IsCoreOSVariant,
osrelease.OperatingSystem) so the code compiles and uses the struct receiver
correctly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 3a146b07-94ac-4548-9335-44dcfef01600
📒 Files selected for processing (2)
pkg/daemon/daemon.gopkg/daemon/update.go
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/daemon/daemon.go (1)
2358-2379:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRun extension verification before publishing
MachineConfigNodeResumed=true.The resumed MCN condition is applied before
verifyExtensionPackages. If verification fails, this path returns an error after a success-like condition was already published.Please move the verification block before the resumed-condition update so success state is emitted only after the gate passes.
Suggested reorder
- err = upgrademonitor.GenerateAndApplyMachineConfigNodes( - &upgrademonitor.Condition{State: mcfgv1.MachineConfigNodeResumed, Reason: string(mcfgv1.MachineConfigNodeResumed), Message: fmt.Sprintf("In desired config %s. Resumed normal operations. Applying proper annotations.", state.currentConfig.Name)}, - nil, - metav1.ConditionTrue, - metav1.ConditionFalse, - dn.node, - dn.mcfgClient, - dn.fgHandler, - pool, - ) - if err != nil { - klog.Errorf("Error making MCN for Resumed true: %v", err) - } - // Verify extension packages are actually installed before marking as done // See: https://redhat.atlassian.net/browse/OCPBUGS-65645 if dn.os.IsCoreOSVariant() { coreOSDaemon := CoreOSDaemon{dn} if err := coreOSDaemon.verifyExtensionPackages(state.currentConfig); err != nil { return missingODC, inDesiredConfig, fmt.Errorf("extension package verification failed: %w", err) } } + + err = upgrademonitor.GenerateAndApplyMachineConfigNodes( + &upgrademonitor.Condition{State: mcfgv1.MachineConfigNodeResumed, Reason: string(mcfgv1.MachineConfigNodeResumed), Message: fmt.Sprintf("In desired config %s. Resumed normal operations. Applying proper annotations.", state.currentConfig.Name)}, + nil, + metav1.ConditionTrue, + metav1.ConditionFalse, + dn.node, + dn.mcfgClient, + dn.fgHandler, + pool, + ) + if err != nil { + klog.Errorf("Error making MCN for Resumed true: %v", err) + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/daemon/daemon.go` around lines 2358 - 2379, The MachineConfigNodeResumed condition is being published before extension package verification; move the CoreOS-specific verification (invoke CoreOSDaemon{dn}.verifyExtensionPackages(state.currentConfig)) to run before calling upgrademonitor.GenerateAndApplyMachineConfigNodes that emits the MachineConfigNodeResumed condition so that if verifyExtensionPackages returns an error you return early and never call GenerateAndApplyMachineConfigNodes; update the error paths accordingly (keep the same error return message) and ensure the symbols involved are CoreOSDaemon.verifyExtensionPackages, state.currentConfig, upgrademonitor.GenerateAndApplyMachineConfigNodes, and mcfgv1.MachineConfigNodeResumed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@pkg/daemon/daemon.go`:
- Around line 2358-2379: The MachineConfigNodeResumed condition is being
published before extension package verification; move the CoreOS-specific
verification (invoke
CoreOSDaemon{dn}.verifyExtensionPackages(state.currentConfig)) to run before
calling upgrademonitor.GenerateAndApplyMachineConfigNodes that emits the
MachineConfigNodeResumed condition so that if verifyExtensionPackages returns an
error you return early and never call GenerateAndApplyMachineConfigNodes; update
the error paths accordingly (keep the same error return message) and ensure the
symbols involved are CoreOSDaemon.verifyExtensionPackages, state.currentConfig,
upgrademonitor.GenerateAndApplyMachineConfigNodes, and
mcfgv1.MachineConfigNodeResumed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: a3e91730-a566-4537-82ce-0a73fda2f778
📒 Files selected for processing (2)
pkg/daemon/daemon.gopkg/daemon/update.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/daemon/update.go
5626213 to
00443ac
Compare
Resolves: OCPBUGS-65645 Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
00443ac to
249d186
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
pkg/daemon/daemon.go (1)
2354-2358: 💤 Low valueAdd a log line before (and after) the extension package verification for observability.
Without a log entry, when a node degrades with
"extension package verification failed"the only trace that verification was even attempted comes from the error itself. A singleklog.Infofbefore the call (and a matching success log) makes it much easier to distinguish "verification ran and found missing packages" from other degradation causes.♻️ Suggested observability improvement
+ klog.Infof("Verifying extension packages for config %s", state.currentConfig.GetName()) if dn.os.IsCoreOSVariant() { coreOSDaemon := CoreOSDaemon{dn} if err := coreOSDaemon.verifyExtensionPackages(state.currentConfig); err != nil { return missingODC, inDesiredConfig, fmt.Errorf("extension package verification failed: %w", err) } + klog.Infof("Extension package verification passed for config %s", state.currentConfig.GetName()) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/daemon/daemon.go` around lines 2354 - 2358, Add observability logs around the CoreOS extension package verification: before calling coreOSDaemon.verifyExtensionPackages(state.currentConfig) emit a klog.Infof indicating that extension package verification is starting for the node (include context such as node ID or config reference if available), and after the call (on success) emit a klog.Infof noting verification succeeded; keep the existing error return path unchanged so failures still return the formatted error from coreOSDaemon.verifyExtensionPackages.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@pkg/daemon/daemon.go`:
- Around line 2354-2358: Add observability logs around the CoreOS extension
package verification: before calling
coreOSDaemon.verifyExtensionPackages(state.currentConfig) emit a klog.Infof
indicating that extension package verification is starting for the node (include
context such as node ID or config reference if available), and after the call
(on success) emit a klog.Infof noting verification succeeded; keep the existing
error return path unchanged so failures still return the formatted error from
coreOSDaemon.verifyExtensionPackages.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: a638cd4c-9338-4955-bad2-78502469dec9
📒 Files selected for processing (2)
pkg/daemon/daemon.gopkg/daemon/update.go
✅ Files skipped from review due to trivial changes (1)
- pkg/daemon/update.go
Closes: OCPBUGS-65645
- What I did
Note that this PR was created with assistance from Claude.
This adds post-reboot verification to ensure extension packages are actually installed in the RPM database before marking a node as updated. This addresses cases where rpm-ostree reports a success, but packages aren't actually installed. This can lead to tricky debugging situations as was faced in OCPBUGS-63576 when extension installs seem to succeed, but do not. This implementation follows the pattern used in the existing extension's e2e test for verifying the installed packages.
machine-config-operator/test/e2e-2of2/extension_test.go
Line 87 in 48d0c9d
- How to verify it
- Description for the changelog
OCPBUGS-65645: Verify extension packages are installed post node reboot
Summary by CodeRabbit