AGENT-1491: reclaim disk space from the node when IRI resource is deleted#5988
AGENT-1491: reclaim disk space from the node when IRI resource is deleted#5988andfasano wants to merge 1 commit into
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
@andfasano: This pull request references AGENT-1491 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds an exported ChangesIRI Deletion and Storage Reclamation
sequenceDiagram
actor Reconciler as Reconciler Loop
participant Manager as IRI Manager
participant Registry as Local Registry (TCP 22625)
participant MCN as MachineConfigNode
participant Storage as Local Storage (/var/lib/iri-registry)
Reconciler->>Manager: syncInternalReleaseImage()
alt feature never activated (no registry dir)
Manager-->>Reconciler: short-circuit, long requeue
else IRI present and active
Manager->>Registry: getIRIRegistry() / isRegistryPortListening()
Manager-->>MCN: refresh status or set degraded
else IRI missing or DeletionTimestamp set
Manager->>Registry: isRegistryPortListening()
alt port listening
Registry-->>Manager: connection ok
Manager-->>Reconciler: requeue (wait for shutdown)
else port not listening
Manager->>MCN: cleanup IRI status
Manager->>Storage: reclaimRegistryStorage()
Storage-->>Manager: storage reclaimed
Manager-->>Reconciler: complete
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 9 | ❌ 3❌ Failed checks (3 warnings)
✅ Passed checks (9 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: andfasano The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@pkg/daemon/internalreleaseimage/.claude/skills/iri-mcd/acceptance/feature-disabled.md`:
- Line 1: The heading "IRI feature enabled" is inverted for a file that
documents the disabled path; update the top-level heading to clearly indicate
the disabled scenario (e.g., change "IRI feature enabled" to "IRI feature
disabled" or an equivalent phrasing) so the document title matches the
acceptance scenario and removes ambiguity; edit the line containing the heading
string "IRI feature enabled" in this file to the corrected text.
In `@pkg/daemon/internalreleaseimage/internalreleaseimage_manager_test.go`:
- Around line 108-135: The tests for deletion cases need to assert filesystem
side-effects of reclaimRegistryStorage rather than only MCN status: update the
two cases ("IRI deletion in progress - registry still active" and "IRI deleted -
registry down") to create a real file under the test's registryDataPath before
invoking the manager and then assert after reconciliation that when
setupRegistry returns a healthy /v2 response the file still exists, and when
registryDisabled==true (registry down) the file is removed; use the existing
FakeIRIRegistry and the test helpers (machineConfigNode, iri, registryDataPath,
reclaimRegistryStorage) to create the file, invoke the same reconcile path used
by the other tests, and verify presence/absence of the file as the primary
assertion in each case in addition to the existing MCN status checks.
- Around line 143-149: The test never exercises the "feature disabled" branch
because t.TempDir() always creates tempDir; adjust the setup so
skipRegistryDirSetup yields a non-existent directory: when
tc.skipRegistryDirSetup is true, either remove the created tempDir
(os.RemoveAll(tempDir)) or set registryDataPath to a non-created subpath (e.g.,
filepath.Join(tempDir, "nonexistent")) so that the code using registryDataPath
in the test (variables tempDir, skipRegistryDirSetup, registryDataPath in
internalreleaseimage_manager_test.go) sees a missing directory and the branch is
actually exercised.
In `@pkg/daemon/internalreleaseimage/internalreleaseimage_manager.go`:
- Around line 479-485: The notInstalled() helper currently treats any os.Stat
error as "not installed"; change it so only os.IsNotExist(err) returns true. In
Manager.notInstalled, call os.Stat(registryDataPath) and: if err == nil return
false (exists); if os.IsNotExist(err) return true; for any other err return
false (treat as installed) and preferably emit a warning via the Manager logger
(e.g., i.logger or i.log) so permission/IO errors are visible instead of
silently skipping cleanup.
- Around line 518-520: The cleanup path leaves a stale top-level degraded
condition because cleanupMachineConfigNodeStatus() currently returns early when
Status.InternalReleaseImage.Releases is empty; update
cleanupMachineConfigNodeStatus to still clear the
MachineConfigNodeInternalReleaseImageDegraded condition (the same condition set
by setMachineConfigNodeAsDegraded()) even when Releases is empty — i.e., remove
the early-return, detect the degraded condition on the node's top-level status,
and explicitly remove/clear that condition (and persist the node status update)
so opt-out always clears the degraded flag.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 5e778840-3130-4843-b6d9-bbe7b7f99913
📒 Files selected for processing (8)
pkg/daemon/constants/constants.gopkg/daemon/internalreleaseimage/.claude/skills/iri-mcd/SKILL.mdpkg/daemon/internalreleaseimage/.claude/skills/iri-mcd/acceptance/feature-disabled.mdpkg/daemon/internalreleaseimage/.claude/skills/iri-mcd/acceptance/storage-reclaim.mdpkg/daemon/internalreleaseimage/internalreleaseimage_helpers_test.gopkg/daemon/internalreleaseimage/internalreleaseimage_manager.gopkg/daemon/internalreleaseimage/internalreleaseimage_manager_test.gopkg/daemon/internalreleaseimage/iriregistry.go
|
/test ? |
|
/test e2e-agent-compact-ipv4-iso-no-registry |
a57161e to
2b55ac8
Compare
|
/retest-required |
|
/test e2e-agent-compact-ipv4-iso-no-registry |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
pkg/daemon/internalreleaseimage/internalreleaseimage_manager_test.go (1)
143-145: ⚡ Quick winMake reclaim assertion resilient to directory removal.
This check assumes the directory still exists. Reclaim logic may validly remove the whole directory, so this can become brittle. Prefer asserting the sentinel file is gone (or accept
os.IsNotExist).♻️ Suggested patch
- entries, err := os.ReadDir(registryDataPath) - assert.NoError(t, err) - assert.Empty(t, entries, "Storage should be reclaimed when registry is down") + testFile := filepath.Join(registryDataPath, "test-data.txt") + _, err := os.Stat(testFile) + assert.True(t, os.IsNotExist(err), "Storage should be reclaimed when registry is down")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/daemon/internalreleaseimage/internalreleaseimage_manager_test.go` around lines 143 - 145, The current assertion using os.ReadDir(registryDataPath) assumes the directory exists; change the check so it accepts either the directory being removed or empty by handling os.ReadDir errors: if os.ReadDir returns an error, assert that errors.Is(err, os.ErrNotExist) (or os.IsNotExist(err)), otherwise assert no entries; alternatively, check for a specific sentinel file under registryDataPath (e.g., filepath.Join(registryDataPath, "<sentinel>")) and assert that os.Stat on that sentinel returns os.ErrNotExist. Update the test around the os.ReadDir(registryDataPath) call and the subsequent assert.Empty to implement this resilient logic referencing registryDataPath and the os.ReadDir/os.Stat result.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/daemon/internalreleaseimage/.claude/skills/iri-mcd/SKILL.md`:
- Line 34: Fix the typo in the SKILL.md section header by changing the heading
text "Test implentation notes" to "Test implementation notes" in the file
SKILL.md (look for the exact header string "Test implentation notes" to locate
the line).
---
Nitpick comments:
In `@pkg/daemon/internalreleaseimage/internalreleaseimage_manager_test.go`:
- Around line 143-145: The current assertion using os.ReadDir(registryDataPath)
assumes the directory exists; change the check so it accepts either the
directory being removed or empty by handling os.ReadDir errors: if os.ReadDir
returns an error, assert that errors.Is(err, os.ErrNotExist) (or
os.IsNotExist(err)), otherwise assert no entries; alternatively, check for a
specific sentinel file under registryDataPath (e.g.,
filepath.Join(registryDataPath, "<sentinel>")) and assert that os.Stat on that
sentinel returns os.ErrNotExist. Update the test around the
os.ReadDir(registryDataPath) call and the subsequent assert.Empty to implement
this resilient logic referencing registryDataPath and the os.ReadDir/os.Stat
result.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: c16aaace-8e5f-425a-a3f0-816d1dbfdc8e
📒 Files selected for processing (8)
pkg/daemon/constants/constants.gopkg/daemon/internalreleaseimage/.claude/skills/iri-mcd/SKILL.mdpkg/daemon/internalreleaseimage/.claude/skills/iri-mcd/acceptance/feature-disabled.mdpkg/daemon/internalreleaseimage/.claude/skills/iri-mcd/acceptance/storage-reclaim.mdpkg/daemon/internalreleaseimage/internalreleaseimage_helpers_test.gopkg/daemon/internalreleaseimage/internalreleaseimage_manager.gopkg/daemon/internalreleaseimage/internalreleaseimage_manager_test.gopkg/daemon/internalreleaseimage/iriregistry.go
✅ Files skipped from review due to trivial changes (4)
- pkg/daemon/internalreleaseimage/.claude/skills/iri-mcd/acceptance/storage-reclaim.md
- pkg/daemon/constants/constants.go
- pkg/daemon/internalreleaseimage/iriregistry.go
- pkg/daemon/internalreleaseimage/internalreleaseimage_helpers_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/daemon/internalreleaseimage/.claude/skills/iri-mcd/acceptance/feature-disabled.md
- pkg/daemon/internalreleaseimage/internalreleaseimage_manager.go
4a10ade to
9f33795
Compare
|
/test e2e-agent-compact-ipv4-iso-no-registry |
|
|
||
| klog.V(2).Infof("Removed registry data path: %s", path) | ||
| } | ||
|
|
There was a problem hiding this comment.
Should the base directory /var/lib/iri-registry also be removed?
There was a problem hiding this comment.
That's a very good question. I preferred not to remove it since MCD does not technically own it (it's created by the Assisted Installer during the installation). On the other hand, it's also true that we don't support re-enabling the feature once removed, so maybe it could be a safer option
9f33795 to
27197e0
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/daemon/internalreleaseimage/internalreleaseimage_manager.go`:
- Around line 552-560: The current logic in
i.wasFeatureActivated()/handleIRIDeletion treats a NotFound error from the
lister as an implicit delete and schedules destructive cleanup; change this so
IsNotFound does NOT count as an opt-out: update wasFeatureActivated to
distinguish NotFound from an explicit deletion signal (e.g., check for a
persisted flag on-disk or a previously observed deletionTimestamp/annotation)
and only return "wasActivated==false" when you have an explicit recorded
deletion or persisted state proving the IRI existed and was deleted; also update
handleIRIDeletion to ignore lister NotFound transient cache misses (do not call
queue.AddAfter/InternalReleaseImageInstanceName reclaim logic on IsNotFound) and
only proceed with reclamation when the explicit deletion indicator is present.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: b4b98ab4-59dc-4d7c-b206-8d00eabb3c36
📒 Files selected for processing (8)
pkg/daemon/constants/constants.gopkg/daemon/internalreleaseimage/.claude/skills/iri-mcd/SKILL.mdpkg/daemon/internalreleaseimage/.claude/skills/iri-mcd/acceptance/feature-disabled.mdpkg/daemon/internalreleaseimage/.claude/skills/iri-mcd/acceptance/storage-reclaim.mdpkg/daemon/internalreleaseimage/internalreleaseimage_helpers_test.gopkg/daemon/internalreleaseimage/internalreleaseimage_manager.gopkg/daemon/internalreleaseimage/internalreleaseimage_manager_test.gopkg/daemon/internalreleaseimage/iriregistry.go
✅ Files skipped from review due to trivial changes (3)
- pkg/daemon/internalreleaseimage/internalreleaseimage_helpers_test.go
- pkg/daemon/internalreleaseimage/iriregistry.go
- pkg/daemon/internalreleaseimage/.claude/skills/iri-mcd/SKILL.md
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/daemon/internalreleaseimage/.claude/skills/iri-mcd/acceptance/feature-disabled.md
- pkg/daemon/internalreleaseimage/.claude/skills/iri-mcd/acceptance/storage-reclaim.md
|
/test e2e-agent-compact-ipv4-iso-no-registry |
|
@andfasano: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
- What I did
This patch adds a new behavior to IRI MCD manager to reclaim the disk space on the control plane in case of IRI removal (this happens when the user decides to opt-out from the feature).
Also, this patch adds the necessary Claude skill files to provide more context for the code generation of the specific new behavior.
- How to verify it
Deploy an OCP cluster using the OVE ISO, then delete the IRI resource (this requires a previous upgrade)
- Description for the changelog
Reclaim disk space on node when the InternalReleaseImage resource is deleted.
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Documentation
Chores