
HYPERFLEET-995 - refactor: move logic from handlers to service#124

Open
kuudori wants to merge 1 commit into openshift-hyperfleet:main from kuudori:HYPERFLEET-995

Conversation

@kuudori
Contributor

@kuudori kuudori commented May 4, 2026

Move the Replace→Patch refactoring and the cluster-nodepool ownership checks from the handler layer into the service layer, for better separation of concerns.

  • Replace Replace() with Patch() on ClusterService and NodePoolService, accepting patch request structs instead of pre-mutated models
  • Move generation increment logic (spec/labels change detection) from DAO Replace into service Patch methods (see the sketch after this list)
  • Add GetForCluster() and ListByCluster() to NodePoolService, moving ownership verification and filtered listing from handler
  • Remove Replace() from all DAO and service interfaces (Cluster, NodePool, AdapterStatus) and their mocks
  • Inject ClusterDao and GenericService into NodePoolService constructor
  • Remove GenericService dependency from ClusterNodePoolsHandler
  • Add unit tests for Patch, GetForCluster, and ListByCluster
  • Fix stale comment referencing removed Replace method
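
For orientation, here is a condensed sketch of the new service-layer Patch flow, pieced together from the code excerpts quoted in the review threads below (signatures and helper names may differ slightly from the actual repository):

func (s *sqlNodePoolService) Patch(
	ctx context.Context, id string, patch *api.NodePoolPatchRequest,
) (*api.NodePool, *errors.ServiceError) {
	nodePool, err := s.nodePoolDao.GetForUpdate(ctx, id)
	if err != nil {
		return nil, handleGetError("NodePool", "id", id, err)
	}

	oldSpec, oldLabels := nodePool.Spec, nodePool.Labels
	if applyErr := applyNodePoolPatch(nodePool, patch); applyErr != nil {
		return nil, errors.Validation("Invalid patch data: %v", applyErr)
	}

	// No-op patches return early; the generation bumps only on semantic
	// changes -- the rule that previously lived in the DAO's Replace.
	if bytes.Equal(oldSpec, nodePool.Spec) && bytes.Equal(oldLabels, nodePool.Labels) {
		return nodePool, nil
	}
	nodePool.IncrementGeneration()

	if saveErr := s.nodePoolDao.Save(ctx, nodePool); saveErr != nil {
		return nil, handleUpdateError("NodePool", saveErr)
	}
	return nodePool, nil
}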

Summary

  • HYPERFLEET-995

Test Plan

  • Unit tests added/updated
  • make test-all passes
  • make lint passes
  • Helm chart changes validated with make test-helm (if applicable)
  • Deployed to a development cluster and verified (if Helm/config changes)
  • E2E tests passed (if cross-component or major changes)

Summary by CodeRabbit

  • Refactor

    • Switched from replace-style to patch-based updates for clusters and node-pools; ownership checks centralized and adapter-status persistence uses per-resource save.
    • Creation/deletion metadata now handled by the service layer (default system user applied).
  • API

    • Generic list signature simplified (username removed); node-pool/cluster APIs gain Patch, scoped accessors, and cluster-scoped listing.
  • Validation

    • Added UUID path ID validation helper (a sketch follows after this list).
  • Documentation

    • Updated guidance on generation handling and persistence.
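
A plausible shape for the UUID path-ID validation helper mentioned above (a hypothetical reconstruction; the actual helper in pkg/handlers/validation.go and its error type may differ):

package handlers

import (
	"fmt"

	"github.com/google/uuid"
)

// validatePathID rejects path parameters that are not well-formed UUIDs
// before they reach the service layer. Name and signature are assumed.
func validatePathID(id string) error {
	if _, err := uuid.Parse(id); err != nil {
		return fmt.Errorf("id %q is not a valid UUID: %w", id, err)
	}
	return nil
}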

@openshift-ci openshift-ci Bot requested review from Mischulee and pnguyen44 May 4, 2026 18:41

openshift-ci Bot commented May 4, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign jsell-rh for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


coderabbitai Bot commented May 4, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: a75d795c-ec6f-4a31-9b33-147369be8cfe

📥 Commits

Reviewing files that changed from the base of the PR and between 48aeed4 and bf575ba.

📒 Files selected for processing (29)
  • pkg/api/adapter_status_types.go
  • pkg/api/cluster_types.go
  • pkg/api/node_pool_types.go
  • pkg/api/presenters/cluster.go
  • pkg/api/presenters/cluster_test.go
  • pkg/api/presenters/node_pool.go
  • pkg/api/presenters/node_pool_test.go
  • pkg/dao/CLAUDE.md
  • pkg/dao/adapter_status.go
  • pkg/dao/cluster.go
  • pkg/dao/mocks/cluster.go
  • pkg/dao/mocks/node_pool.go
  • pkg/dao/node_pool.go
  • pkg/handlers/cluster.go
  • pkg/handlers/cluster_nodepools.go
  • pkg/handlers/cluster_nodepools_test.go
  • pkg/handlers/cluster_test.go
  • pkg/handlers/node_pool.go
  • pkg/handlers/validation.go
  • pkg/services/adapter_status.go
  • pkg/services/cluster.go
  • pkg/services/cluster_test.go
  • pkg/services/generic.go
  • pkg/services/generic_test.go
  • pkg/services/node_pool.go
  • pkg/services/node_pool_test.go
  • pkg/services/status_helpers.go
  • plugins/clusters/plugin.go
  • plugins/nodePools/plugin.go
💤 Files with no reviewable changes (5)
  • pkg/dao/mocks/cluster.go
  • pkg/services/adapter_status.go
  • plugins/clusters/plugin.go
  • pkg/dao/adapter_status.go
  • pkg/dao/cluster.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/dao/node_pool.go

Walkthrough

The diff removes Replace methods from multiple DAOs and services (AdapterStatus, Cluster, NodePool) and replaces inline replacement flows with patch-based APIs. ClusterService and NodePoolService gain Patch methods; NodePoolService also adds GetByIDAndOwner and ListByCluster. Presenters no longer set CreatedBy/UpdatedBy. Handlers delegate ownership checks and patching to services and stop manual JSON marshaling and username threading. GenericService.List drops the username parameter. Status recomputation now assigns StatusConditions in-memory and persists via Save. Plugin wiring and tests were updated to match revised constructors and call sites.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant ClusterHandler as Handler
    participant ClusterService as Service
    participant ClusterDao as DAO
    participant DB
    Client->>ClusterHandler: PATCH /clusters/{id} with patch
    ClusterHandler->>ClusterService: Patch(ctx, id, patch)
    ClusterService->>ClusterDao: GetForUpdate(ctx, id)
    ClusterDao->>DB: SELECT ... FOR UPDATE
    DB-->>ClusterDao: cluster row
    ClusterDao-->>ClusterService: cluster
    ClusterService->>ClusterService: applyClusterPatch(patch)
    Note over ClusterService: marshal patch fields\ncompare Spec/Labels bytes\nincrement Generation if changed
    ClusterService->>ClusterDao: Save(ctx, cluster)
    ClusterDao->>DB: UPDATE cluster
    DB-->>ClusterDao: updated cluster
    ClusterDao-->>ClusterService: cluster
    ClusterService->>ClusterService: update status from adapters
    ClusterService-->>ClusterHandler: updated cluster
    ClusterHandler-->>Client: 200 OK with cluster
sequenceDiagram
    participant Client
    participant NodePoolHandler as Handler
    participant NodePoolService as Service
    participant NodePoolDao as DAO
    participant DB
    Client->>NodePoolHandler: GET /clusters/{clusterID}/nodepools/{nodePoolID}
    NodePoolHandler->>NodePoolService: GetByIDAndOwner(ctx, nodePoolID, clusterID)
    NodePoolService->>NodePoolDao: Get(ctx, nodePoolID)
    NodePoolDao->>DB: SELECT nodepool
    DB-->>NodePoolDao: nodepool
    NodePoolDao-->>NodePoolService: nodepool
    Note over NodePoolService: validate OwnerID == clusterID
    alt OwnerID matches
        NodePoolService-->>NodePoolHandler: nodepool
        NodePoolHandler-->>Client: 200 OK
    else OwnerID mismatch or not found
        NodePoolService-->>NodePoolHandler: NotFound error
        NodePoolHandler-->>Client: 404 Not Found
    end

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: coverage is 48.94%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.
✅ Passed checks (4 passed)
  • Description Check ✅ Passed: check skipped - CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: the title clearly identifies the main change as moving logic from handlers to the service layer, which aligns with the PR's core objective of refactoring Replace→Patch and ownership checks.
  • Linked Issues Check ✅ Passed: check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes Check ✅ Passed: check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
pkg/services/cluster.go (1)

98-110: ⚡ Quick win

Short-circuit semantic no-op patches.

When spec and labels are unchanged, this still does a full Save plus status recomputation. Returning early keeps PATCH idempotent and avoids unnecessary DB churn; sqlNodePoolService.Patch has the same pattern.

Proposed change
-	if !bytes.Equal(oldSpec, cluster.Spec) || !bytes.Equal(oldLabels, cluster.Labels) {
-		cluster.Generation++
-	}
-
-	if saveErr := s.clusterDao.Save(ctx, cluster); saveErr != nil {
+	if bytes.Equal(oldSpec, cluster.Spec) && bytes.Equal(oldLabels, cluster.Labels) {
+		return cluster, nil
+	}
+
+	cluster.Generation++
+
+	if saveErr := s.clusterDao.Save(ctx, cluster); saveErr != nil {
 		return nil, handleUpdateError("Cluster", saveErr)
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/services/cluster.go` around lines 98 - 110, Detect the no-op patch by
checking bytes.Equal(oldSpec, cluster.Spec) and bytes.Equal(oldLabels,
cluster.Labels) and short-circuit before mutating Generation, calling
s.clusterDao.Save, or recomputing status; when both spec and labels are
unchanged return the current cluster and nil immediately (mirroring
sqlNodePoolService.Patch behavior). Modify the logic around the existing if that
increments cluster.Generation and the subsequent s.clusterDao.Save /
s.UpdateClusterStatusFromAdapters calls so the equality case returns early and
only when there are changes proceed to increment Generation, call
s.clusterDao.Save(ctx, cluster), and then s.UpdateClusterStatusFromAdapters(ctx,
cluster.ID).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/handlers/cluster_nodepools.go`:
- Line 90: The handler calls to h.nodePoolService.GetForCluster (used at lines
where nodePool is fetched) must be preceded or accompanied by an explicit
cluster existence check so we don't operate on nodepools whose OwnerID
references a deleted cluster; add a call to the cluster lookup/validation (e.g.,
h.clusterService.Get or h.clusterService.Exists) using the same clusterID used
for scoped operations (the same pattern as ListByCluster), return the
appropriate not-found error if the cluster is missing, then proceed to call
GetForCluster; do this for all occurrences (the nodePool fetch at GetForCluster
invocations referenced in this file).

In `@pkg/services/node_pool.go`:
- Around line 145-149: The search merge in node_pool.go incorrectly appends the
cluster filter directly to args.Search (using args.Search and clusterID), which
breaks operator precedence; update the logic so when args.Search is non-empty
you set args.Search = "(" + args.Search + ") AND owner_id = ?" (or similar) and
pass clusterID as a parameter instead of embedding it inline, and when empty set
args.Search = "owner_id = ?" with clusterID parameterized; ensure any code that
executes the query uses the parameter placeholder and supplies clusterID to
avoid raw string interpolation and to correctly scope the cluster filter across
the entire expression.

---

Nitpick comments:
In `@pkg/services/cluster.go`:
- Around line 98-110: Detect the no-op patch by checking bytes.Equal(oldSpec,
cluster.Spec) and bytes.Equal(oldLabels, cluster.Labels) and short-circuit
before mutating Generation, calling s.clusterDao.Save, or recomputing status;
when both spec and labels are unchanged return the current cluster and nil
immediately (mirroring sqlNodePoolService.Patch behavior). Modify the logic
around the existing if that increments cluster.Generation and the subsequent
s.clusterDao.Save / s.UpdateClusterStatusFromAdapters calls so the equality case
returns early and only when there are changes proceed to increment Generation,
call s.clusterDao.Save(ctx, cluster), and then
s.UpdateClusterStatusFromAdapters(ctx, cluster.ID).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 6beb0281-8dbe-4d84-acbd-4f6ca3763817

📥 Commits

Reviewing files that changed from the base of the PR and between 9798ec3 and 637ed1a.

📒 Files selected for processing (17)
  • pkg/dao/adapter_status.go
  • pkg/dao/cluster.go
  • pkg/dao/mocks/cluster.go
  • pkg/dao/mocks/node_pool.go
  • pkg/dao/node_pool.go
  • pkg/handlers/cluster.go
  • pkg/handlers/cluster_nodepools.go
  • pkg/handlers/cluster_nodepools_test.go
  • pkg/handlers/node_pool.go
  • pkg/services/adapter_status.go
  • pkg/services/cluster.go
  • pkg/services/cluster_test.go
  • pkg/services/node_pool.go
  • pkg/services/node_pool_test.go
  • pkg/services/status_helpers.go
  • plugins/clusters/plugin.go
  • plugins/nodePools/plugin.go
💤 Files with no reviewable changes (7)
  • pkg/dao/node_pool.go
  • pkg/dao/adapter_status.go
  • pkg/dao/cluster.go
  • pkg/dao/mocks/node_pool.go
  • plugins/clusters/plugin.go
  • pkg/services/adapter_status.go
  • pkg/dao/mocks/cluster.go

Comment thread pkg/handlers/cluster_nodepools.go Outdated
Comment thread pkg/services/node_pool.go
@rh-amarin
Contributor

pkg/dao/node_pool.go:102 — JIRA AC miss: SoftDeleteByOwner still increments generation in the DAO

SoftDeleteByOwner still contains:

"generation": gorm.Expr("generation + 1"),

JIRA-995's acceptance criteria explicitly state: "Generation increment logic is removed from pkg/dao/cluster.go and pkg/dao/node_pool.go", and the description singles out this function by name: "SoftDeleteByOwner() bakes generation + 1 directly into SQL. These are business rules that belong in the service layer."

The Replace paths in both DAOs were correctly cleaned up, but SoftDeleteByOwner was missed.

Suggested fix: accept a pre-computed newGeneration int32 parameter from the caller and use it directly, consistent with how Patch now handles generation:

// DAO interface
SoftDeleteByOwner(ctx context.Context, ownerID string, t time.Time, deletedBy string, newGeneration int32) error

// implementation
Updates(map[string]interface{}{
    "deleted_time": t,
    "deleted_by":   deletedBy,
    "generation":   newGeneration,
})

The service layer (wherever SoftDeleteByOwner is called) computes currentGeneration + 1 before passing it down.
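
A minimal sketch of that service-side caller, assuming the suggested DAO signature above (the wrapper name, the nodePool argument, and handleDeleteError are illustrative, not code from this PR):

// The service computes the next generation and hands it to the DAO,
// mirroring how Patch now handles generation.
func (s *sqlNodePoolService) softDeleteForCluster(
	ctx context.Context, clusterID, deletedBy string, nodePool *api.NodePool,
) *errors.ServiceError {
	newGeneration := nodePool.Generation + 1 // business rule stays in the service
	if err := s.nodePoolDao.SoftDeleteByOwner(ctx, clusterID, time.Now(), deletedBy, newGeneration); err != nil {
		return handleDeleteError("NodePool", err)
	}
	return nil
}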


Posted by review-pr skill

Comment thread pkg/services/node_pool.go
Comment thread pkg/handlers/cluster.go
Comment thread pkg/handlers/cluster_nodepools.go Outdated

// Verify cluster exists
_, err := h.clusterService.Get(ctx, clusterID)
_, err := h.nodePoolService.GetForCluster(ctx, nodePoolID, clusterID)
Contributor

I wonder if we really need to verify that cluster exists.
Shouldn't the patch fail if it doesn't?

And... in case we want to verify... is this a business rule to be enforced in the service layer?

Contributor

This pattern is also in the Create methods of cluster_status.go and nodepool_status.go

Contributor Author

The handler can call multiple "use-case" functions, like an orchestrator, with the business logic isolated, IMO.
In this case, we have a "SoftDelete" function in the service layer that accepts only nodepool_id.
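
A sketch of that orchestration, using the names quoted elsewhere in this PR (the exact handler shape is an assumption):

// The handler composes two service calls; each call owns one business rule.
func (h *clusterNodePoolsHandler) deleteNodePool(
	ctx context.Context, clusterID, nodePoolID string,
) *errors.ServiceError {
	// Ownership check lives in the service (404 on mismatch).
	if _, err := h.nodePoolService.GetForCluster(ctx, nodePoolID, clusterID); err != nil {
		return err
	}
	// SoftDelete itself only needs the node pool ID.
	return h.nodePoolService.SoftDelete(ctx, nodePoolID)
}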

Comment thread pkg/dao/adapter_status.go
Where("resource_type = ? AND resource_id = ? AND adapter = ?",
adapterStatus.ResourceType, adapterStatus.ResourceID, adapterStatus.Adapter).
Where(
"observed_generation < ? OR (observed_generation = ? AND last_report_time < ?)",
Contributor

Should we consider this to be a business rule and pertaining to the service layer?

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
pkg/services/cluster_test.go (1)

1639-1731: ⚡ Quick win

Add a service-level test for patching a deleted cluster.

ClusterService.Patch now owns the soft-delete conflict check, but this suite only covers changed/unchanged/not-found. The handler test mocks that behavior, so a regression in the real service path would slip through here.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/services/cluster_test.go` around lines 1639 - 1731, Add a new subtest
inside TestClusterPatch that verifies ClusterService.Patch returns a conflict
for a soft-deleted cluster: create the service (NewClusterService) with test
DAOs, insert a cluster into clusterDao.clusters with Meta.ID "c1" and a non-nil
DeletedAt (or Deleted flag used by your model), call service.Patch(ctx, "c1",
&api.ClusterPatchRequest{Spec: &someSpec}), and assert svcErr is non-nil and
svcErr.HTTPCode equals 409; this ensures ClusterService.Patch's soft-delete
conflict check is covered at the service level.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/services/cluster.go`:
- Around line 91-107: The Patch method currently forwards the incoming patch
pointer into applyClusterPatch without checking for nil, which can cause a
panic; update sqlClusterService.Patch to explicitly validate that the patch
argument is non-nil at the start (before calling applyClusterPatch or using
patch fields) and return a service error (e.g., errors.Validation("patch must be
provided") or similar) when nil is passed so callers receive a proper error
instead of panicking; touch the Patch function and any callers if needed to
ensure the nil-check happens at the service boundary before invoking
applyClusterPatch.

In `@pkg/services/node_pool.go`:
- Around line 104-127: The row lock from nodePoolDao.GetForUpdate is lost
because Save uses a new DB session via sessionFactory.New(ctx); to fix, wrap the
entire method (the code using nodePoolDao.GetForUpdate, mutating nodePool,
IncrementGeneration, and calling nodePoolDao.Save) in a single DB transaction or
add a composite DAO method that performs GetForUpdate + update/save within the
same session/transaction so the SELECT ... FOR UPDATE lock persists; update the
NodePool update method (the block using nodePoolDao.GetForUpdate and
nodePoolDao.Save) and likewise adjust SoftDelete paths (the sections around the
calls at the same pattern) to use the same transactional/session context instead
of separate sessionFactory.New(ctx) calls.

---

Nitpick comments:
In `@pkg/services/cluster_test.go`:
- Around line 1639-1731: Add a new subtest inside TestClusterPatch that verifies
ClusterService.Patch returns a conflict for a soft-deleted cluster: create the
service (NewClusterService) with test DAOs, insert a cluster into
clusterDao.clusters with Meta.ID "c1" and a non-nil DeletedAt (or Deleted flag
used by your model), call service.Patch(ctx, "c1",
&api.ClusterPatchRequest{Spec: &someSpec}), and assert svcErr is non-nil and
svcErr.HTTPCode equals 409; this ensures ClusterService.Patch's soft-delete
conflict check is covered at the service level.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: a6083079-1a3a-443e-8790-fdd853bffc6f

📥 Commits

Reviewing files that changed from the base of the PR and between db88e17 and 48aeed4.

📒 Files selected for processing (28)
  • pkg/api/cluster_types.go
  • pkg/api/node_pool_types.go
  • pkg/api/presenters/cluster.go
  • pkg/api/presenters/cluster_test.go
  • pkg/api/presenters/node_pool.go
  • pkg/api/presenters/node_pool_test.go
  • pkg/dao/CLAUDE.md
  • pkg/dao/adapter_status.go
  • pkg/dao/cluster.go
  • pkg/dao/mocks/cluster.go
  • pkg/dao/mocks/node_pool.go
  • pkg/dao/node_pool.go
  • pkg/handlers/cluster.go
  • pkg/handlers/cluster_nodepools.go
  • pkg/handlers/cluster_nodepools_test.go
  • pkg/handlers/cluster_test.go
  • pkg/handlers/node_pool.go
  • pkg/handlers/validation.go
  • pkg/services/adapter_status.go
  • pkg/services/cluster.go
  • pkg/services/cluster_test.go
  • pkg/services/generic.go
  • pkg/services/generic_test.go
  • pkg/services/node_pool.go
  • pkg/services/node_pool_test.go
  • pkg/services/status_helpers.go
  • plugins/clusters/plugin.go
  • plugins/nodePools/plugin.go
💤 Files with no reviewable changes (5)
  • plugins/clusters/plugin.go
  • pkg/dao/mocks/cluster.go
  • pkg/dao/adapter_status.go
  • pkg/services/adapter_status.go
  • pkg/dao/cluster.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/handlers/validation.go

Comment thread pkg/services/cluster.go
Comment on lines +91 to +107
func (s *sqlClusterService) Patch(
ctx context.Context, id string, patch *api.ClusterPatchRequest,
) (*api.Cluster, *errors.ServiceError) {
cluster, err := s.clusterDao.GetForUpdate(ctx, id)
if err != nil {
-	return nil, handleUpdateError("Cluster", err)
+	return nil, handleGetError("Cluster", "id", id, err)
}

if cluster.DeletedTime != nil {
return nil, errors.ConflictState("Cluster '%s' is marked for deletion", id)
}

oldSpec := cluster.Spec
oldLabels := cluster.Labels

if applyErr := applyClusterPatch(cluster, patch); applyErr != nil {
return nil, errors.Validation("Invalid patch data: %v", applyErr)
@coderabbitai coderabbitai Bot May 5, 2026

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard against nil patches at the service boundary.

Patch now owns the mutation logic, but it forwards patch into applyClusterPatch without validating it first. A non-handler caller can pass nil here and trigger a panic instead of getting a service error.

Suggested fix
 func (s *sqlClusterService) Patch(
 	ctx context.Context, id string, patch *api.ClusterPatchRequest,
 ) (*api.Cluster, *errors.ServiceError) {
+	if patch == nil {
+		return nil, errors.Validation("patch request must not be nil")
+	}
+
 	cluster, err := s.clusterDao.GetForUpdate(ctx, id)
 	if err != nil {
 		return nil, handleGetError("Cluster", "id", id, err)
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/services/cluster.go` around lines 91 - 107, The Patch method currently
forwards the incoming patch pointer into applyClusterPatch without checking for
nil, which can cause a panic; update sqlClusterService.Patch to explicitly
validate that the patch argument is non-nil at the start (before calling
applyClusterPatch or using patch fields) and return a service error (e.g.,
errors.Validation("patch must be provided") or similar) when nil is passed so
callers receive a proper error instead of panicking; touch the Patch function
and any callers if needed to ensure the nil-check happens at the service
boundary before invoking applyClusterPatch.

Contributor Author

Guarded in handler

@coderabbitai coderabbitai Bot

@kuudori, understood. That works for the current call paths. Just worth noting that since ClusterService is an interface, any future caller (e.g., a background job, another service, or a test) that bypasses the handler won't have that guard. A nil check at the service boundary is a cheap safeguard, but happy to defer if you'd prefer to keep validation at the handler layer.

Comment thread pkg/services/node_pool.go
Comment on lines +104 to +127
nodePool, err := s.nodePoolDao.GetForUpdate(ctx, nodePoolID)
if err != nil {
-	return nil, handleUpdateError("NodePool", err)
+	return nil, handleGetError("NodePool", "id", nodePoolID, err)
}

if nodePool.DeletedTime != nil {
return nil, errors.ConflictState("NodePool '%s' is marked for deletion", nodePoolID)
}

oldSpec := nodePool.Spec
oldLabels := nodePool.Labels

if applyErr := applyNodePoolPatch(nodePool, patch); applyErr != nil {
return nil, errors.Validation("Invalid patch data: %v", applyErr)
}

if bytes.Equal(oldSpec, nodePool.Spec) && bytes.Equal(oldLabels, nodePool.Labels) {
return nodePool, nil
}

nodePool.IncrementGeneration()

if saveErr := s.nodePoolDao.Save(ctx, nodePool); saveErr != nil {
return nil, handleUpdateError("NodePool", saveErr)
@coderabbitai coderabbitai Bot May 5, 2026

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf '== NodePool DAO lock/save implementation ==\n'
sed -n '1,140p' pkg/dao/node_pool.go

printf '\n== SessionFactory / New(ctx) implementation ==\n'
rg -n -C4 'type .*SessionFactory|func \(.*\) New\(ctx context\.Context\)' pkg cmd

printf '\n== Transaction helpers / tx propagation through context ==\n'
rg -n -C4 'WithTransaction|Begin\(|Commit\(|Rollback\(|TxFromContext|context\.WithValue.*tx|type .*tx.*Key' pkg cmd

Repository: openshift-hyperfleet/hyperfleet-api

Length of output: 31039


Row lock does not persist across separate DAO calls without explicit transaction wrapping.

The GetForUpdate() call at line 104 acquires a row lock via SELECT ... FOR UPDATE, but the Save() call at line 126 opens a fresh DB session through sessionFactory.New(ctx). Since New(ctx) returns a separate session when no transaction exists in the context, the lock acquired by GetForUpdate does not protect the later Save operation. Concurrent calls can observe the same generation value and both persist conflicting increments.

Same issue applies at lines 174 and 188 in SoftDelete.

Wrap each method in a single transaction or create a composite DAO method that holds the lock through the mutation.
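
One way to keep the lock alive through the mutation is a composite DAO method that runs both statements in a single GORM transaction (a sketch assuming a plain gorm.DB handle; as the author notes below, the repo's sessionFactory may already propagate a request-scoped transaction, which makes this unnecessary):

func (d *sqlNodePoolDao) patchLocked(
	ctx context.Context, id string, mutate func(*api.NodePool) error,
) (*api.NodePool, error) {
	var nodePool api.NodePool
	err := d.db.WithContext(ctx).Transaction(func(tx *gorm.DB) error {
		// SELECT ... FOR UPDATE inside the transaction.
		if err := tx.Clauses(clause.Locking{Strength: "UPDATE"}).
			First(&nodePool, "id = ?", id).Error; err != nil {
			return err
		}
		if err := mutate(&nodePool); err != nil {
			return err
		}
		return tx.Save(&nodePool).Error // same tx, lock still held
	})
	if err != nil {
		return nil, err
	}
	return &nodePool, nil
}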

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/services/node_pool.go` around lines 104 - 127, The row lock from
nodePoolDao.GetForUpdate is lost because Save uses a new DB session via
sessionFactory.New(ctx); to fix, wrap the entire method (the code using
nodePoolDao.GetForUpdate, mutating nodePool, IncrementGeneration, and calling
nodePoolDao.Save) in a single DB transaction or add a composite DAO method that
performs GetForUpdate + update/save within the same session/transaction so the
SELECT ... FOR UPDATE lock persists; update the NodePool update method (the
block using nodePoolDao.GetForUpdate and nodePoolDao.Save) and likewise adjust
SoftDelete paths (the sections around the calls at the same pattern) to use the
same transactional/session context instead of separate sessionFactory.New(ctx)
calls.

Contributor Author

False positive. All transactions are stored in the context per request.

@coderabbitai coderabbitai Bot

@kuudori, understood — if sessionFactory.New(ctx) extracts an existing transaction from the context (i.e., transactions are propagated through the request context), then GetForUpdate and Save share the same session, the SELECT ... FOR UPDATE lock remains valid for the lifetime of the transaction, and this concern doesn't apply. I've noted this for future reviews.
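
For reference, a minimal sketch of the transaction-in-context pattern described here: per-request middleware stores a transaction in the context, and New(ctx) returns it when present (key and type names are assumptions, not the repo's actual sessionFactory code):

type txKey struct{}

// WithTx is called by per-request middleware after opening a transaction.
func WithTx(ctx context.Context, tx *gorm.DB) context.Context {
	return context.WithValue(ctx, txKey{}, tx)
}

// New returns the request-scoped transaction if one exists, so
// GetForUpdate and Save share the same session and the row lock holds.
func (f *sessionFactory) New(ctx context.Context) *gorm.DB {
	if tx, ok := ctx.Value(txKey{}).(*gorm.DB); ok {
		return tx
	}
	return f.db.WithContext(ctx) // fall back to a fresh session
}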


📝 Learnings were identified and not saved (knowledge base disabled).
