DPTP-4731: fix cluster profile secret race in aggregated runs #5083

Open
deepsm007 wants to merge 1 commit into openshift:main from deepsm007:fix-cluster-profile-secret-race

@deepsm007
Contributor

Aggregated jobs run multiple ci-operator instances in the same namespace. Each tries to create a secret named <name>-cluster-profile to hold cloud credentials. When runs get different cloud accounts, they race on that shared name, causing failures like secrets "e2e-azure-ovn-cluster-profile" not found.

The fix uses the source secret name (e.g. cluster-secrets-azure4) as the target in the test namespace instead. Runs resolving to the same account reuse the same secret safely via UpsertImmutableSecret's DeepEqual check. Runs with different accounts get different names and never conflict. leaseStep notifies multiStageTestStep of the resolved name so pods mount the right secret.
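
The naming change described above can be illustrated with a small in-memory simulation. This is only a sketch: `store`, `upsert`, `run`, and `simulate` are hypothetical names, not ci-operator code, and the secret name `cluster-secrets-azure-other` is made up. It also assumes that an upsert with differing content replaces the existing secret, which is the hazard for runs that share one name.

```go
package main

import (
	"fmt"
	"reflect"
)

// store is a hypothetical in-memory stand-in for the secrets in one namespace.
type store struct {
	secrets      map[string]map[string]string
	replacements int // how often existing content was overwritten
}

// upsert reuses an existing secret when the content is deeply equal and
// (by assumption) replaces it otherwise.
func (s *store) upsert(name string, data map[string]string) {
	existing, found := s.secrets[name]
	if found && reflect.DeepEqual(existing, data) {
		return // identical content: safe reuse, nothing changes
	}
	if found {
		s.replacements++ // another run's secret is clobbered
	}
	s.secrets[name] = data
}

// run models one ci-operator instance and the account it resolved to.
type run struct {
	sourceSecret string
	data         map[string]string
}

// simulate applies every run's upsert using the given target naming scheme.
func simulate(runs []run, targetName func(run) string) *store {
	st := &store{secrets: map[string]map[string]string{}}
	for _, r := range runs {
		st.upsert(targetName(r), r.data)
	}
	return st
}

func main() {
	runs := []run{
		{sourceSecret: "cluster-secrets-azure4", data: map[string]string{"account": "azure4"}},
		{sourceSecret: "cluster-secrets-azure4", data: map[string]string{"account": "azure4"}},
		{sourceSecret: "cluster-secrets-azure-other", data: map[string]string{"account": "other"}},
	}

	// Old scheme: every run targets the same test-derived name.
	shared := simulate(runs, func(run) string { return "e2e-azure-ovn-cluster-profile" })
	// New scheme: the target name is the source secret name.
	perSource := simulate(runs, func(r run) string { return r.sourceSecret })

	fmt.Printf("shared name: %d secrets, %d replacements\n", len(shared.secrets), shared.replacements)
	fmt.Printf("source name: %d secrets, %d replacements\n", len(perSource.secrets), perSource.replacements)
}
```

Under these assumptions, the shared-name scheme overwrites a secret once, while the source-name scheme creates two distinct secrets and never overwrites one.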

https://prow.ci.openshift.org/view/gs/test-platform-results/logs/aggregator-periodic-ci-openshift-release-main-ci-4.22-e2e-azure-ovn/2039355967551836160

/cc @droslean @openshift/test-platform

@openshift-ci-robot
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after the lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will use /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: automatic mode

@openshift-ci-robot
Contributor

openshift-ci-robot commented Apr 2, 2026

@deepsm007: This pull request references DPTP-4731 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.22.0" version, but no target version was set.


openshift-ci-robot added the jira/valid-reference label (indicates that this PR references a valid Jira ticket of any type) on Apr 2, 2026
openshift-ci bot requested review from a team and droslean on April 2, 2026, 20:12
@coderabbitai

coderabbitai bot commented Apr 2, 2026

Walkthrough

Cluster-profile secret handling changed to import secrets using the source secret name unchanged and to propagate that resolved name into wrapped steps via a new SetProfileSecretName(name string) API when supported.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Lease import logic: pkg/steps/lease.go | Call importClusterProfileSecret(ctx, cpDetails.Secret) using only the source secret name; the copied secret's name now exactly matches the source (ObjectMeta.Name: secretName); conditionally call SetProfileSecretName(cpDetails.Secret) on wrapped steps. |
| Multi-stage step: pkg/steps/multi_stage/multi_stage.go | Add resolvedProfileSecretName field and SetProfileSecretName(name string) method on multiStageTestStep; update profileSecretName() to return the resolved name when set and simplify default derivation. |
| IP pool step: pkg/steps/ip_pool.go | Add SetProfileSecretName(string) on ipPoolStep that forwards the name to the wrapped step if it implements the setter. |
| Tests, lease expectations: pkg/steps/lease_test.go | Update expected Secret fixtures: change ObjectMeta.Name values from derived names (e.g., e2e-aws-ovn-cluster-profile) to the source secret name (cluster-secrets-aws). |
| Tests, multi-stage naming: pkg/steps/multi_stage/multi_stage_test.go | Extend test cases to pass resolvedProfileSecretName into multiStageTestStep and assert precedence of the resolved name over previous additional-suffix derivation. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.4)

Command failed



@openshift-ci
Contributor

openshift-ci bot commented Apr 2, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deepsm007

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Apr 2, 2026

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/steps/lease.go`:
- Around line 270-272: The duck-typed SetProfileSecretName call on
multiStageTestStep.s.wrapped doesn't reach the inner LeaseStep when wrapper
types like ipPoolStep or clusterClaimStep sit between them; add a
SetProfileSecretName(string) method to ipPoolStep and clusterClaimStep that
simply forwards the call to their wrapped step (e.g., if setter, ok :=
s.wrapped.(interface{ SetProfileSecretName(string) }); ok {
setter.SetProfileSecretName(name) }) so the innermost step that actually
implements SetProfileSecretName receives the value and
multiStageTestStep.resolvedProfileSecretName is set (alternatively implement a
single helper to recursively unwrap and call the method, but prefer adding
forwarding methods on ipPoolStep and clusterClaimStep).
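
The forwarding pattern suggested above can be sketched in isolation. The types below (`step`, `testStep`, `forwardingWrapper`, `opaqueWrapper`, `notify`) are hypothetical stand-ins, not the ci-operator types; only the `SetProfileSecretName(string)` method name comes from the PR. The sketch shows why a wrapper without a forwarding method silently blocks the duck-typed call.

```go
package main

import "fmt"

// step is a minimal stand-in for a ci-operator step.
type step interface{ Name() string }

// profileSecretNameSetter is the optional capability probed by type assertion.
type profileSecretNameSetter interface{ SetProfileSecretName(string) }

// testStep plays the role of the innermost step that stores the name.
type testStep struct{ name, resolved string }

func (t *testStep) Name() string                     { return t.name }
func (t *testStep) SetProfileSecretName(name string) { t.resolved = name }

// forwardingWrapper implements the setter only to pass the call inward.
type forwardingWrapper struct{ wrapped step }

func (w *forwardingWrapper) Name() string                     { return w.wrapped.Name() }
func (w *forwardingWrapper) SetProfileSecretName(name string) { notify(w.wrapped, name) }

// opaqueWrapper has no setter, so the type assertion on it fails.
type opaqueWrapper struct{ wrapped step }

func (w *opaqueWrapper) Name() string { return w.wrapped.Name() }

// notify calls the setter when the step supports it and reports whether it did.
func notify(s step, name string) bool {
	if setter, ok := s.(profileSecretNameSetter); ok {
		setter.SetProfileSecretName(name)
		return true
	}
	return false
}

func main() {
	reached := &testStep{name: "e2e-azure-ovn"}
	notify(&forwardingWrapper{wrapped: reached}, "cluster-secrets-azure4")

	blocked := &testStep{name: "e2e-azure-ovn"}
	notify(&opaqueWrapper{wrapped: blocked}, "cluster-secrets-azure4")

	fmt.Printf("through forwarding wrapper: %q\n", reached.resolved)
	fmt.Printf("through opaque wrapper: %q\n", blocked.resolved)
}
```

In this sketch the inner step behind the opaque wrapper keeps an empty resolved name, so it would fall back to its derived default.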
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c1789de8-c232-4720-ab68-84729131a6f5

📥 Commits

Reviewing files that changed from the base of the PR and between 0a5f508 and 42dbc5f.

📒 Files selected for processing (4)
  • pkg/steps/lease.go
  • pkg/steps/lease_test.go
  • pkg/steps/multi_stage/multi_stage.go
  • pkg/steps/multi_stage/multi_stage_test.go

@deepsm007 deepsm007 force-pushed the fix-cluster-profile-secret-race branch from 42dbc5f to ef0c5e9 Compare April 2, 2026 20:29

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/steps/multi_stage/multi_stage.go`:
- Around line 216-219: The fallback unconditionally returning s.name +
"-cluster-profile" can change the secret name used when SetProfileSecretName
isn't called; update the logic in the accessor (the code that currently checks
s.resolvedProfileSecretName) to preserve prior behavior by returning
s.profileSecretName if that field is non-empty, otherwise fall back to s.name +
"-cluster-profile"; keep the resolvedProfileSecretName check first, then
profileSecretName, then the name-based default so getProfileData() and
SetProfileSecretName remain compatible.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4885c122-d01f-4465-bd24-9c7b78362dc1

📥 Commits

Reviewing files that changed from the base of the PR and between 42dbc5f and ef0c5e9.

📒 Files selected for processing (5)
  • pkg/steps/ip_pool.go
  • pkg/steps/lease.go
  • pkg/steps/lease_test.go
  • pkg/steps/multi_stage/multi_stage.go
  • pkg/steps/multi_stage/multi_stage_test.go
✅ Files skipped from review due to trivial changes (1)
  • pkg/steps/lease_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/steps/multi_stage/multi_stage_test.go

Comment on lines +216 to +219
+	if s.resolvedProfileSecretName != "" {
+		return s.resolvedProfileSecretName
+	}
-	return name + "-cluster-profile"
+	return s.name + "-cluster-profile"
⚠️ Potential issue | 🟠 Major

Fallback secret-name derivation may regress non-forwarded paths.

Line 219 now always uses s.name + "-cluster-profile". If SetProfileSecretName is not reached for any execution path, getProfileData() will query a different secret name than before and can fail with secret-not-found.

Suggested compatibility-safe fallback
 func (s *multiStageTestStep) profileSecretName() string {
 	if s.resolvedProfileSecretName != "" {
 		return s.resolvedProfileSecretName
 	}
-	return s.name + "-cluster-profile"
+	baseName := s.name
+	if s.additionalSuffix != "" {
+		baseName = strings.TrimSuffix(baseName, "-"+s.additionalSuffix)
+	}
+	return baseName + "-cluster-profile"
 }
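
The precedence in the suggested fallback can be tried out in a standalone form. This is a sketch under assumptions: `testStep` is a hypothetical stand-in that carries only the three fields the suggestion reads, and the step names and suffix values in `main` are invented for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// testStep is a hypothetical stand-in with only the fields the suggestion uses.
type testStep struct {
	name                      string
	additionalSuffix          string
	resolvedProfileSecretName string
}

// profileSecretName prefers the resolved name. Otherwise it derives the
// name from the step name with any additional suffix trimmed off.
func (s *testStep) profileSecretName() string {
	if s.resolvedProfileSecretName != "" {
		return s.resolvedProfileSecretName
	}
	baseName := s.name
	if s.additionalSuffix != "" {
		baseName = strings.TrimSuffix(baseName, "-"+s.additionalSuffix)
	}
	return baseName + "-cluster-profile"
}

func main() {
	steps := []*testStep{
		{name: "e2e-azure-ovn"},
		{name: "e2e-azure-ovn-1", additionalSuffix: "1"},
		{name: "e2e-azure-ovn-1", additionalSuffix: "1", resolvedProfileSecretName: "cluster-secrets-azure4"},
	}
	for _, s := range steps {
		fmt.Println(s.profileSecretName())
	}
}
```

With these inputs, the first two steps both derive e2e-azure-ovn-cluster-profile, and the third returns cluster-secrets-azure4 because the resolved name takes precedence.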
@openshift-ci
Contributor

openshift-ci bot commented Apr 2, 2026

@deepsm007: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/images | ef0c5e9 | link | true | /test images |
| ci/prow/breaking-changes | ef0c5e9 | link | false | /test breaking-changes |


@deepsm007
Contributor Author

/test images e2e
