Skip to content

ci-operator: Implement 3-level naming hierarchy changes to CSI mechanism#4900

Open
psalajova wants to merge 2 commits intoopenshift:mainfrom
psalajova:ci-operator-include-groups-and-mapping-file
Open

ci-operator: Implement 3-level naming hierarchy changes to CSI mechanism#4900
psalajova wants to merge 2 commits intoopenshift:mainfrom
psalajova:ci-operator-include-groups-and-mapping-file

Conversation

@psalajova
Copy link
Contributor

@psalajova psalajova commented Jan 14, 2026

This PR implements support for the 3-level GSM secret naming structure (collection__group__field) in ci-operator, as per the design document.

Changes

3-level hierarchy

Secrets now use collection__group__field naming instead of collection__name:
Before:

credentials:
  - name: aws-creds
    collection: test-platform-infra
    mount_path: /tmp/aws

After:

credentials:
  - collection: test-platform-infra
    group: cluster-init
    field: aws-creds
    mount_path: /tmp/aws

Supported credential resolution modes

There are three ways one can reference credentials:

  1. Bundle reference (via gsm-config.yaml):
credentials:
  - bundle: aws-bundle
    mount_path: /tmp/aws
  1. Auto-discovery (lists all fields for collection+group):
credentials:
  - collection: test-platform-infra
    group: cluster-init
    mount_path: /tmp/secrets
  1. Explicit field:
credentials:
  - collection: test-platform-infra
    group: cluster-init
    field: aws-creds
    mount_path: /tmp/aws

Implementation details

  • Wired gsmConfig through main.go → defaults → multi_stage chain
  • Credentials are resolved once during createSPCs() before SecretProviderClass creation
  • Added validation to prevent file collisions when different groups share the same mount path
  • Censoring SPCs now track credentials by full secret name (collection__group__field)
  • Updated all tests to use 3-level structure
  • A separate PR needs to be done which will make sure ci-operator has gsm-config.yaml available at all times, as this is critical for bundle resolution.

Jira: https://issues.redhat.com/browse/DPTP-4656

@openshift-ci-robot
Copy link
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: automatic mode

@coderabbitai
Copy link

coderabbitai bot commented Jan 14, 2026

Walkthrough

Adds Google Secret Manager (GSM) integration: moves GSM types into the public api package, extends credential references to bundle/group/field, threads GSM configuration and client through defaults and multi-stage steps, and implements GSM bundle resolution, auto-discovery, and related test updates.

Changes

Cohort / File(s) Summary
CLI: ci-operator
cmd/ci-operator/main.go
Added --gsm-config and --gsm-credentials-file flags, validate/load GSM config in Complete(), initialize GSM client and build a GSMConfiguration to pass into defaults.FromConfig.
CLI: secret-bootstrap
cmd/ci-secret-bootstrap/main.go, cmd/ci-secret-bootstrap/main_test.go
Migrated GSM-related types/usages from secretbootstrap to api; updated LoadGSMConfigFromFile calls, types (GSMConfig, GSMBundle, TargetSpec, FieldEntry, RegistryAuthData) and tests to use api types.
API: GSM types and deepcopy
pkg/api/gsm.go, pkg/api/gsm_test.go, pkg/api/types.go, pkg/api/zz_generated.deepcopy.go
Renamed Bundle→GSMBundle, moved package to api, adjusted GSMConfig.Bundles, added CredentialReference fields (Bundle/Group/Field) and predicates, and added deepcopy methods for new/updated types.
Defaults wiring
pkg/defaults/defaults.go, pkg/defaults/defaults_test.go
Extended FromConfig/fromConfig/stepForTest signatures to accept *multi_stage.GSMConfiguration and threaded gsmConfig through defaults; updated tests to match new parameters.
Multi-stage core & config
pkg/steps/multi_stage/multi_stage.go, pkg/steps/multi_stage/multi_stage_test.go, pkg/steps/multi_stage/gen_test.go, pkg/steps/multi_stage/run_test.go
Added GSMConfiguration type, added gsm *GSMConfiguration to multiStageTestStep, extended constructors to accept gsmConfig and added runtime validation for GSM client presence; updated test call sites.
Multi-stage GSM resolution
pkg/steps/multi_stage/gsm_bundle_resolver.go
New file: resolves CredentialReference entries (bundle refs, auto-discovery, explicit fields), memoizes discovery results, validates group collisions per mount path, and aggregates errors.
Multi-stage init & CSI utils
pkg/steps/multi_stage/init.go, pkg/steps/multi_stage/init_test.go, pkg/steps/multi_stage/csi_utils.go, pkg/steps/multi_stage/csi_utils_test.go
Pre-resolve credentials for all steps before SPC creation, deduplicate and censor by full GSM secret name (via gsm.GetGSMSecretName), rename helper to restoreForbiddenSymbolsInSecretName, switch SPC naming to use credential.Field, and update tests for new CredentialReference structure.
Secret bootstrap internals
cmd/ci-secret-bootstrap/... (see above)
Updated internal flows and function signatures to operate on api types for GSM-related processing and secret construction.
Docs / generated reference
pkg/webreg/zz_generated.ci_operator_reference.go
Updated embedded YAML/docs to include bundle/collection/group/field credential schema, comments, and examples aligned with the new credential model.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 14, 2026
@openshift-ci openshift-ci bot requested review from Prucek and hector-vido January 14, 2026 17:13
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 14, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: psalajova

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 14, 2026
Comment on lines 112 to 115
gsmClient *secretmanager.Client
gsmCredentialsFile string
gsmConfig *api.GSMConfig
gsmProjectConfig gsm.Config
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to send all these params to the multistage step? Couldn't it just use the gsmClient that would be created in defaults maybe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you have a point, I don't like this amount of new parameters as well... I'll look into it, maybe I can simplify it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved them all into a common struct

@psalajova psalajova force-pushed the ci-operator-include-groups-and-mapping-file branch 2 times, most recently from d3b02d3 to d796b98 Compare January 19, 2026 16:16
@psalajova psalajova changed the title [WIP] ci-operator: Implement 3-level naming hierarchy changes to CSI mechanism ci-operator: Implement 3-level naming hierarchy changes to CSI mechanism Jan 19, 2026
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 19, 2026
@psalajova
Copy link
Contributor Author

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 20, 2026
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@pkg/defaults/defaults.go`:
- Around line 171-197: When enableSecretsStoreCSIDriver is true, do not swallow
errors from gsm.GetConfigFromEnv or secretmanager.NewClient; instead return the
initialization error immediately so GSM is not left nil. In the block that
constructs gsmConfiguration (symbols: gsmConfiguration,
enableSecretsStoreCSIDriver, gsm.GetConfigFromEnv, secretmanager.NewClient),
replace the logrus.WithError(...).Error calls with returning a wrapped error
(e.g. fmt.Errorf("gsm init: %w", err)) from the surrounding function; ensure the
function signature and callers are updated to propagate this error instead of
assuming success. Also keep assigning gsmConfiguration.ProjectConfig and
gsmConfiguration.Client only on success.

In `@pkg/steps/multi_stage/multi_stage.go`:
- Around line 225-231: Remove the per-step deferred close of the shared GSM
client: inside the run method where you check s.enableSecretsStoreCSIDriver and
s.gsm (and currently call defer s.gsm.Client.Close()), delete that defer so the
shared s.gsm.Client is not closed by each step (this prevents breaking later
steps and race conditions); ensure the GSM client lifecycle is managed centrally
(closed once at job shutdown) and leave the rest of the logic (the nil checks
and call to s.createSPCs) unchanged.
🧹 Nitpick comments (3)
cmd/ci-secret-generator/main_test.go (1)

581-586: Consider adding a dedicated field for disabled clusters instead of matching test name.

Using tc.name string matching to determine test behavior is fragile. If the test name changes, this conditional silently stops working correctly.

♻️ Suggested improvement

Add a disabledClusters field to the test case struct:

 testCases := []struct {
     name               string
     config             secretgenerator.Config
     GSMsyncEnabled     bool
     expectedIndexCalls int
     verifyIndexPayload func(t *testing.T, itemName string, payload []byte)
+    disabledClusters   sets.Set[string]
 }{

Then use tc.disabledClusters directly instead of the name-based conditional.

pkg/steps/multi_stage/gsm_bundle_resolver.go (1)

46-54: Non-deterministic group ordering in error message.

The groupList slice is populated by iterating over a map, which has non-deterministic order. This can cause the error message to vary between runs, making debugging and test assertions harder.

♻️ Suggested fix
+import "sort"
+
 for key, groups := range mountPathGroups {
     if len(groups) > 1 {
         var groupList []string
         for group := range groups {
             groupList = append(groupList, group)
         }
+        sort.Strings(groupList)
         return fmt.Errorf("multiple groups (%v) found for collection=%s, mount_path=%s - different groups in the same collection must use different mount paths to avoid file name collisions",
             groupList, key.collection, key.mountPath)
     }
 }
pkg/gsm-secrets/secrets.go (1)

87-89: Consider creating a defensive copy of the input slice.

The function appends to and sorts the input secretsList in place, which mutates the caller's slice. While current usage passes an empty slice, this could cause subtle bugs if callers expect their input to remain unchanged.

♻️ Suggested defensive copy
 func ConstructIndexSecretContent(secretsList []string) []byte {
-	secretsList = append(secretsList, UpdaterSASecretName)
-	sort.Strings(secretsList)
+	result := make([]string, len(secretsList), len(secretsList)+1)
+	copy(result, secretsList)
+	result = append(result, UpdaterSASecretName)
+	sort.Strings(result)

 	var formattedSecrets []string
-	for _, secret := range secretsList {
+	for _, secret := range result {
 		formattedSecrets = append(formattedSecrets, fmt.Sprintf("- %s", secret))
 	}

 	return []byte(strings.Join(formattedSecrets, "\n"))
 }

@psalajova psalajova force-pushed the ci-operator-include-groups-and-mapping-file branch from 05cd174 to 11ea941 Compare February 3, 2026 14:08
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
pkg/steps/multi_stage/csi_utils.go (1)

77-95: ⚠️ Potential issue | 🟠 Major

Include Group field in the hash to prevent potential SPC name collisions.

The function hashes only credential.Field, ignoring the Group field which is part of the credential's identity in the collection__group__field hierarchy. Two credentials with identical collection, mountPath, and Field but different Group values would produce the same SPC name, even though they reference different GSM secrets. Although such credentials would be grouped together for the same mount path, the current test coverage (TestGetSPCNameCollisionPrevention) does not verify Group differentiation. Adding Group to the hash input would eliminate this collision risk.

pkg/steps/multi_stage/multi_stage.go (1)

503-541: ⚠️ Potential issue | 🔴 Critical

Use the same deduplication key as init.go to match full credential identity.

The code deduplicates credentials using only credential.Name (line 508), but init.go creates SPCs using the full credential identifier from gsm.GetGSMSecretName(credential.Collection, credential.Group, credential.Field). This inconsistency can cause mismatches: if two credentials share the same Name but differ in Collection, Group, or Field, the second one would be incorrectly skipped here even though init.go would create separate SPCs for both.

Align the deduplication key with init.go by using the full credential identity (collection + group + field) instead of just Name.

🤖 Fix all issues with AI agents
In `@cmd/ci-operator/main.go`:
- Around line 761-766: The startup attempts to load GSM config unconditionally
when o.enableSecretsStoreCSIDriver is true, but o.gsmConfigPath defaults to
empty which causes a file read error; in Complete() validate that if
enableSecretsStoreCSIDriver (o.enableSecretsStoreCSIDriver) is true then
gsmConfigPath (o.gsmConfigPath) must be non-empty and return a clear error if
missing, or alternatively make GSM loading conditional only when gsmConfigPath
is set (i.e., check o.gsmConfigPath != "" before calling
api.LoadGSMConfigFromFile), updating the code paths that call
LoadGSMConfigFromFile to use this validation and preserving existing behavior in
the main startup block that loads the config.

In `@pkg/defaults/defaults.go`:
- Around line 128-149: The GSM secretmanager.Client created by
secretmanager.NewClient in FromConfig (variable gsmClient) is never closed;
immediately after successfully creating gsmClient (right after the
secretmanager.NewClient call) add a defer gsmClient.Close() so the gRPC
connections and background resources are released when FromConfig returns;
ensure this cleanup is placed before assigning gsmClient into gsmConfig to
guarantee the client is closed if FromConfig exits early due to an error.

In `@pkg/steps/multi_stage/init.go`:
- Around line 91-115: The resolved credentials from ResolveCredentialReferences
are only written into the local allSteps copy, so s.pre/s.test/s.post remain
unchanged and later addCredentialsToCensoring reads stale credentials causing
SPC name mismatches; update the real slices after resolving by writing
resolvedCredentials back into the corresponding original step entries (s.pre,
s.test, s.post) or refactor to only use the single resolved allSteps for
subsequent operations (createSPCs, addCredentialsToCensoring,
groupCredentialsByCollectionAndMountPath) so SPC creation and censoring
reference the same credential objects.
🧹 Nitpick comments (1)
pkg/steps/multi_stage/gsm_bundle_resolver.go (1)

88-159: Consider simplifying error handling - aggregate is unnecessary with fail-fast behavior.

The code uses break after each error (lines 92, 97, 103, 118, 130, 153), meaning at most one error is ever collected. Using utilerrors.NewAggregate is unnecessary overhead when the slice will never contain multiple errors.

Either:

  1. Remove the aggregate and return the single error directly, or
  2. If multiple errors should be collected for better diagnostics, use continue instead of break

The current behavior (fail-fast) is reasonable for credential resolution, so option 1 aligns better with the intent.

♻️ Simplify to direct error return
 	for _, cred := range credentials {
 		if cred.IsBundleReference() {
 			if gsmConfig == nil {
-				errs = append(errs, fmt.Errorf("bundle reference %q requires gsm-config file, but config is not loaded", cred.Bundle))
-				break
+				return nil, fmt.Errorf("bundle reference %q requires gsm-config file, but config is not loaded", cred.Bundle)
 			}
 			// ... similar pattern for other errors

@openshift-ci-robot
Copy link
Contributor

Scheduling required tests:
/test e2e

@psalajova psalajova force-pushed the ci-operator-include-groups-and-mapping-file branch from 11ea941 to 91b098b Compare February 9, 2026 16:39
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@pkg/steps/multi_stage/csi_utils.go`:
- Around line 82-88: getSPCName currently builds the SPC hash using collection,
mountPath and credential.Field which can collide when two credentials share
those values but have different credential.Group; update getSPCName to include
credential.Group in the hashed inputs (e.g., add credential.Group to the
per-credential components you collect before sorting/concatenating) so the parts
used for hashing include group and avoid SPC name collisions that cause wrong
secret mounts.
🧹 Nitpick comments (1)
pkg/api/types.go (1)

1134-1151: Clarify legacy Name usage to avoid config ambiguity.
Now that Collection/Group/Field (and Bundle) are first-class, keeping Name required and documented as the primary field is confusing. Consider marking it optional and calling it legacy in the comment.

Suggested refactor
-	// Name is the name of the secret, without the collection prefix.
-	Name string `json:"name"`
+	// Name is the legacy secret name (pre collection/group/field). Prefer Bundle/Collection/Group/Field.
+	Name string `json:"name,omitempty"`

Comment on lines +82 to +88
// Sort credential field names for deterministic hashing
var credFields []string
for _, cred := range credentials {
credNames = append(credNames, cred.Name)
credFields = append(credFields, cred.Field)
}
sort.Strings(credNames)
parts = append(parts, credNames...)
sort.Strings(credFields)
parts = append(parts, credFields...)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Include group in the SPC hash to avoid cross-group collisions.

getSPCName hashes only credential.Field; if two credentials share collection+mountPath+field but differ by group, the SPC name collides and later creation overwrites the earlier SPC. That can mount the wrong secret for a step.

🛠️ Proposed fix
-	// Sort credential field names for deterministic hashing
-	var credFields []string
-	for _, cred := range credentials {
-		credFields = append(credFields, cred.Field)
-	}
-	sort.Strings(credFields)
-	parts = append(parts, credFields...)
+	// Sort full GSM secret names for deterministic hashing
+	var credSecretNames []string
+	for _, cred := range credentials {
+		credSecretNames = append(credSecretNames, gsm.GetGSMSecretName(cred.Collection, cred.Group, cred.Field))
+	}
+	sort.Strings(credSecretNames)
+	parts = append(parts, credSecretNames...)
🤖 Prompt for AI Agents
In `@pkg/steps/multi_stage/csi_utils.go` around lines 82 - 88, getSPCName
currently builds the SPC hash using collection, mountPath and credential.Field
which can collide when two credentials share those values but have different
credential.Group; update getSPCName to include credential.Group in the hashed
inputs (e.g., add credential.Group to the per-credential components you collect
before sorting/concatenating) so the parts used for hashing include group and
avoid SPC name collisions that cause wrong secret mounts.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 9, 2026

@psalajova: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e 11ea941 link true /test e2e
ci/prow/format 91b098b link true /test format
ci/prow/frontend-checks 91b098b link true /test frontend-checks
ci/prow/breaking-changes 91b098b link false /test breaking-changes
ci/prow/unit 91b098b link true /test unit
ci/prow/checkconfig 91b098b link true /test checkconfig
ci/prow/integration 91b098b link true /test integration
ci/prow/security 91b098b link false /test security
ci/prow/images 91b098b link true /test images
ci/prow/validate-vendor 91b098b link true /test validate-vendor
ci/prow/codegen 91b098b link true /test codegen

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants