-
Notifications
You must be signed in to change notification settings - Fork 427
OCPBUGS-63698: fix(azure): remove CSI secret ownership from hypershift-operator #7157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPBUGS-63698: fix(azure): remove CSI secret ownership from hypershift-operator #7157
Conversation
|
@bryan-cox: This pull request references Jira Issue OCPBUGS-63698, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Skipping CI for Draft Pull Request. |
WalkthroughRemoved disk and file CSI credential management from Azure platform reconciliation in hypershift-operator. These credentials are now handled by control-plane-operator instead. Corresponding test expectations were updated to reflect reduced credential entries and secret counts. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: Organization UI Review profile: CHILL Plan: Pro Cache: Disabled due to data retention organization setting Knowledge base: Disabled due to 📒 Files selected for processing (2)
🧰 Additional context used📓 Path-based instructions (1)**⚙️ CodeRabbit configuration file
Files:
🔇 Additional comments (6)
Comment |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bryan-cox The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Detailed Authentication Flow for Azure CSI Drivers on Self-Managed HyperShiftThis PR is part of a 3-PR solution. Here's the complete end-to-end authentication flow with exact code references: 1. HyperShift Operator - Removes CSI Secret OwnershipFile: Removes disk-csi and file-csi from the credential configs, leaving only ingress and image-registry. This prevents hypershift-operator from managing CSI secrets. 2. Control-Plane Operator - Registers CSI Secret AdaptersFile: WithManifestAdapter(
"azure-disk-csi-config.yaml",
component.WithAdaptFunction(adaptAzureCSIDiskSecret),
component.EnableForPlatform(hyperv1.AzurePlatform),
).
WithManifestAdapter(
"azure-file-csi-config.yaml",
component.WithAdaptFunction(adaptAzureCSIFileSecret),
component.EnableForPlatform(hyperv1.AzurePlatform),
)3. Control-Plane Operator - Creates cloud.conf JSONFile: func adaptAzureCSISecret(cpContext component.WorkloadContext, ...) error {
azureConfig := azure.AzureConfig{
Cloud: azureSpec.Cloud,
TenantID: azureSpec.TenantID,
SubscriptionID: azureSpec.SubscriptionID,
ResourceGroup: azureSpec.ResourceGroupName,
Location: azureSpec.Location,
}
if azureutil.IsSelfManagedAzure(cpContext.HCP.Spec.Platform.Type) {
azureConfig.UseFederatedWorkloadIdentityExtension = true // Line 34
azureConfig.AADClientID = string(workloadIdentity.ClientID) // Line 35
azureConfig.AADFederatedTokenFile = "/var/run/secrets/openshift/serviceaccount/token" // Line 36
}
// Get vnet info needed by CSI drivers
azureConfig.VnetName, azureConfig.VnetResourceGroup, _ = azureutil.GetVnetNameAndResourceGroupFromVnetID(azureSpec.VnetID)
serializedConfig, _ := json.MarshalIndent(azureConfig, "", " ")
secret.Data[azure.ConfigKey] = serializedConfig // ConfigKey = "cloud.conf"
}4. Cluster-Storage-Operator - Passes HYPERSHIFT_IMAGEFile: // For self-managed Azure (not ARO HCP), pass HYPERSHIFT_IMAGE to enable token-minter sidecars
if os.Getenv("ARO_HCP_SECRET_PROVIDER_CLASS_FOR_DISK") == "" && requiredCopy.Name == "azure-disk-csi-driver-operator" {
if hyperShiftImage := os.Getenv("HYPERSHIFT_IMAGE"); hyperShiftImage != "" {
envVars = append(envVars, corev1.EnvVar{Name: "HYPERSHIFT_IMAGE", Value: hyperShiftImage})
}
}5. CSI-Operator - Injects token-minter SidecarFile: func WithTokenMinter(serviceAccountName string) dc.DeploymentHookFunc {
return func(_ *opv1.OperatorSpec, deployment *appsv1.Deployment) error {
hyperShiftImage := os.Getenv("HYPERSHIFT_IMAGE")
if hyperShiftImage == "" {
return nil // Skip if not set
}
tokenMinter := corev1.Container{
Name: "token-minter",
Image: hyperShiftImage,
Command: []string{"/usr/bin/control-plane-operator", "token-minter"},
Args: []string{
"--service-account-namespace=openshift-cluster-csi-drivers",
"--service-account-name=" + serviceAccountName,
"--token-audience=openshift",
"--token-file=/var/run/secrets/openshift/serviceaccount/token",
"--kubeconfig=/etc/hosted-kubernetes/kubeconfig",
},
VolumeMounts: []corev1.VolumeMount{
{Name: "bound-sa-token", MountPath: "/var/run/secrets/openshift/serviceaccount"},
{Name: "hosted-kubeconfig", MountPath: "/etc/hosted-kubernetes", ReadOnly: true},
},
}
deployment.Spec.Template.Spec.Containers = append(deployment.Spec.Template.Spec.Containers, tokenMinter)
}
}6. CSI Driver Pod - Reads cloud.conf and Uses TokenThe resulting CSI driver controller pod has: containers:
- name: csi-driver
env:
- name: AZURE_CREDENTIAL_FILE
value: /etc/kubernetes/cloud.conf # Points to cloud.conf
volumeMounts:
- mountPath: /etc/kubernetes/
name: cloud-config # Mounts the secret with cloud.conf
- mountPath: /var/run/secrets/openshift/serviceaccount
name: bound-sa-token # Token created by token-minter
- name: token-minter
command: [/usr/bin/control-plane-operator, token-minter]
args:
- --service-account-namespace=openshift-cluster-csi-drivers
- --service-account-name=azure-disk-csi-driver-controller-sa
- --token-file=/var/run/secrets/openshift/serviceaccount/token
- --kubeconfig=/etc/hosted-kubernetes/kubeconfig
volumes:
- name: cloud-config
secret:
secretName: azure-disk-csi-config # Secret created by CPO with cloud.conf
- name: bound-sa-token
emptyDir:
medium: Memory # Shared volume for token7. Azure SDK - Authenticates Using Workload IdentityThe Azure SDK inside the CSI driver reads:
SummaryThis PR removes CSI secret management from hypershift-operator, allowing control-plane-operator to maintain full ownership with its existing, correct logic. The other two PRs enable the CSI driver operators to inject the token-minter sidecar that creates the tokens the CSI drivers need for authentication. Fixes: OCPBUGS-63698 |
fe8c89a to
1115939
Compare
|
/test all |
Remove disk-csi and file-csi credential management from hypershift-operator's ReconcileCredentials function. These secrets are now fully managed by control-plane-operator which creates them with the proper cloud.conf JSON configuration needed by the CSI drivers. This fixes an ownership conflict where hypershift-operator was replacing the entire secret.Data map, wiping out the cloud.conf field that control-plane-operator was adding. Fixes: OCPBUGS-63698 Signed-off-by: Bryan Cox <brcox@redhat.com> Commit-Message-Assisted-by: Claude (via Claude Code)
1115939 to
761ba8e
Compare
|
/test all |
|
@duanwei33: This PR was included in a payload test run from openshift/cluster-storage-operator#643
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/7142dd10-d6fe-11f0-8d59-3afbbfee03e5-0 |
|
/auto-cc |
|
/lgtm |
|
The pre-merge testing was performed with openshift/cluster-storage-operator#643, openshift/csi-operator#461, and #7157. All regression test cases passed, with the exception of the azurefile-nfs snapshot. This failure is due to a known limitation in azurefile-nfs, and Jira OCPBUGS-69882 has been filed to track it. LGTM from the storage QE side. |
|
/retest |
|
@bryan-cox: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/retest-required |
|
/verified by @duanwei33 |
|
@bryan-cox: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/jira refresh |
|
@bryan-cox: This pull request references Jira Issue OCPBUGS-63698, which is valid. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (wduan@redhat.com), skipping review request. The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/retest |
|
/override "Red Hat Konflux / hypershift-operator-enterprise-contract / hypershift-operator-main" |
|
/override "Red Hat Konflux / hypershift-operator-main-enterprise-contract / hypershift-operator-main" |
|
@bryan-cox: Overrode contexts on behalf of bryan-cox: Red Hat Konflux / hypershift-operator-enterprise-contract / hypershift-operator-main DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@bryan-cox: Overrode contexts on behalf of bryan-cox: Red Hat Konflux / hypershift-operator-main-enterprise-contract / hypershift-operator-main DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
941f016
into
openshift:main
|
@bryan-cox: Jira Issue OCPBUGS-63698: Some pull requests linked via external trackers have merged: The following pull request, linked via external tracker, has not merged:
All associated pull requests must be merged or unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with Jira Issue OCPBUGS-63698 has not been moved to the MODIFIED state. This PR is marked as verified. If the remaining PRs listed above are marked as verified before merging, the issue will automatically be moved to VERIFIED after all of the changes from the PRs are available in an accepted nightly payload. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Summary
This PR enables Azure Disk and File CSI drivers to work correctly on self-managed Azure HyperShift clusters by fixing the credential secret ownership and ensuring proper authentication configuration.
Changes
1. Move CSI Secret Ownership to control-plane-operator
2. Authentication Flow
The proper authentication flow for self-managed Azure is:
useFederatedWorkloadIdentityExtension: true,aadClientId,aadFederatedTokenFileRelated PRs
Testing
Fixes: OCPBUGS-63698
Signed-off-by: Bryan Cox brcox@redhat.com