feat(rcv1p): unify cert bootstrap flow and add Windows CA refresh task #8096
Pull request overview
This PR aims to unify the custom-cloud CA certificate bootstrap path (removing the separate “operation-requests” init scripts) and adds a Windows scheduled task to periodically refresh custom-cloud CA certificates.
Changes:
- Windows: add a scheduled task to refresh custom-cloud CA certificates; update `Get-CACertificates` to support legacy vs “rcv1p” modes keyed off location.
- Linux: consolidate custom-cloud init to a single init script and update CSE command generation to set a cert-endpoint mode variable.
- Regenerate multiple custom data / generated command snapshots to reflect the new templates.
Reviewed changes
Copilot reviewed 74 out of 176 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| staging/cse/windows/kubernetesfunc.ps1 | Adds CA refresh scheduled task + updates CA retrieval logic and error behavior |
| parts/windows/kuberneteswindowssetup.ps1 | Wires Get-CACertificates -Location and registers refresh task for custom clouds |
| pkg/agent/variables.go | Always injects initAKSCustomCloud payload into cloud-init data |
| pkg/agent/const.go | Removes separate custom-cloud init script constants; keeps single init script |
| pkg/agent/baker.go | Simplifies GetTargetEnvironment; notes IsAKSCustomCloud as deprecated |
| parts/linux/cloud-init/artifacts/cse_cmd.sh | Updates CSE command to set cert endpoint mode + run custom-cloud init script |
| parts/linux/cloud-init/artifacts/init-aks-custom-cloud-operation-requests.sh | Deleted (custom-cloud init consolidation) |
| parts/linux/cloud-init/artifacts/init-aks-custom-cloud-operation-requests-mariner.sh | Deleted (custom-cloud init consolidation) |
| parts/linux/cloud-init/artifacts/init-aks-custom-cloud-mariner.sh | Deleted (custom-cloud init consolidation) |
| aks-node-controller/parser/templates/cse_cmd.sh.gtpl | Mirrors CSE command template updates for aks-node-controller parser |
| aks-node-controller/parser/testdata/Compatibility+EmptyConfig/generatedCSECommand | Regenerated snapshot for new CSE cmd template |
| aks-node-controller/parser/testdata/AzureLinuxv2+Kata+DisableUnattendedUpgrades=false/generatedCSECommand | Regenerated snapshot for new CSE cmd template |
| aks-node-controller/parser/testdata/AKSUbuntu2204+SSHStatusOn/generatedCSECommand | Regenerated snapshot for new CSE cmd template |
| aks-node-controller/parser/testdata/AKSUbuntu2204+EnablePubkeyAuth/generatedCSECommand | New snapshot for new template output |
| aks-node-controller/parser/testdata/AKSUbuntu2204+DisablePubkeyAuth/generatedCSECommand | New snapshot for new template output |
| aks-node-controller/parser/testdata/AKSUbuntu2204+DefaultPubkeyAuth/generatedCSECommand | New snapshot for new template output |
| aks-node-controller/parser/testdata/AKSUbuntu2204+CustomOSConfig/generatedCSECommand | Regenerated snapshot for new CSE cmd template |
| aks-node-controller/parser/testdata/AKSUbuntu2204+CustomCloud/generatedCSECommand | Regenerated snapshot for new CSE cmd template |
| aks-node-controller/parser/testdata/AKSUbuntu2204+Containerd+MIG/generatedCSECommand | Regenerated snapshot for new CSE cmd template |
| aks-node-controller/parser/testdata/AKSUbuntu2204+CloudProviderOverrides/generatedCSECommand | New snapshot for new template output |
| aks-node-controller/parser/testdata/AKSUbuntu2204+China/generatedCSECommand | Regenerated snapshot for new CSE cmd template |
| pkg/agent/testdata/MarinerV2+Kata/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/AzureLinuxV2+Kata/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/AzureLinuxV3+Kata/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/AKSUbuntu2204+China/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/AKSUbuntu2204+cgroupv2/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/AKSUbuntu2204+ootcredentialprovider/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/AKSUbuntu2204+SecurityProfile/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/AKSUbuntu2204+SSHStatusOn/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/AKSUbuntu2204+SSHStatusOff/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/AKSUbuntu2404+NetworkPolicy/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/AKSUbuntu2404+Teleport/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/CustomizedImage/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/CustomizedImageKata/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/CustomizedImageLinuxGuard/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/Flatcar/CustomData.inner | Regenerated snapshot (embedded gzip payload changed) |
| pkg/agent/testdata/ACL/CustomData.inner | Regenerated snapshot (embedded gzip payload changed) |

    try {
    if ($certEndpointMode -eq "legacy") {
    $uri = 'http://168.63.129.16/machine?comp=acmspackage&type=cacertificates&ext=json'
    $rawData = Retry-Command -Command 'Invoke-WebRequest' -Args @{Uri=$uri; UseBasicParsing=$true} -Retries 5 -RetryDelaySeconds 10
    } catch {
    Set-ExitCode -ExitCode $global:WINDOWS_CSE_ERROR_DOWNLOAD_CA_CERTIFICATES -ErrorMessage "Failed to download CA certificates rawdata. Error: $_"
    $caCerts = ($rawData.Content) | ConvertFrom-Json
    if ($null -eq $caCerts -or $null -eq $caCerts.Certificates -or $caCerts.Certificates.Length -eq 0) {
    Write-Log "Warning: CA certificates rawdata is empty for legacy endpoint"
    return $false
    }

    foreach ($certificate in $caCerts.Certificates) {
    $name = $certificate.Name
    $certFilePath = Join-Path $caFolder $name
    Write-Log "Write certificate $name to $certFilePath"
    $certificate.CertBody > $certFilePath
    }

    return $true
    }

    Write-Log "Convert CA certificates rawdata"
    $caCerts=($rawData.Content) | ConvertFrom-Json
    if ([string]::IsNullOrEmpty($caCerts)) {
    Set-ExitCode -ExitCode $global:WINDOWS_CSE_ERROR_EMPTY_CA_CERTIFICATES -ErrorMessage "CA certificates rawdata is empty"
    $optInUri = 'http://168.63.129.16/acms/isOptedInForRootCerts'
    $optInResponse = Retry-Command -Command 'Invoke-WebRequest' -Args @{Uri=$optInUri; UseBasicParsing=$true} -Retries 5 -RetryDelaySeconds 10
    if (($optInResponse.Content -notmatch 'IsOptedInForRootCerts=true')) {
    Write-Log "Skipping custom cloud root cert installation because IsOptedInForRootCerts is not true"
    return $false
    }

    $certificates = $caCerts.Certificates
    for ($index = 0; $index -lt $certificates.Length ; $index++) {
    $name=$certificates[$index].Name
    $certFilePath = Join-Path $caFolder $name
    Write-Log "Write certificate $name to $certFilePath"
    $certificates[$index].CertBody > $certFilePath
    $operationRequestTypes = @("operationrequestsroot", "operationrequestsintermediate")
    $downloadedAny = $false

    foreach ($requestType in $operationRequestTypes) {
    $operationRequestUri = "http://168.63.129.16/machine?comp=acmspackage&type=$requestType&ext=json"
    $operationResponse = Retry-Command -Command 'Invoke-WebRequest' -Args @{Uri=$operationRequestUri; UseBasicParsing=$true} -Retries 5 -RetryDelaySeconds 10
    $operationJson = ($operationResponse.Content) | ConvertFrom-Json

    if ($null -eq $operationJson -or $null -eq $operationJson.OperationRequests) {
    Write-Log "Warning: no operation requests found for $requestType"
    continue
    }

    foreach ($operation in $operationJson.OperationRequests) {
    $resourceFileName = $operation.ResouceFileName
    if ([string]::IsNullOrEmpty($resourceFileName)) {
    continue
    }

    $resourceType = [IO.Path]::GetFileNameWithoutExtension($resourceFileName)
    $resourceExt = [IO.Path]::GetExtension($resourceFileName).TrimStart('.')
    $resourceUri = "http://168.63.129.16/machine?comp=acmspackage&type=$resourceType&ext=$resourceExt"

    $certContentResponse = Retry-Command -Command 'Invoke-WebRequest' -Args @{Uri=$resourceUri; UseBasicParsing=$true} -Retries 5 -RetryDelaySeconds 10
    if ([string]::IsNullOrEmpty($certContentResponse.Content)) {
    Write-Log "Warning: empty certificate content for $resourceFileName"
    continue
    }

    $certFilePath = Join-Path $caFolder $resourceFileName
    Write-Log "Write certificate $resourceFileName to $certFilePath"
    $certContentResponse.Content > $certFilePath
    $downloadedAny = $true
    }
    }

    if (-not $downloadedAny) {
    Write-Log "Warning: no CA certificates were downloaded in rcv1p mode"
    }

    return $downloadedAny
    }
    catch {
    # Catch all exceptions in this function. NOTE: exit cannot be caught.
    Set-ExitCode -ExitCode $global:WINDOWS_CSE_ERROR_GET_CA_CERTIFICATES -ErrorMessage $_
    Write-Log "Warning: failed to retrieve CA certificates. Error: $_"
    return $false
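
For readers following the rcv1p branch of the snippet above, here is a minimal Go sketch of two of its steps: the opt-in gate and the derivation of a download URI from a resource file name. It is an illustration only, not code from this PR. The helper names `isOptedIn` and `resourceURI` and the sample file name are hypothetical; the endpoint string and the `IsOptedInForRootCerts=true` marker are taken from the snippet. The sketch assumes plain file names with no directory part.

```go
package main

import (
	"fmt"
	"path"
	"regexp"
	"strings"
)

// wireserverBase is the endpoint prefix that appears in the snippet above.
const wireserverBase = "http://168.63.129.16/machine?comp=acmspackage"

// optInPattern mirrors the PowerShell -notmatch check, which is
// case-insensitive by default.
var optInPattern = regexp.MustCompile(`(?i)IsOptedInForRootCerts=true`)

// isOptedIn (hypothetical helper) reports whether the opt-in response body
// allows custom cloud root cert installation.
func isOptedIn(body string) bool {
	return optInPattern.MatchString(body)
}

// resourceURI (hypothetical helper) splits a resource file name into its
// type (name without extension) and extension, then builds the download URI.
// Assumption: resourceFileName has no directory component.
func resourceURI(resourceFileName string) string {
	ext := path.Ext(resourceFileName)
	resourceType := strings.TrimSuffix(resourceFileName, ext)
	return fmt.Sprintf("%s&type=%s&ext=%s",
		wireserverBase, resourceType, strings.TrimPrefix(ext, "."))
}

func main() {
	// "example-root.crt" is a made-up file name used only for illustration.
	fmt.Println(isOptedIn("IsOptedInForRootCerts=true"))
	fmt.Println(resourceURI("example-root.crt"))
}
```

Keeping the URI derivation in one small function makes it easier to unit test the mapping separately from the retry and download logic.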
44ff9ee to a0a1307
Pull request overview
This PR aims to unify AKS custom-cloud CA certificate bootstrap behavior (legacy vs “rcv1p/operation-requests” style flows) and adds a Windows scheduled task to periodically refresh custom-cloud CA certificates.
Changes:
- Adds Windows CA refresh scheduled task registration and introduces location-based endpoint-mode selection (legacy vs rcv1p).
- Refactors Windows CA certificate retrieval to support both endpoint modes and opt-in gating for rcv1p.
- Simplifies Linux custom-cloud init script selection by consolidating onto `init-aks-custom-cloud.sh` and removing older variants; updates generated testdata accordingly.
Reviewed changes
Copilot reviewed 93 out of 99 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| staging/cse/windows/kubernetesfunc.ps1 | Adds CA refresh scheduled task and endpoint-mode-aware Get-CACertificates implementation. |
| pkg/agent/variables.go | Simplifies how initAKSCustomCloud is added to Linux cloud-init variables. |
| pkg/agent/testdata/MarinerV2+Kata/CustomData | Updates expected CustomData snapshot (generated content changed). |
| pkg/agent/testdata/Flatcar/CustomData.inner | Updates expected Flatcar CustomData snapshot (generated content changed). |
| pkg/agent/testdata/CustomizedImageLinuxGuard/CustomData | Updates expected CustomData snapshot (generated content changed). |
| pkg/agent/testdata/CustomizedImageKata/CustomData | Updates expected CustomData snapshot (generated content changed). |
| pkg/agent/testdata/CustomizedImage/CustomData | Updates expected CustomData snapshot (generated content changed). |
| pkg/agent/testdata/AzureLinuxV3+Kata/CustomData | Updates expected CustomData snapshot (generated content changed). |
| pkg/agent/testdata/AzureLinuxV2+Kata/CustomData | Updates expected CustomData snapshot (generated content changed). |
| pkg/agent/testdata/AKSWindows23H2Gen2+NextGenNetworkingNoConfig/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows23H2Gen2+NextGenNetworkingDisabled/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows23H2Gen2+NextGenNetworking/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+ootcredentialprovider/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+SecurityProfile/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+ManagedIdentity/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+KubeletServingCertificateRotation/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+KubeletClientTLSBootstrapping/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+K8S119/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+K8S119+FIPS/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+K8S119+CSI/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+K8S118/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+K8S117/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+K8S116/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+EnablePrivateClusterHostsConfigAgent/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+CustomVnet/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+CustomCloud/CustomData | Updates expected Windows CustomData snapshot (new Get-CACertificates call form + refresh task). |
| pkg/agent/testdata/AKSWindows2019+CustomCloud+ootcredentialprovider/CustomData | Updates expected Windows CustomData snapshot (new Get-CACertificates call form + refresh task). |
| pkg/agent/testdata/AKSUbuntu2404+Teleport/CustomData | Updates expected Ubuntu CustomData snapshot (generated content changed). |
| pkg/agent/testdata/AKSUbuntu2404+NetworkPolicy/CustomData | Updates expected Ubuntu CustomData snapshot (generated content changed). |
| pkg/agent/testdata/AKSUbuntu2204+ootcredentialprovider/CustomData | Updates expected Ubuntu CustomData snapshot (generated content changed). |
| pkg/agent/testdata/AKSUbuntu2204+cgroupv2/CustomData | Updates expected Ubuntu CustomData snapshot (generated content changed). |
| pkg/agent/testdata/AKSUbuntu2204+SecurityProfile/CustomData | Updates expected Ubuntu CustomData snapshot (generated content changed). |
| pkg/agent/testdata/AKSUbuntu2204+SSHStatusOn/CustomData | Updates expected Ubuntu CustomData snapshot (generated content changed). |
| pkg/agent/testdata/AKSUbuntu2204+China/CustomData | Updates expected Ubuntu CustomData snapshot (generated content changed). |
| pkg/agent/testdata/ACL/CustomData.inner | Updates expected ACL CustomData snapshot (generated content changed). |
| pkg/agent/const.go | Consolidates custom-cloud init script constants to a single script. |
| parts/windows/kuberneteswindowssetup.ps1 | Updates Windows setup flow to call Get-CACertificates with location and registers CA refresh scheduled task. |
| parts/linux/cloud-init/artifacts/init-aks-custom-cloud-operation-requests.sh | Removes operation-requests-specific Linux init script (consolidation). |
| parts/linux/cloud-init/artifacts/init-aks-custom-cloud-operation-requests-mariner.sh | Removes Mariner/AzureLinux operation-requests init script (consolidation). |
| parts/linux/cloud-init/artifacts/init-aks-custom-cloud-mariner.sh | Removes Mariner/AzureLinux legacy init script variant (consolidation). |
| aks-node-controller/parser/templates/cse_cmd.sh.gtpl | Adds a LOCATION shell variable in the generated CSE command template. |
| aks-node-controller/parser/helper.go | Factors out a shared getCloudLocation helper and reuses it in getCloudTargetEnv. |

    Get-CACertificates
    {{end}}

    Get-CACertificates -Location $Location

    catch {
    # Catch all exceptions in this function. NOTE: exit cannot be caught.
    Set-ExitCode -ExitCode $global:WINDOWS_CSE_ERROR_GET_CA_CERTIFICATES -ErrorMessage $_
    Write-Log "Warning: failed to retrieve CA certificates. Error: $_"
    return $false
2b3c1d6 to e19a19b
e19a19b to d41856f
Pull request overview
This PR unifies the AKS custom cloud CA certificate bootstrap logic to a single flow and adds a Windows scheduled task to periodically refresh custom cloud CA certificates. It also updates Linux/customdata generation and test snapshots to reflect the new wiring.
Changes:
- Add Windows scheduled task registration for daily CA certificate refresh and introduce a location-based cert endpoint mode selector.
- Simplify Linux custom cloud init script selection by standardizing on `init-aks-custom-cloud.sh`, plus add wiring/tests for refresh-mode arguments.
- Update aks-node-controller template to export `LOCATION`, and regenerate CustomData snapshot test artifacts.
Reviewed changes
Copilot reviewed 95 out of 101 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| staging/cse/windows/kubernetesfunc.tests.ps1 | Adds Pester coverage for cert endpoint mode selection, scheduled task registration, and CA retrieval behavior. |
| staging/cse/windows/kubernetesfunc.ps1 | Implements unified Windows CA retrieval logic with legacy/rcv1p modes and registers a daily refresh scheduled task. |
| spec/parts/linux/cloud-init/artifacts/init_aks_custom_cloud_spec.sh | Adds ShellSpec assertions to validate refresh-mode argument parsing/wiring in the Linux init script. |
| pkg/agent/variables.go | Changes how initAKSCustomCloud is injected into Linux cloud-init data. |
| pkg/agent/const.go | Removes per-cloud custom init script constants and standardizes on init-aks-custom-cloud.sh. |
| parts/windows/kuberneteswindowssetup.ps1 | Wires CA retrieval call and registers the Windows CA refresh scheduled task during BasePrep. |
| parts/linux/cloud-init/artifacts/init-aks-custom-cloud-operation-requests.sh | Removed (operation-requests variant no longer used). |
| parts/linux/cloud-init/artifacts/init-aks-custom-cloud-operation-requests-mariner.sh | Removed (operation-requests Mariner/AzureLinux variant no longer used). |
| parts/linux/cloud-init/artifacts/init-aks-custom-cloud-mariner.sh | Removed (Mariner/AzureLinux legacy variant no longer used). |
| aks-node-controller/parser/templates/cse_cmd.sh.gtpl | Exports LOCATION into the CSE environment for downstream scripts. |
| aks-node-controller/parser/helper.go | Adds a helper to normalize location and reuses it in cloud target env detection. |
| pkg/agent/testdata/MarinerV2+Kata/CustomData | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/Flatcar/CustomData.inner | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/CustomizedImageLinuxGuard/CustomData | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/CustomizedImageKata/CustomData | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/CustomizedImage/CustomData | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/AzureLinuxV3+Kata/CustomData | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/AzureLinuxV2+Kata/CustomData | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/AKSWindows23H2Gen2+NextGenNetworkingNoConfig/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows23H2Gen2+NextGenNetworkingDisabled/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows23H2Gen2+NextGenNetworking/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+ootcredentialprovider/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+SecurityProfile/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+ManagedIdentity/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+KubeletServingCertificateRotation/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+KubeletClientTLSBootstrapping/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+K8S119/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+K8S119+FIPS/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+K8S119+CSI/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+K8S118/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+K8S117/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+K8S116/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+EnablePrivateClusterHostsConfigAgent/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+CustomVnet/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+CustomCloud/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+CustomCloud+ootcredentialprovider/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSUbuntu2404+Teleport/CustomData | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/AKSUbuntu2404+NetworkPolicy/CustomData | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/AKSUbuntu2204+ootcredentialprovider/CustomData | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/AKSUbuntu2204+cgroupv2/CustomData | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/AKSUbuntu2204+SecurityProfile/CustomData | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/AKSUbuntu2204+SSHStatusOn/CustomData | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/AKSUbuntu2204+China/CustomData | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/ACL/CustomData.inner | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |

    Get-CACertificates
    {{end}}

    Get-CACertificates -Location $Location

    }
    }
    }
    cloudInitData["initAKSCustomCloud"] = getBase64EncodedGzippedCustomScript(initAKSCustomCloudScript, config)
d41856f to 18ba549
18ba549 to e94c465
Pull request overview
This PR updates AKS custom cloud certificate bootstrapping to use a single unified flow and adds a Windows scheduled task for periodic custom cloud CA refresh.
Changes:
- Added Windows CA refresh task registration plus new logic to select cert retrieval mode and opt-in gating.
- Simplified Linux custom cloud init script wiring by removing legacy “operation-requests” variants and normalizing location for refresh mode.
- Added/updated tests and refreshed golden testdata outputs to reflect new custom data content.
Reviewed changes
Copilot reviewed 95 out of 101 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| staging/cse/windows/kubernetesfunc.tests.ps1 | Adds Pester coverage for endpoint-mode selection, task registration behavior, and CA retrieval failure handling. |
| staging/cse/windows/kubernetesfunc.ps1 | Implements endpoint-mode derivation, opt-in gating, CA retrieval paths, and a Windows scheduled task for refresh. |
| spec/parts/linux/cloud-init/artifacts/init_aks_custom_cloud_spec.sh | Adds ShellSpec checks to ensure init script wiring for ca-refresh mode and LOCATION usage. |
| pkg/agent/variables.go | Simplifies init script selection and updates how custom cloud init script is injected into cloud-init data. |
| pkg/agent/const.go | Removes now-unused custom-cloud init script constants; keeps unified init script constant. |
| parts/windows/kuberneteswindowssetup.ps1 | Updates Windows setup to call Get-CACertificates with Location and conditionally register refresh task. |
| aks-node-controller/parser/templates/cse_cmd.sh.gtpl | Adds LOCATION variable for downstream scripts during custom cloud provisioning. |
| aks-node-controller/parser/helper.go | Adds getCloudLocation helper and reuses it for cloud target env detection. |
| parts/linux/cloud-init/artifacts/init-aks-custom-cloud-operation-requests.sh | Removes legacy operation-requests init script (superseded by unified script). |
| parts/linux/cloud-init/artifacts/init-aks-custom-cloud-operation-requests-mariner.sh | Removes legacy Mariner operation-requests init script (superseded by unified script). |
| parts/linux/cloud-init/artifacts/init-aks-custom-cloud-mariner.sh | Removes legacy Mariner init script variant (superseded by unified script). |
| pkg/agent/testdata/MarinerV2+Kata/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/Flatcar/CustomData.inner | Updates golden ignition/customData payload for unified custom cloud init content. |
| pkg/agent/testdata/CustomizedImageLinuxGuard/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/CustomizedImageKata/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/CustomizedImage/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/AzureLinuxV3+Kata/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/AzureLinuxV2+Kata/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/AKSWindows23H2Gen2+NextGenNetworkingNoConfig/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows23H2Gen2+NextGenNetworkingDisabled/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows23H2Gen2+NextGenNetworking/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+ootcredentialprovider/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+SecurityProfile/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+ManagedIdentity/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+KubeletServingCertificateRotation/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+KubeletClientTLSBootstrapping/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+K8S119/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+K8S119+FIPS/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+K8S119+CSI/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+K8S118/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+K8S117/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+K8S116/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+EnablePrivateClusterHostsConfigAgent/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+CustomVnet/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+CustomCloud/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+CustomCloud+ootcredentialprovider/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSUbuntu2404+Teleport/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/AKSUbuntu2404+NetworkPolicy/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/AKSUbuntu2204+ootcredentialprovider/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/AKSUbuntu2204+cgroupv2/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/AKSUbuntu2204+SecurityProfile/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/AKSUbuntu2204+SSHStatusOn/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/AKSUbuntu2204+SSHStatusOff/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/AKSUbuntu2204+China/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/ACL/CustomData.inner | Updates golden ignition/customData payload for unified custom cloud init content. |
Comments suppressed due to low confidence (7)
staging/cse/windows/kubernetesfunc.ps1:1
Get-CACertificatesused to fail fast viaSet-ExitCodeon retrieval/parse errors, but now returns$false(and logs warnings) for a wide range of failure cases. Because call sites in the generated setup scripts invokeGet-CACertificates -Location $Locationwithout checking the return value, this can silently proceed without required CA material and lead to harder-to-diagnose TLS failures later in provisioning. Consider restoring fatal behavior for “expected-to-install” scenarios (e.g., legacy mode, or rcv1p when opted-in), or have callers check the return value and invokeSet-ExitCodewhen it’s$falsein those modes.
staging/cse/windows/kubernetesfunc.ps1:1
Get-CACertificates used to fail fast via Set-ExitCode on retrieval/parse errors, but now returns $false (and logs warnings) for a wide range of failure cases. Because call sites in the generated setup scripts invoke Get-CACertificates -Location $Location without checking the return value, this can silently proceed without required CA material and lead to harder-to-diagnose TLS failures later in provisioning. Consider restoring fatal behavior for “expected-to-install” scenarios (e.g., legacy mode, or rcv1p when opted-in), or have callers check the return value and invoke Set-ExitCode when it’s $false in those modes.
pkg/agent/variables.go:1
This change removes the previous cs.IsAKSCustomCloud() guard and injects the custom cloud init script into cloudInitData unconditionally. That can increase customData size for all clusters (risking platform limits) and may introduce unintended side effects if any downstream template writes/executes this script outside custom cloud. Recommend reinstating the custom cloud guard (and only setting initAKSCustomCloud when IsAKSCustomCloud() is true), while still using the unified initAKSCustomCloudScript for all custom clouds.
staging/cse/windows/kubernetesfunc.ps1:1
$resourceFileName is used directly to build a path under C:\ca. If the upstream response ever contains path separators (e.g., ..\foo or nested paths), this can write outside the intended directory. Prefer sanitizing to a basename (e.g., using Split-Path -Leaf or [IO.Path]::GetFileName($resourceFileName)) before Join-Path, and consider rejecting names containing directory traversal characters.
staging/cse/windows/kubernetesfunc.ps1:1
$resourceFileName is used directly to build a path under C:\ca. If the upstream response ever contains path separators (e.g., ..\foo or nested paths), this can write outside the intended directory. Prefer sanitizing to a basename (e.g., using Split-Path -Leaf or [IO.Path]::GetFileName($resourceFileName)) before Join-Path, and consider rejecting names containing directory traversal characters.
staging/cse/windows/kubernetesfunc.ps1:1
The new rcv1p operation-requests flow is non-trivial (multiple requests, JSON shape assumptions, per-item content downloads, and $downloadedAny aggregation), but the added Pester tests only cover legacy mode and the “throws returns false” path. Add tests that (1) exercise the rcv1p path end-to-end with mocked Retry-Command returning operation requests and cert bodies, and (2) verify behavior when operation requests are empty/invalid (ensuring the function returns $false and logs expected warnings).
pkg/agent/variables.go:1
The PR description still contains placeholder text (Fixes # with no linked issue and no explanation of “what/why”). Please update the PR description to summarize the behavior change (unified bootstrap + Windows refresh task) and link the relevant issue or remove the placeholder.
| REPO_DEPOT_ENDPOINT="{{.CustomCloudConfig.RepoDepotEndpoint}}" | ||
| {{getInitAKSCustomCloudFilepath}} >> /var/log/azure/cluster-provision.log 2>&1; | ||
| {{end}} | ||
| LOCATION="{{getCloudLocation .}}" |
There was a problem hiding this comment.
LOCATION="..." is not exported, and provision_start.sh is executed in a new /bin/bash -c process. As written, provision_start.sh will not receive LOCATION in its environment. If the intent is for downstream scripts to consume LOCATION (as the PR title/added specs suggest), change this to export the variable (or inline it into the bash -c command).
| LOCATION="{{getCloudLocation .}}" | |
| export LOCATION="{{getCloudLocation .}}" |
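For reference, a minimal sketch of the shell behavior this comment relies on, assuming LOCATION is set as a standalone assignment before the child shell starts. child_sees_location is a hypothetical helper used only for the demonstration; it is not part of cse_cmd.sh.

```shell
#!/usr/bin/env bash
# Sketch: only exported variables reach a child /bin/bash -c process.
# child_sees_location is a hypothetical helper, not part of cse_cmd.sh.
child_sees_location() {
    /bin/bash -c 'printf "%s" "${LOCATION:-}"'
}

unset LOCATION
LOCATION="eastus"                 # plain assignment: visible to this shell only
unexported_result="$(child_sees_location)"

export LOCATION="eastus"          # exported: inherited by child processes
exported_result="$(child_sees_location)"

echo "without export: '${unexported_result}'"
echo "with export:    '${exported_result}'"
```

Note that a prefix assignment on the same command (VAR=value command) also places the variable in that command's environment without export, so whether the export is needed depends on how the rendered template lines are joined.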
e94c465 to f20d5b8
Co-authored-by: Jane Jung <janejung@microsoft.com>
Co-authored-by: janenotjung-hue <107402425+janenotjung-hue@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: aks-node-assistant[bot] <190555641+aks-node-assistant[bot]@users.noreply.github.com>
https://eng.ms/docs/products/onecert-certificates-key-vault-and-dsms/onecert-customer-guide/autorotationandecr/overviewrcv
https://eng.ms/docs/products/onecert-certificates-key-vault-and-dsms/onecert-customer-guide/autorotationandecr/rcv1ptsg

cse_cmd.sh.gtpl: derive cert endpoint mode from target cloud and always run custom-cloud init script.
cse_cmd.sh: same mode logic as template; remove LOCATION export.
init-aks-custom-cloud.sh: merged legacy + operation-requests logic into one script with distro-aware cert install paths.
parts/linux/cloud-init/artifacts/init-aks-custom-cloud-mariner.sh: removed (merged into unified script).
parts/linux/cloud-init/artifacts/init-aks-custom-cloud-operation-requests.sh: removed (merged into unified script).
parts/linux/cloud-init/artifacts/init-aks-custom-cloud-operation-requests-mariner.sh: removed (merged into unified script).
const.go: keep only unified custom-cloud init script constant.
variables.go: simplify script selection to always use unified init script.
kubernetesfunc.ps1: add location-aware CA retrieval (legacy/rcv1p) and scheduled refresh task registration helper.
kuberneteswindowssetup.ps1: pass location to CA retrieval and register refresh task for custom cloud.
… for legacy and opted-in rcv1p modes
… schedule installation
f20d5b8 to b53f240
| esac | ||
|
|
||
| echo "Using custom cloud certificate endpoint mode: ${cert_endpoint_mode}" | ||
| install_ca_refresh_schedule=0 |
There was a problem hiding this comment.
rm -f /root/AzureCACertificates/* can emit errors when /root/AzureCACertificates doesn't exist yet (and will also run even when no certs are expected). Create the directory first (or guard the cleanup with mkdir -p / test -d) to keep logs clean and behavior predictable.
| install_ca_refresh_schedule=0 | |
| install_ca_refresh_schedule=0 | |
| mkdir -p /root/AzureCACertificates |
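One way to make the cleanup independent of whether the directory already exists is to create it first and then empty it. The sketch below assumes a hypothetical helper named reset_cert_dir and runs against a temporary path rather than /root/AzureCACertificates, so it can be tried without root.

```shell
#!/usr/bin/env bash
# reset_cert_dir is a hypothetical helper: ensure the directory exists, then empty it.
reset_cert_dir() {
    local dir="$1"
    mkdir -p "$dir" || return 1
    # -mindepth 1 keeps the directory itself; -delete removes everything inside it.
    find "$dir" -mindepth 1 -delete
}

# Demo against a temporary path instead of /root/AzureCACertificates.
demo_root="$(mktemp -d)"
cert_dir="${demo_root}/AzureCACertificates"

reset_cert_dir "$cert_dir"          # first call: the directory is created
touch "${cert_dir}/stale.crt"
reset_cert_dir "$cert_dir"          # second call: the stale file is removed
remaining="$(find "$cert_dir" -mindepth 1 | wc -l)"
echo "entries left: ${remaining}"
```

Using find instead of a shell glob also covers dotfiles and avoids passing an unmatched literal pattern to rm.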
|
|
||
| It 'handles mixed-case input' { | ||
| Get-CustomCloudCertEndpointModeFromLocation -Location 'UsSeCeast' | Should Be 'legacy' | ||
| } |
There was a problem hiding this comment.
These assertions use legacy Pester syntax (Should Be / Should Match). The repo’s existing Pester tests use Should -Be / Should -Match (see parts/windows/windowscsehelper.tests.ps1), and some Pester versions don’t support the legacy form. Update to the Should -<Assertion> syntax to match the repo and avoid CI failures.
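The mixed-case input exercised by the test under review could be mirrored on the Linux side roughly as follows. This is a sketch under stated assumptions: get_cert_endpoint_mode is a hypothetical name, and the mapping (ussec*/usnat* prefixes to legacy, everything else to rcv1p) is inferred from this thread rather than taken from the actual script.

```shell
#!/usr/bin/env bash
# Hypothetical shell analogue of Get-CustomCloudCertEndpointModeFromLocation.
# Assumption: ussec*/usnat* locations map to "legacy", all others to "rcv1p".
get_cert_endpoint_mode() {
    local location
    # Lowercase with tr so matching does not depend on bash 4+ ${var,,}.
    location="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
    case "$location" in
        ussec*|usnat*) echo "legacy" ;;
        *)             echo "rcv1p" ;;
    esac
}

get_cert_endpoint_mode "UsSeCeast"
get_cert_endpoint_mode "eastus"
```

Normalizing the case once, before the case statement, keeps the patterns short and avoids listing every capitalization variant.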
| } | ||
| } | ||
| } | ||
| cloudInitData["initAKSCustomCloud"] = getBase64EncodedGzippedCustomScript(initAKSCustomCloudScript, config) |
There was a problem hiding this comment.
initAKSCustomCloud is now always embedded into cloud-init variables, even when the cluster is not an AKS custom cloud. Since GetVariableProperty already returns an empty string for missing keys, this change increases customData payload size for every Linux node and could push some configurations closer to Azure customData size limits. Consider restoring the cs.IsAKSCustomCloud() guard (and/or conditionally writing the file in nodecustomdata.yml) so the script is only included when it will actually be executed.
| cloudInitData["initAKSCustomCloud"] = getBase64EncodedGzippedCustomScript(initAKSCustomCloudScript, config) | |
| if cs.IsAKSCustomCloud() { | |
| cloudInitData["initAKSCustomCloud"] = getBase64EncodedGzippedCustomScript(initAKSCustomCloudScript, config) | |
| } |
|
|
||
| Get-CACertificates -Location $Location | ||
|
|
||
| Write-CACert -CACertificate $global:CACertificate ` | ||
| -KubeDir $global:KubeDir |
There was a problem hiding this comment.
Get-CACertificates -Location $Location is now invoked outside the {{if IsAKSCustomCloud}} block, so it will run for non-custom-cloud Windows nodes too. In public cloud this adds extra wireserver calls (with retries) during provisioning and may slow node boot or introduce avoidable failures/noise. Consider keeping CA cert retrieval inside the custom-cloud conditional (or add a guard inside Get-CACertificates to no-op when not in a custom cloud).
| Get-CACertificates -Location $Location | |
| Write-CACert -CACertificate $global:CACertificate ` | |
| -KubeDir $global:KubeDir | |
| {{if IsAKSCustomCloud}} | |
| Get-CACertificates -Location $Location | |
| Write-CACert -CACertificate $global:CACertificate ` | |
| -KubeDir $global:KubeDir | |
| {{end}} |
| # Guard against older CSE packages that do not yet export Should-InstallCACertificatesRefreshTask. | ||
| # If the function is absent (old package), fall back to the previous unconditional behaviour so | ||
| # that legacy/ussec/usnat clusters continue to register the refresh task. | ||
| if (Get-Command -Name Should-InstallCACertificatesRefreshTask -ErrorAction Ignore) { | ||
| if (Should-InstallCACertificatesRefreshTask -Location $Location) { | ||
| Register-CACertificatesRefreshTask -Location $Location | ||
| } | ||
| } elseif (Get-Command -Name Register-CACertificatesRefreshTask -ErrorAction Ignore) { | ||
| Register-CACertificatesRefreshTask -Location $Location | ||
| } |
There was a problem hiding this comment.
The CA refresh scheduled-task registration logic is executed unconditionally in BasePrep. Even when Should-InstallCACertificatesRefreshTask exists, it will call the opt-in endpoint (with retries) for any non-legacy location, which can add latency to every node boot in non-custom clouds. Consider gating this entire block under {{if IsAKSCustomCloud}} (or another explicit custom-cloud signal) so public-cloud nodes never attempt the opt-in call or task registration.
| return | ||
| } | ||
|
|
||
| $refreshCommand = "& { . 'C:\AzureData\windows\windowscsehelper.ps1'; . 'C:\AzureData\windows\kubernetesfunc.ps1'; Get-CACertificates -Location '$Location' | Out-Null }" |
There was a problem hiding this comment.
$Location is interpolated into the scheduled task PowerShell -Command string inside single quotes. If the location ever contains a single quote or other special characters, this can break quoting and potentially allow argument injection into the scheduled command. Prefer passing the location as a proper argument (e.g., -File with -ArgumentList, or escaping single quotes in $Location before embedding it).
| $refreshCommand = "& { . 'C:\AzureData\windows\windowscsehelper.ps1'; . 'C:\AzureData\windows\kubernetesfunc.ps1'; Get-CACertificates -Location '$Location' | Out-Null }" | |
| $escapedLocation = $Location -replace "'", "''" | |
| $refreshCommand = "& { . 'C:\AzureData\windows\windowscsehelper.ps1'; . 'C:\AzureData\windows\kubernetesfunc.ps1'; Get-CACertificates -Location '$escapedLocation' | Out-Null }" |
|
|
||
| local response | ||
| while [ $attempt -le $max_retries ]; do | ||
| response=$(curl -f --no-progress-meter "$url") |
There was a problem hiding this comment.
make_request_with_retry uses curl without any connect/overall timeout. If the wireserver connection stalls (not hard-failing), a single curl invocation can hang indefinitely and block provisioning/refresh despite the retry loop. Add --connect-timeout and --max-time (or similar) to bound worst-case latency per attempt.
| local response | |
| while [ $attempt -le $max_retries ]; do | |
| response=$(curl -f --no-progress-meter "$url") | |
| local connect_timeout=5 | |
| local max_time=10 | |
| local response | |
| while [ $attempt -le $max_retries ]; do | |
| response=$(curl -f --no-progress-meter --connect-timeout "$connect_timeout" --max-time "$max_time" "$url") |
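A fuller sketch of the bounded retry loop, for illustration only. The timeout values, the retry count and delay parameters, and the use of -sS in place of --no-progress-meter are assumptions; the real make_request_with_retry may differ. The demo fetches file:// URLs so it runs without network access.

```shell
#!/usr/bin/env bash
# Sketch of a retry loop where every curl attempt has a bounded duration.
# Timeout and retry values are illustrative assumptions.
make_request_with_retry() {
    local url="$1"
    local max_retries="${2:-3}"
    local retry_delay="${3:-1}"
    local connect_timeout=5    # seconds allowed to establish the connection
    local max_time=10          # seconds allowed for the whole transfer
    local attempt=1
    local response

    while [ "$attempt" -le "$max_retries" ]; do
        # Assigning inside `if` keeps curl's exit status as the condition.
        if response=$(curl -fsS --connect-timeout "$connect_timeout" \
                --max-time "$max_time" "$url"); then
            printf '%s\n' "$response"
            return 0
        fi
        echo "attempt ${attempt}/${max_retries} failed for ${url}" >&2
        attempt=$((attempt + 1))
        sleep "$retry_delay"
    done
    return 1
}

# Demo against local files so no network is needed.
demo_file="$(mktemp)"
printf 'hello' > "$demo_file"
make_request_with_retry "file://${demo_file}" 2 0 && echo "fetch succeeded"
make_request_with_retry "file://${demo_file}.missing" 2 0 || echo "gave up after 2 attempts"
```

With these defaults the worst case per call is 3 attempts of at most 10 seconds each plus 3 one-second sleeps, so roughly 33 seconds instead of an unbounded hang.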
…ndpoint mode handling for legacy and rcv1p regions
…et-CACertificates to verify URI handling
| certs=$(make_request_with_retry "${WIRESERVER_ENDPOINT}/machine?comp=acmspackage&type=cacertificates&ext=json") | ||
| if [ -z "$certs" ]; then | ||
| echo "Warning: failed to retrieve legacy custom cloud certificates" | ||
| return 1 | ||
| fi | ||
|
|
||
| IFS_backup=$IFS | ||
| IFS=$'\r\n' | ||
| cert_names=($(echo $certs | grep -oP '(?<=Name\": \")[^\"]*')) | ||
| cert_bodies=($(echo $certs | grep -oP '(?<=CertBody\": \")[^\"]*')) | ||
| for i in ${!cert_bodies[@]}; do | ||
| echo ${cert_bodies[$i]} | sed 's/\\r\\n/\n/g' | sed 's/\\//g' > "/root/AzureCACertificates/$(echo ${cert_names[$i]} | sed 's/.cer/.crt/g')" | ||
| done | ||
| IFS=$IFS_backup |
There was a problem hiding this comment.
retrieve_legacy_certs (and similarly process_cert_operations) writes into /root/AzureCACertificates/... but the directory is no longer created before retrieval (the earlier top-level mkdir -p /root/AzureCACertificates was removed, and install_certs_to_trust_store creates it only after retrieval succeeds). This will cause certificate download to fail on a fresh node. Create /root/AzureCACertificates (and optionally clear it) before any retrieval functions attempt to write files.
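One possible shape for the fix is to have the write path create the directory itself, so ordering mistakes elsewhere cannot break a fresh node. The sketch below assumes a hypothetical write_cert helper and points CERT_DIR at a temporary directory instead of /root/AzureCACertificates. It also reduces the certificate name to its last path component and renames only a trailing .cer, whereas the unescaped dot in sed 's/.cer/.crt/g' matches any character before "cer".

```shell
#!/usr/bin/env bash
# Sketch only: write_cert is a hypothetical helper, and CERT_DIR points at a
# temporary directory here instead of /root/AzureCACertificates.
CERT_DIR="$(mktemp -d)/AzureCACertificates"

write_cert() {
    local name="$1" body="$2"
    local leaf="${name##*/}"               # drop any leading path components
    case "$leaf" in
        ""|"."|"..") echo "rejecting cert name: ${name}" >&2; return 1 ;;
        *.cer) leaf="${leaf%.cer}.crt" ;;  # rename only a trailing .cer
    esac
    # Create the target directory before the first write so a fresh node works.
    mkdir -p "$CERT_DIR" || return 1
    printf '%s\n' "$body" > "${CERT_DIR}/${leaf}"
}

write_cert "root-ca.cer" "-----BEGIN CERTIFICATE-----" && echo "wrote root-ca.crt"
write_cert "../../etc/evil.cer" "x" && echo "wrote evil.crt inside CERT_DIR"
write_cert ".." "x" || echo "rejected '..'"
```

Calling mkdir -p on every write is cheap and idempotent, which is why the sketch does not track whether the directory was already created.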
| } catch { | ||
| Set-ExitCode -ExitCode $global:WINDOWS_CSE_ERROR_DOWNLOAD_CA_CERTIFICATES -ErrorMessage "Failed to download CA certificates rawdata. Error: $_" | ||
| $caCerts = ($rawData.Content) | ConvertFrom-Json | ||
| if ($null -eq $caCerts -or $null -eq $caCerts.Certificates -or $caCerts.Certificates.Length -eq 0) { |
There was a problem hiding this comment.
Using .Length on $caCerts.Certificates is brittle in PowerShell because the JSON might deserialize Certificates as a single object rather than an array; in that case .Length can be $null or misleading. Prefer using @($caCerts.Certificates).Count -eq 0 (or equivalent) to correctly handle both single-item and array cases.
| if ($null -eq $caCerts -or $null -eq $caCerts.Certificates -or $caCerts.Certificates.Length -eq 0) { | |
| if ($null -eq $caCerts -or $null -eq $caCerts.Certificates -or @($caCerts.Certificates).Count -eq 0) { |
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #