feat: improve cse bootstrap latency by deferring non-critical work#8105
awesomenix wants to merge 1 commit into main from
Conversation
Pull request overview
This PR reduces Linux CSE bootstrap critical-path work by deferring non-essential steps until after ensureKubelet, and updates generated pkg/agent/testdata snapshots to reflect the new CSE/custom data output.
Changes:
- Defers `ensureNoDupOnPromiscuBridge`, `enableLocalDNS`, and non-GPU driver cleanup until after `ensureKubelet` in `cse_main.sh`.
- Optimizes provisioning/runtime setup by switching kube binary activation to `mv` + `chmod`, and reloading only a targeted sysctl file instead of `sysctl --system`.
- Updates VHD cleanup to disable `containerd` and regenerates `pkg/agent/testdata` CustomData snapshots.
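The deferral pattern described above can be sketched as a minimal shell skeleton. The function names below are illustrative stand-ins for the AKS helpers in `cse_main.sh`, not their real bodies:

```shell
#!/bin/sh
# Sketch of the reordering: critical-path work runs first, kubelet is
# started, and the non-essential steps run only afterwards, off the
# node-registration critical path. (Stub bodies for illustration.)
ensure_kubelet()            { echo "kubelet started"; }
ensure_no_dup_promiscuous() { echo "deferred: bridge check"; }
enable_local_dns()          { echo "deferred: localdns"; }

bootstrap() {
    # Critical path: get kubelet up as early as possible.
    ensure_kubelet
    # Deferred work: runs after kubelet is up.
    ensure_no_dup_promiscuous
    enable_local_dns
}

bootstrap
```

The net effect is that kubelet (and therefore node registration) no longer waits on steps whose results it does not need.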
Reviewed changes
Copilot reviewed 18 out of 75 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| vhdbuilder/packer/cleanup-vhd.sh | Disables containerd during VHD cleanup to avoid shipping images with it enabled. |
| parts/linux/cloud-init/artifacts/cse_main.sh | Defers some non-critical steps until after ensureKubelet; skips container runtime install for golden images/OSGuard. |
| parts/linux/cloud-init/artifacts/cse_install.sh | Changes kubelet/kubectl "activation" to mv + chmod to avoid redundant copy work. |
| parts/linux/cloud-init/artifacts/cse_config.sh | Uses targeted sysctl -p and starts kubelet before the TLS bootstrapping latency measurement service. |
| pkg/agent/testdata/MarinerV2+Kata/CustomData | Regenerated snapshot for updated CSE/custom data output. |
| pkg/agent/testdata/CustomizedImage/CustomData | Regenerated snapshot for updated CSE/custom data output. |
```shell
mv "/opt/bin/kubelet-${KUBERNETES_VERSION}" /opt/bin/kubelet
mv "/opt/bin/kubectl-${KUBERNETES_VERSION}" /opt/bin/kubectl

chmod a+x /opt/bin/kubelet /opt/bin/kubectl
```
this is what it was before; keeping it as is
why this change? I'm not understanding; install was cleaner, but slower?
also, why not force the access level?
also curious about both
install does a copy and not a move.
Operation: It copies the file to the destination. A key difference from cp is that install unlinks (removes) the destination file first if it already exists, which can prevent issues (like an EBUSY error) when replacing a running executable.
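The difference is easy to see with throwaway temp files (the paths below are scratch files, not the real `/opt/bin` targets): `install` leaves the source in place and creates a fresh copy at the destination, while `mv` is a single rename that consumes the source and preserves its mode, which is why the `chmod` step is needed afterwards.

```shell
#!/bin/sh
# Contrast install(1) and mv(1) when replacing a binary.
tmp="$(mktemp -d)"
printf 'new' > "$tmp/kubelet-v0"
printf 'old' > "$tmp/kubelet"

# install: unlinks the destination, then writes a new copy (this unlink-first
# behavior is what avoids EBUSY when replacing a running executable).
install -m0755 "$tmp/kubelet-v0" "$tmp/kubelet"
cat "$tmp/kubelet"                              # → new
test -f "$tmp/kubelet-v0" && echo "source kept" # install copies; source remains

# mv: renames the source inode over the destination; no data is copied on the
# same filesystem, and the source path disappears.
printf 'newer' > "$tmp/kubelet-v0"
mv "$tmp/kubelet-v0" "$tmp/kubelet"
chmod a+x "$tmp/kubelet"                        # mv keeps the source mode, so set it explicitly
test -e "$tmp/kubelet-v0" || echo "source gone"
rm -rf "$tmp"
```

On the same filesystem, `mv` is a metadata-only `rename(2)`, which is why it is cheaper than `install`'s full copy for large binaries.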
I kept the operation as it was before chewi made the change, to avoid regression; not sure if it was better or worse, but it's guaranteed to work with no regression.
Regression? My change was merged two months ago. There are important reasons to use install over cp, including the one stated above. There are cases where the destination will be an existing symlink, and it is crucial that we replace the symlink, not its target. mv will do that, but I can't remember if there was some other reason why I didn't stick with mv.
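The symlink point above can be demonstrated directly (scratch paths, assuming GNU coreutils defaults): `cp` dereferences a symlink destination and writes through it into the target, while `mv` renames over the link and replaces the symlink itself. `install` also unlinks the destination first, so like `mv` it replaces the link rather than its target.

```shell
#!/bin/sh
# Show what each tool does when the destination is a symlink.
d="$(mktemp -d)"
printf 'target' > "$d/real"
ln -s "$d/real" "$d/link"

printf 'via-cp' > "$d/new1"
cp "$d/new1" "$d/link"          # cp follows the symlink: $d/real is overwritten
cat "$d/real"                    # → via-cp
test -L "$d/link" && echo "link still a symlink"

printf 'via-mv' > "$d/new2"
mv "$d/new2" "$d/link"           # mv renames over the link: the symlink is replaced
test -L "$d/link" || echo "link is now a regular file"
cat "$d/real"                    # → via-cp (mv never touched the target)
rm -rf "$d"
```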
Force-pushed from 7e6264c to 192e020 (compare)
Force-pushed from 192e020 to c95a094 (compare)
Pull request overview
This PR aims to reduce Linux CSE bootstrap critical-path latency by deferring non-critical steps until after ensureKubelet, avoiding redundant work (targeted sysctl reload, moving kube binaries), and adjusting VHD build/runtime behaviors around containerd.
Changes:
- Reorders CSE provisioning steps so kubelet starts earlier; starts `kubelet` before `measure-tls-bootstrapping-latency.service`.
- Optimizes provisioning work (targeted `sysctl -p`, `mv` + `chmod` for kube binaries, skip runtime install when golden image already contains it).
- Adjusts VHD build scripts/tests to ensure containerd is started when needed and disabled during image cleanup; regenerates `pkg/agent/testdata`.
Reviewed changes
Copilot reviewed 18 out of 77 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| vhdbuilder/packer/trivy-scan.sh | Sources provision helpers and ensures containerd is started before Trivy operations. |
| vhdbuilder/packer/test/linux-vhd-content-test.sh | Starts containerd before executing VHD validation tests. |
| vhdbuilder/packer/cleanup-vhd.sh | Disables containerd during VHD cleanup. |
| parts/linux/cloud-init/artifacts/cse_main.sh | Defers non-critical steps until after ensureKubelet; skips container runtime install on golden images. |
| parts/linux/cloud-init/artifacts/cse_install.sh | Uses mv + chmod when activating downloaded kubelet/kubectl. |
| parts/linux/cloud-init/artifacts/cse_config.sh | Uses targeted sysctl -p and starts kubelet before the TLS bootstrapping latency measurement service. |
| pkg/agent/testdata/MarinerV2+Kata/CustomData | Regenerated snapshot output for MarinerV2+Kata CustomData. |
| pkg/agent/testdata/CustomizedImage/CustomData | Regenerated snapshot output for CustomizedImage CustomData. |
e2e/config/vhd.go (outdated)

```go
Arch:    "arm64",
Distro:  datamodel.AKSAzureLinuxV3Arm64Gen2,
Gallery: imageGalleryLinux,
Flatcar: true,
```
ugh! copy paste error.
e2e/scenario_test.go (outdated)

```go
func Test_AzureLinuxV3_ARM64(t *testing.T) {
	RunScenario(t, &Scenario{
		Description: "Tests that a node using a Flatcar VHD on ARM64 architecture can be properly bootstrapped",
```
shouldn't this be "Tests that a node using an AzureLinuxV3 VHD on ARM64 architecture can be properly bootstrapped"?
```shell
logs_to_events "AKS.CSE.ensureKubelet" ensureKubelet

if [ "${ENSURE_NO_DUPE_PROMISCUOUS_BRIDGE}" = "true" ]; then
```
So, just a note: these changes aren't improving CSE latency, they're improving node registration latency. Which is good, don't get me wrong, though from the RP side the operation should end up taking the same amount of time, since RP blocks synchronously on the CRP call to create/update the VM/VMSS, which depends solely on CSE execution time, not node registration time.
That would be half true: since some startup components are reordered, the dependent components start faster, hence a faster overall CSE finish. But at the end of the day we are still at the mercy of CSE.
But you are right, we are purely focused on node registration speed-up.
```shell
    fi
fi
install -m0755 "/opt/bin/kubelet-${KUBERNETES_VERSION}" /opt/bin/kubelet
install -m0755 "/opt/bin/kubectl-${KUBERNETES_VERSION}" /opt/bin/kubectl
```
yeah, wondering why install would be taking longer; if the correct version of kubelet/kubectl is cached on the VHD, it should just move it to /opt/bin, no?
```shell
fi

# start measure-tls-bootstrapping-latency.service without waiting for the main process to start, while ignoring any failures
if ! systemctlEnableAndStartNoBlock measure-tls-bootstrapping-latency 30; then
```
is this really adding a lot of latency? systemctlEnableAndStartNoBlock explicitly does not block, meaning systemd doesn't wait for the unit to enter a running state before returning
mainly asking since moving this below ensureKubelet seems to add a fair bit of complexity
Not true: we actually restart during CSE provisioning after we drop the config. Doing it this way avoids a double restart, which is possibly slower, since a restart is slower than a start.
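For context, a sketch of what a non-blocking enable+start helper looks like. The real `systemctlEnableAndStartNoBlock` lives in the AKS provisioning helpers and its exact body is not shown in this PR, so this is an illustration of the mechanism only: `--no-block` makes `systemctl` return as soon as the start job is queued, instead of waiting for the unit to reach an active state.

```shell
#!/bin/sh
# Hypothetical helper sketch (not the real AKS implementation): enable the
# unit, then queue a start job with --no-block under a timeout. The commands
# are echoed rather than executed so the sketch runs anywhere; a real helper
# would run them.
systemctl_enable_start_noblock() {
    unit="$1"
    timeout_secs="${2:-30}"
    echo "systemctl enable ${unit} && timeout ${timeout_secs} systemctl start --no-block ${unit}"
}

systemctl_enable_start_noblock measure-tls-bootstrapping-latency 30
```

Because `--no-block` returns once the job is enqueued, the savings from reordering such a call are in avoiding extra restarts, not in the enqueue itself.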
Force-pushed from 2fc37be to d8a4b88 (compare)
Pull request overview
This PR aims to reduce Linux node provisioning (CSE) bootstrap latency by moving non-critical work off the kubelet startup critical path, and by trimming some redundant/expensive operations during provisioning and VHD validation.
Changes:
- Reorders portions of the Linux CSE flow so kubelet starts earlier, and updates TLS bootstrapping latency measurement to use a start-time file written immediately before kubelet startup.
- Optimizes certain provisioning steps (targeted `sysctl -p`, `mv` + `chmod` for kube binaries) and adjusts VHD/Packer scripts to manage containerd state for scanning/tests/cleanup.
- Adds an Azure Linux V3 Gen2 ARM64 E2E image definition and scenario, and regenerates `pkg/agent/testdata` outputs.
Reviewed changes
Copilot reviewed 20 out of 82 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| vhdbuilder/packer/trivy-scan.sh | Sources provisioning helpers and explicitly enables/starts containerd before running Trivy scans. |
| vhdbuilder/packer/test/linux-vhd-content-test.sh | Reloads systemd and restarts containerd at test start. |
| vhdbuilder/packer/cleanup-vhd.sh | Disables containerd during VHD cleanup to avoid shipping it enabled. |
| spec/parts/linux/cloud-init/artifacts/measure_tls_bootstrapping_latency_spec.sh | Updates ShellSpec coverage for TLS bootstrapping latency measurement behavior and the new start-time file logic. |
| pkg/agent/testdata/CustomizedImage/CustomData | Regenerated snapshot testdata for updated custom data/CSE outputs. |
| parts/linux/cloud-init/artifacts/measure-tls-bootstrapping-latency.sh | Adds configurable start-time filepath and emits completion events based on a pre-kubelet start timestamp. |
| parts/linux/cloud-init/artifacts/cse_start.sh | Adds a ScriptlessMode datapoint to the guest agent event message payload. |
| parts/linux/cloud-init/artifacts/cse_main.sh | Defers non-critical steps until after ensureKubelet; skips container runtime install when full install isn't required. |
| parts/linux/cloud-init/artifacts/cse_install.sh | Uses mv + chmod when activating downloaded kubelet/kubectl binaries. |
| parts/linux/cloud-init/artifacts/cse_config.sh | Uses targeted sysctl -p for bridge-forwarding config; writes TLS bootstrapping start time before starting kubelet; starts the measurement service after kubelet. |
| e2e/scenario_test.go | Adds an Azure Linux V3 Gen2 ARM64 E2E scenario. |
| e2e/config/vhd.go | Adds an Azure Linux V3 Gen2 ARM64 image definition for E2E. |
Force-pushed from d8a4b88 to 0cd55f9 (compare)
Force-pushed from 0cd55f9 to 6e10943 (compare)
Pull request overview
This PR reduces Linux CSE bootstrap critical-path work by moving non-essential steps later, adds more precise TLS bootstrapping latency measurement, and updates test artifacts/config to reflect the new behavior.
Changes:
- Defer non-critical CSE steps until after `ensureKubelet` and adjust containerd/sysctl handling to reduce redundant work.
- Start kubelet before the TLS-bootstrapping-latency measurement service, using a start-time file to preserve the latency signal.
- Add/adjust tests and testdata, including a new AzureLinux V3 ARM64 e2e scenario and regenerated `pkg/agent/testdata`.
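The start-time handoff can be sketched in a few lines. The file path and function names below are illustrative assumptions, not the actual AKS script contents: CSE records a timestamp immediately before starting kubelet, and the measurement service later computes elapsed time from that file rather than from its own (later) start time.

```shell
#!/bin/sh
# Hypothetical sketch of the start-time file handoff. START_TIME_FILE is an
# assumed path, not the one used by AKS.
START_TIME_FILE="${START_TIME_FILE:-/run/kubelet-start-time}"

# Called by CSE right before kubelet is started.
record_start_time() {
    date +%s > "$START_TIME_FILE"
}

# Called by the measurement service once the bootstrap kubeconfig appears;
# latency is measured from the recorded pre-kubelet timestamp, so starting
# the service after kubelet does not skew the signal.
measure_latency_seconds() {
    start="$(cat "$START_TIME_FILE")"
    now="$(date +%s)"
    echo $((now - start))
}
```

This is why the service can safely be moved off the critical path: the latency measurement anchors to the recorded timestamp, not to when the service itself happens to start.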
Reviewed changes
Copilot reviewed 21 out of 83 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| vhdbuilder/packer/trivy-scan.sh | Source provisioning helpers and ensure containerd is started before scanning. |
| vhdbuilder/packer/test/linux-vhd-content-test.sh | Restart containerd before running VHD content tests. |
| vhdbuilder/packer/cleanup-vhd.sh | Disable containerd during VHD cleanup. |
| spec/parts/linux/cloud-init/artifacts/measure_tls_bootstrapping_latency_spec.sh | Update ShellSpec coverage for new TLS bootstrapping start-time behavior and race handling. |
| pkg/agent/testdata/CustomizedImage/CustomData | Regenerated snapshot testdata reflecting updated custom data/CSE output. |
| parts/linux/cloud-init/artifacts/measure-tls-bootstrapping-latency.sh | Use start-time file, emit completion event even for fast/racy kubeconfig creation, and improve quoting. |
| parts/linux/cloud-init/artifacts/kubelet.service | Add pre-start wait for containerd socket. |
| parts/linux/cloud-init/artifacts/cse_start.sh | Emit scriptless-mode datapoint in the guest agent event message. |
| parts/linux/cloud-init/artifacts/cse_main.sh | Defer non-critical steps until after kubelet startup; skip container runtime install on golden images. |
| parts/linux/cloud-init/artifacts/cse_install.sh | Activate kube binaries via mv + chmod rather than install. |
| parts/linux/cloud-init/artifacts/cse_config.sh | Use targeted sysctl -p, start containerd non-blocking, and write TLS bootstrap start-time before starting kubelet. |
| e2e/scenario_test.go | Add AzureLinux V3 ARM64 scenario. |
| e2e/config/vhd.go | Add e2e image config for AzureLinux V3 ARM64 Gen2. |
```shell
ExecStartPre=-/sbin/iptables -t nat --numeric --list

ExecStartPre=/bin/bash /opt/azure/containers/validate-kubelet-credentials.sh
ExecStartPre=/bin/sh -c 'until [ -S /run/containerd/containerd.sock ]; do sleep 0.1; done'
```
```shell
mv "/opt/bin/kubelet-${KUBERNETES_VERSION}" /opt/bin/kubelet
mv "/opt/bin/kubectl-${KUBERNETES_VERSION}" /opt/bin/kubectl

chmod a+x /opt/bin/kubelet /opt/bin/kubectl
```
```shell
systemctl daemon-reload && systemctl restart containerd
```

```go
func Test_AzureLinuxV3_ARM64(t *testing.T) {
	RunScenario(t, &Scenario{
		Description: "Tests that a node using a AzureLinuxV3 VHD on ARM64 architecture can be properly bootstrapped",
```
Force-pushed from 6e10943 to 9c85db1 (compare)
Force-pushed from 9c85db1 to 293a28c (compare)
Pull request overview
This PR aims to reduce node bootstrap latency by moving non-critical CSE work off the kubelet critical path, tightening a few runtime operations (sysctl reload, kube binary activation), and updating test/snapshot outputs to reflect the new generated artifacts.
Changes:
- Reorders Linux CSE steps so `ensureKubelet` happens earlier; adds a TLS bootstrapping latency "start time" handoff so measurement can run after kubelet starts.
- Adjusts containerd/systemd handling across CSE + VHD build/test scripts (targeted sysctl reload, containerd enable/start ordering, VHD cleanup disabling containerd).
- Adds an ARM64 AzureLinuxV3 e2e scenario and updates generated `pkg/agent/testdata`.
Reviewed changes
Copilot reviewed 22 out of 84 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| vhdbuilder/packer/trivy-scan.sh | Sources provision helpers and explicitly starts containerd before running trivy scans. |
| vhdbuilder/packer/test/linux-vhd-content-test.sh | Restarts containerd prior to running VHD content validations. |
| vhdbuilder/packer/cleanup-vhd.sh | Disables containerd during VHD cleanup so captured images don’t ship with it enabled. |
| spec/parts/linux/cloud-init/artifacts/measure_tls_bootstrapping_latency_spec.sh | Updates ShellSpec tests to reflect new “start time file” behavior and additional event emission cases. |
| parts/linux/cloud-init/artifacts/measure-tls-bootstrapping-latency.sh | Adds start-time-file gating and emits completion events even for fast kubelet startups/races. |
| parts/linux/cloud-init/artifacts/kubelet.service | Adds a pre-start wait for containerd socket presence. |
| parts/linux/cloud-init/artifacts/cse_start.sh | Adds “scriptless mode” signal into guest agent event message payload. |
| parts/linux/cloud-init/artifacts/cse_main.sh | Defers bridge/localdns/GPU cleanup work until after ensureKubelet. |
| parts/linux/cloud-init/artifacts/cse_install.sh | Activates kube binaries via mv + chmod; ensures containerd is ready before China retagging. |
| parts/linux/cloud-init/artifacts/cse_config.sh | Adds waitForContainerdReady; switches to targeted sysctl -p; writes TLS start time before kubelet start; starts measure service after kubelet. |
| pkg/agent/testdata/Flatcar/CustomData | Regenerated snapshot output. |
| pkg/agent/testdata/ACL/CustomData | Regenerated snapshot output. |
| e2e/scenario_test.go | Adds AzureLinuxV3 ARM64 scenario; minor text updates. |
| e2e/scenario_gpu_managed_experience_test.go | Adds GPU tag to dcgm-exporter compatibility scenario. |
| e2e/config/vhd.go | Adds AzureLinuxV3 Gen2 ARM64 VHD entry. |
```shell
allMCRImages=($(ctr --namespace k8s.io images list | grep '^mcr.microsoft.com/' | awk '{print $1}'))
if [ -z "${allMCRImages}" ]; then
```
allMCRImages is no longer declared local inside retagMCRImagesForChina, so it becomes a global array and can unintentionally leak/override state used elsewhere in the script. Make this a local variable (e.g., local -a allMCRImages=(...)) to avoid side effects between functions.
Suggested change:

```shell
local -a allMCRImages=($(ctr --namespace k8s.io images list | grep '^mcr.microsoft.com/' | awk '{print $1}'))
if [ ${#allMCRImages[@]} -eq 0 ]; then
```
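Beyond the `local` scoping, the suggested length check fixes a subtle bug: in bash, `"${arr}"` expands to the first element only, so `[ -z "${arr}" ]` misclassifies an array whose first element happens to be empty and says nothing about the array's overall length. A minimal demonstration (throwaway variable names):

```shell
#!/usr/bin/env bash
# "${arr}" is shorthand for "${arr[0]}": the emptiness test only sees the
# first element, while ${#arr[@]} counts all elements.
arr=("" "mcr.microsoft.com/pause")
[ -z "${arr}" ] && echo "looks empty (wrong)"
[ "${#arr[@]}" -eq 0 ] || echo "actually has ${#arr[@]} elements"
```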
Summary

- Defers non-critical CSE steps until after `ensureKubelet`
- Reduces redundant work: targeted `sysctl -p`, moving kube binaries instead of copying them, and skipping container runtime install when the golden image already has it
- Regenerates `pkg/agent/testdata` to capture the updated CSE/custom data output

What changed

- Runs `ensureNoDupOnPromiscuBridge`, `enableLocalDNS`, and non-GPU driver cleanup later in `cse_main.sh` so kubelet startup happens earlier
- Starts `kubelet` before `measure-tls-bootstrapping-latency.service`
- Replaces `install` with `mv` plus `chmod` when activating downloaded `kubelet`/`kubectl` binaries
- Reloads only `/etc/sysctl.d/99-force-bridge-forward.conf` instead of running `sysctl --system`
- Disables `containerd` during VHD cleanup so the image does not carry it enabled prematurely

Validation
Timings

[Before/after timing comparison: the table data did not survive extraction]