
feat: improve cse bootstrap latency by deferring non-critical work #8105

Open
awesomenix wants to merge 1 commit into main from nishp/tinyimprovements

Conversation


@awesomenix awesomenix commented Mar 16, 2026

Summary

  • reduce CSE bootstrap critical-path work by deferring non-essential steps until after ensureKubelet
  • avoid redundant work during provisioning by using targeted sysctl -p, moving kube binaries instead of copying them, and skipping container runtime install when the golden image already has it
  • regenerate pkg/agent/testdata to capture the updated CSE/custom data output

What changed

  • move ensureNoDupOnPromiscuBridge, enableLocalDNS, and non-GPU driver cleanup later in cse_main.sh so kubelet startup happens earlier
  • start kubelet before measure-tls-bootstrapping-latency.service
  • replace install with mv plus chmod when activating downloaded kubelet/kubectl binaries
  • reload only /etc/sysctl.d/99-force-bridge-forward.conf instead of running sysctl --system
  • disable containerd during VHD cleanup so the image does not carry it enabled prematurely
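On the targeted sysctl reload: `sysctl --system` re-applies every file under the standard sysctl.d directories, while `sysctl -p <file>` applies only the named file. A rough, root-free sketch of the difference (the sample conf contents below are an assumption, not the real 99-force-bridge-forward.conf):

```shell
# Full reload (what the change moves away from) walks /run/sysctl.d,
# /etc/sysctl.d, /usr/lib/sysctl.d and /etc/sysctl.conf:
#   sysctl --system
# Targeted reload (what the change switches to) applies just one file:
#   sysctl -p /etc/sysctl.d/99-force-bridge-forward.conf

# Root-free illustration: list the keys a targeted reload of one
# sysctl-style file would touch (sample contents are assumed).
conf=$(mktemp)
cat > "$conf" <<'EOF'
# force bridged traffic through iptables
net.bridge.bridge-nf-call-iptables = 1
EOF
keys=$(awk -F= '/^[^#[:space:]]/ { gsub(/[[:space:]]/, "", $1); print $1 }' "$conf")
echo "$keys"
rm -f "$conf"
```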

Validation

  • Not run (PR created from the latest local commit only)

Timings

Before

  • CSE start: +0.000s
  • kubelet started: +25.000s
  • node registered: +26.270s
  • NodeReady: +26.486s

After

  • CSE start: +0.000s
  • ensureKubelet done: +13.063s
  • kubelet started: +19.000s
  • node registered: +20.462s
  • NodeReady: +20.789s

Copilot AI review requested due to automatic review settings March 16, 2026 23:37
@awesomenix awesomenix changed the title Improve CSE bootstrap latency by deferring non-critical work feat: improve cse bootstrap latency by deferring non-critical work Mar 16, 2026

Copilot AI left a comment


Pull request overview

This PR reduces Linux CSE bootstrap critical-path work by deferring non-essential steps until after ensureKubelet, and updates generated pkg/agent/testdata snapshots to reflect the new CSE/custom data output.

Changes:

  • Defers ensureNoDupOnPromiscuBridge, enableLocalDNS, and non-GPU driver cleanup until after ensureKubelet in cse_main.sh.
  • Optimizes provisioning/runtime setup by switching kube binary activation to mv + chmod, and reloading only a targeted sysctl file instead of sysctl --system.
  • Updates VHD cleanup to disable containerd and regenerates pkg/agent/testdata CustomData snapshots.

Reviewed changes

Copilot reviewed 18 out of 75 changed files in this pull request and generated 4 comments.

Summary per file:

  • vhdbuilder/packer/cleanup-vhd.sh: Disables containerd during VHD cleanup to avoid shipping images with it enabled.
  • parts/linux/cloud-init/artifacts/cse_main.sh: Defers some non-critical steps until after ensureKubelet; skips container runtime install for golden images/OSGuard.
  • parts/linux/cloud-init/artifacts/cse_install.sh: Changes kubelet/kubectl "activation" to mv + chmod to avoid redundant copy work.
  • parts/linux/cloud-init/artifacts/cse_config.sh: Uses targeted sysctl -p and starts kubelet before the TLS bootstrapping latency measurement service.
  • pkg/agent/testdata/MarinerV2+Kata/CustomData: Regenerated snapshot for updated CSE/custom data output.
  • pkg/agent/testdata/CustomizedImage/CustomData: Regenerated snapshot for updated CSE/custom data output.


mv "/opt/bin/kubelet-${KUBERNETES_VERSION}" /opt/bin/kubelet
mv "/opt/bin/kubectl-${KUBERNETES_VERSION}" /opt/bin/kubectl

chmod a+x /opt/bin/kubelet /opt/bin/kubectl
Contributor Author:

This is what it was before; keeping it as is.

Collaborator:

Why this change? I'm not understanding: install was cleaner, but slower?

Collaborator:

Also, why not force the access level?

Contributor:

also curious about both

Contributor Author:

install does a copy, not a move.

Operation: it copies the file to the destination. A key difference from cp is that install unlinks (removes) the destination file first if it already exists, which can prevent issues (like an ETXTBSY error) when replacing a running executable.

Contributor Author:

I kept the operation as it was before chewi's change, to avoid a regression. Not sure if it was better or worse, but it was guaranteed to work with no regression.

https://github.com/Azure/AgentBaker/pull/7125/changes#diff-ff0e92780b2c7f35348b62de54b815b2c9919cfd4f6612f43808aace9dc0a134R638

Contributor:

Regression? My change was merged two months ago. There are important reasons to use install over cp, including the one stated above. There are cases where the destination will be an existing symlink, and it is crucial that we replace the symlink, not its target. mv will do that, but I can't remember if there was some other reason why I didn't stick with mv.
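The symlink behavior is easy to demonstrate without root. In this temp-dir sketch (paths are illustrative), cp writes through an existing destination symlink, clobbering its target, while install unlinks the symlink first and leaves a regular file in its place:

```shell
# cp follows an existing destination symlink and overwrites the target;
# install removes the destination first, so the symlink itself is replaced.
dir=$(mktemp -d)
echo old > "$dir/real-target"
echo new > "$dir/staged"
ln -s "$dir/real-target" "$dir/dest"

cp "$dir/staged" "$dir/dest"                             # writes through the symlink
cp_target=$(cat "$dir/real-target")                      # target was clobbered
cp_is_link=$([ -L "$dir/dest" ] && echo yes || echo no)  # dest is still a symlink

echo old > "$dir/real-target"                            # reset
echo new2 > "$dir/staged"
install -m0755 "$dir/staged" "$dir/dest"                 # unlinks dest first
inst_target=$(cat "$dir/real-target")                    # target untouched
inst_is_link=$([ -L "$dir/dest" ] && echo yes || echo no) # dest is a regular file now

echo "cp: target=$cp_target link=$cp_is_link; install: target=$inst_target link=$inst_is_link"
rm -rf "$dir"
```

mv behaves like install here in that it renames over the destination path, replacing the symlink rather than following it.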


Copilot AI left a comment


Pull request overview

This PR aims to reduce Linux CSE bootstrap critical-path latency by deferring non-critical steps until after ensureKubelet, avoiding redundant work (targeted sysctl reload, moving kube binaries), and adjusting VHD build/runtime behaviors around containerd.

Changes:

  • Reorders CSE provisioning steps so kubelet starts earlier; starts kubelet before measure-tls-bootstrapping-latency.service.
  • Optimizes provisioning work (targeted sysctl -p, mv+chmod for kube binaries, skip runtime install when golden image already contains it).
  • Adjusts VHD build scripts/tests to ensure containerd is started when needed and disabled during image cleanup; regenerates pkg/agent/testdata.

Reviewed changes

Copilot reviewed 18 out of 77 changed files in this pull request and generated 1 comment.

Summary per file:

  • vhdbuilder/packer/trivy-scan.sh: Sources provision helpers and ensures containerd is started before Trivy operations.
  • vhdbuilder/packer/test/linux-vhd-content-test.sh: Starts containerd before executing VHD validation tests.
  • vhdbuilder/packer/cleanup-vhd.sh: Disables containerd during VHD cleanup.
  • parts/linux/cloud-init/artifacts/cse_main.sh: Defers non-critical steps until after ensureKubelet; skips container runtime install on golden images.
  • parts/linux/cloud-init/artifacts/cse_install.sh: Uses mv + chmod when activating downloaded kubelet/kubectl.
  • parts/linux/cloud-init/artifacts/cse_config.sh: Uses targeted sysctl -p and starts kubelet before the TLS bootstrapping latency measurement service.
  • pkg/agent/testdata/MarinerV2+Kata/CustomData: Regenerated snapshot output for MarinerV2+Kata CustomData.
  • pkg/agent/testdata/CustomizedImage/CustomData: Regenerated snapshot output for CustomizedImage CustomData.


Arch: "arm64",
Distro: datamodel.AKSAzureLinuxV3Arm64Gen2,
Gallery: imageGalleryLinux,
Flatcar: true,
Contributor:

Why Flatcar?

Contributor Author:

Ugh! Copy-paste error.


func Test_AzureLinuxV3_ARM64(t *testing.T) {
RunScenario(t, &Scenario{
Description: "Tests that a node using a Flatcar VHD on ARM64 architecture can be properly bootstrapped",
Contributor:

Shouldn't this be "Tests that a node using an AzureLinuxV3 VHD on ARM64 architecture can be properly bootstrapped"?


logs_to_events "AKS.CSE.ensureKubelet" ensureKubelet

if [ "${ENSURE_NO_DUPE_PROMISCUOUS_BRIDGE}" = "true" ]; then
@cameronmeissner (Contributor) commented Mar 17, 2026:

Just to note: these changes aren't improving CSE latency, right? They're improving node registration latency (which is good, don't get me wrong). From the RP side, though, the operation should end up taking the same amount of time, since RP blocks synchronously on the CRP call to create/update the VM/VMSS, which depends solely on CSE execution time, not node registration time.

Contributor Author:

That's only half true: because some startup steps/components were reordered, the dependent components start faster, hence a faster overall CSE finish. But at the end of the day we are still at the mercy of CSE.

You are right, though, that we are purely focused on node registration speed-up.

@cameronmeissner (Contributor):

"avoid redundant work during provisioning by ... and skipping container runtime install when the golden image already has it": doesn't our container runtime installation logic already no-op when the desired containerd version is cached on the VHD? Does this mean that our existing "cache checking" logic is incorrect?

fi
fi
install -m0755 "/opt/bin/kubelet-${KUBERNETES_VERSION}" /opt/bin/kubelet
install -m0755 "/opt/bin/kubectl-${KUBERNETES_VERSION}" /opt/bin/kubectl
Contributor:

Yeah, wondering why install would be taking longer; if the correct version of kubelet/kubectl is cached on the VHD, it should just move it to /opt/bin, no?

fi

# start measure-tls-bootstrapping-latency.service without waiting for the main process to start, while ignoring any failures
if ! systemctlEnableAndStartNoBlock measure-tls-bootstrapping-latency 30; then
@cameronmeissner (Contributor) commented Mar 17, 2026:

Is this really adding a lot of latency? systemctlEnableAndStartNoBlock explicitly does not block, meaning systemd doesn't wait for the unit to enter a running state before returning.

Mainly asking since moving this below ensureKubelet seems to add a fair bit of complexity.
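For reference, a non-blocking start maps onto systemd's --no-block flag. A hypothetical sketch of such a helper follows; the real systemctlEnableAndStartNoBlock may well be implemented differently (e.g. with retries and timeouts):

```shell
# Hypothetical sketch of a non-blocking enable+start helper.
systemctl_enable_and_start_noblock() {
    local unit=$1
    systemctl enable "$unit" || return 1
    # --no-block queues the start job and returns immediately instead of
    # waiting for the unit to reach its active state.
    systemctl start --no-block "$unit" || return 1
}
```

The caller continues provisioning immediately while systemd activates the unit in the background, which is why it is off the critical path regardless of where it is ordered.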


cameronmeissner commented Mar 17, 2026

"disable containerd during VHD cleanup so the image does not carry it enabled prematurely": containerd needs to be running before we start kubelet, so I'd think enabling containerd during the build would actually improve CSE latency, since we wouldn't need to start it before starting kubelet during provisioning.

@cameronmeissner (Contributor):

#8105 (comment)

@awesomenix (Contributor Author):

disable containerd during VHD cleanup so the image does not carry it enabled prematurely - containerd needs to be running before we start kubelet - I'd think enabling containerd during the build would actually improve CSE latency, since we wouldn't need to start it before starting kubelet during provisioning

Not true: we actually restart containerd during CSE provisioning after we drop the config, so leaving it enabled in the image would mean a double restart, and possibly slower overall, since a restart is slower than a start.

systemctlEnableAndStart containerd 30 || exit $ERR_SYSTEMCTL_START_FAIL

Copilot AI review requested due to automatic review settings March 17, 2026 23:01
@awesomenix awesomenix force-pushed the nishp/tinyimprovements branch from 2fc37be to d8a4b88 Compare March 17, 2026 23:01

Copilot AI left a comment


Pull request overview

This PR aims to reduce Linux node provisioning (CSE) bootstrap latency by moving non-critical work off the kubelet startup critical path, and by trimming some redundant/expensive operations during provisioning and VHD validation.

Changes:

  • Reorders portions of the Linux CSE flow so kubelet starts earlier, and updates TLS bootstrapping latency measurement to use a start-time file written immediately before kubelet startup.
  • Optimizes certain provisioning steps (targeted sysctl -p, mv+chmod for kube binaries) and adjusts VHD/Packer scripts to manage containerd state for scanning/tests/cleanup.
  • Adds an Azure Linux V3 Gen2 ARM64 E2E image definition and scenario, and regenerates pkg/agent/testdata outputs.

Reviewed changes

Copilot reviewed 20 out of 82 changed files in this pull request and generated 2 comments.

Summary per file:

  • vhdbuilder/packer/trivy-scan.sh: Sources provisioning helpers and explicitly enables/starts containerd before running Trivy scans.
  • vhdbuilder/packer/test/linux-vhd-content-test.sh: Reloads systemd and restarts containerd at test start.
  • vhdbuilder/packer/cleanup-vhd.sh: Disables containerd during VHD cleanup to avoid shipping it enabled.
  • spec/parts/linux/cloud-init/artifacts/measure_tls_bootstrapping_latency_spec.sh: Updates ShellSpec coverage for TLS bootstrapping latency measurement behavior and the new start-time file logic.
  • pkg/agent/testdata/CustomizedImage/CustomData: Regenerated snapshot testdata for updated custom data/CSE outputs.
  • parts/linux/cloud-init/artifacts/measure-tls-bootstrapping-latency.sh: Adds a configurable start-time filepath and emits completion events based on a pre-kubelet start timestamp.
  • parts/linux/cloud-init/artifacts/cse_start.sh: Adds a ScriptlessMode datapoint to the guest agent event message payload.
  • parts/linux/cloud-init/artifacts/cse_main.sh: Defers non-critical steps until after ensureKubelet; skips container runtime install when a full install isn't required.
  • parts/linux/cloud-init/artifacts/cse_install.sh: Uses mv + chmod when activating downloaded kubelet/kubectl binaries.
  • parts/linux/cloud-init/artifacts/cse_config.sh: Uses targeted sysctl -p for bridge-forwarding config; writes the TLS bootstrapping start time before starting kubelet; starts the measurement service after kubelet.
  • e2e/scenario_test.go: Adds an Azure Linux V3 Gen2 ARM64 E2E scenario.
  • e2e/config/vhd.go: Adds an Azure Linux V3 Gen2 ARM64 image definition for E2E.



Copilot AI left a comment


Pull request overview

This PR reduces Linux CSE bootstrap critical-path work by moving non-essential steps later, adds more precise TLS bootstrapping latency measurement, and updates test artifacts/config to reflect the new behavior.

Changes:

  • Defer non-critical CSE steps until after ensureKubelet and adjust containerd/sysctl handling to reduce redundant work.
  • Start kubelet before the TLS-bootstrapping-latency measurement service, using a start-time file to preserve the latency signal.
  • Add/adjust tests and testdata, including a new AzureLinux V3 ARM64 e2e scenario and regenerated pkg/agent/testdata.
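The start-time handoff described above might look roughly like this (filenames and timestamp format are assumptions, not the actual implementation):

```shell
# Hypothetical sketch: the provisioning script records a timestamp just
# before launching kubelet, and the measurement script (started afterwards)
# computes latency from that recorded time instead of its own start time.
start_file=$(mktemp)   # stand-in for the real start-time filepath

# cse_config.sh side: record the time, then start kubelet.
date +%s%N > "$start_file"
# systemctl start kubelet ...

# measure-tls-bootstrapping-latency.sh side, running after kubelet:
sleep 0.2              # stand-in for waiting on the bootstrap kubeconfig
start_ns=$(cat "$start_file")
elapsed_ms=$(( ($(date +%s%N) - start_ns) / 1000000 ))
echo "tls bootstrapping latency: ${elapsed_ms} ms"
rm -f "$start_file"
```

This preserves the latency signal even though the measurement service is no longer started before kubelet.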

Reviewed changes

Copilot reviewed 21 out of 83 changed files in this pull request and generated 4 comments.

Summary per file:

  • vhdbuilder/packer/trivy-scan.sh: Sources provisioning helpers and ensures containerd is started before scanning.
  • vhdbuilder/packer/test/linux-vhd-content-test.sh: Restarts containerd before running VHD content tests.
  • vhdbuilder/packer/cleanup-vhd.sh: Disables containerd during VHD cleanup.
  • spec/parts/linux/cloud-init/artifacts/measure_tls_bootstrapping_latency_spec.sh: Updates ShellSpec coverage for the new TLS bootstrapping start-time behavior and race handling.
  • pkg/agent/testdata/CustomizedImage/CustomData: Regenerated snapshot testdata reflecting updated custom data/CSE output.
  • parts/linux/cloud-init/artifacts/measure-tls-bootstrapping-latency.sh: Uses the start-time file, emits a completion event even for fast/racy kubeconfig creation, and improves quoting.
  • parts/linux/cloud-init/artifacts/kubelet.service: Adds a pre-start wait for the containerd socket.
  • parts/linux/cloud-init/artifacts/cse_start.sh: Emits a scriptless-mode datapoint in the guest agent event message.
  • parts/linux/cloud-init/artifacts/cse_main.sh: Defers non-critical steps until after kubelet startup; skips container runtime install on golden images.
  • parts/linux/cloud-init/artifacts/cse_install.sh: Activates kube binaries via mv + chmod rather than install.
  • parts/linux/cloud-init/artifacts/cse_config.sh: Uses targeted sysctl -p, starts containerd non-blocking, and writes the TLS bootstrap start time before starting kubelet.
  • e2e/scenario_test.go: Adds AzureLinux V3 ARM64 scenario.
  • e2e/config/vhd.go: Adds e2e image config for AzureLinux V3 ARM64 Gen2.


ExecStartPre=-/sbin/iptables -t nat --numeric --list

ExecStartPre=/bin/bash /opt/azure/containers/validate-kubelet-credentials.sh
ExecStartPre=/bin/sh -c 'until [ -S /run/containerd/containerd.sock ]; do sleep 0.1; done'
mv "/opt/bin/kubelet-${KUBERNETES_VERSION}" /opt/bin/kubelet
mv "/opt/bin/kubectl-${KUBERNETES_VERSION}" /opt/bin/kubectl

chmod a+x /opt/bin/kubelet /opt/bin/kubectl
Comment on lines +22 to +23
systemctl daemon-reload && systemctl restart containerd


func Test_AzureLinuxV3_ARM64(t *testing.T) {
RunScenario(t, &Scenario{
Description: "Tests that a node using a AzureLinuxV3 VHD on ARM64 architecture can be properly bootstrapped",
Copilot AI review requested due to automatic review settings March 20, 2026 23:10
@awesomenix awesomenix force-pushed the nishp/tinyimprovements branch from 9c85db1 to 293a28c Compare March 20, 2026 23:10

Copilot AI left a comment


Pull request overview

This PR aims to reduce node bootstrap latency by moving non-critical CSE work off the kubelet critical path, tightening a few runtime operations (sysctl reload, kube binary activation), and updating test/snapshot outputs to reflect the new generated artifacts.

Changes:

  • Reorders Linux CSE steps so ensureKubelet happens earlier; adds TLS bootstrapping latency “start time” handoff so measurement can run after kubelet starts.
  • Adjusts containerd/systemd handling across CSE + VHD build/test scripts (targeted sysctl reload, containerd enable/start ordering, VHD cleanup disabling containerd).
  • Adds an ARM64 AzureLinuxV3 e2e scenario and updates generated pkg/agent/testdata.

Reviewed changes

Copilot reviewed 22 out of 84 changed files in this pull request and generated 6 comments.

Summary per file:

  • vhdbuilder/packer/trivy-scan.sh: Sources provision helpers and explicitly starts containerd before running trivy scans.
  • vhdbuilder/packer/test/linux-vhd-content-test.sh: Restarts containerd prior to running VHD content validations.
  • vhdbuilder/packer/cleanup-vhd.sh: Disables containerd during VHD cleanup so captured images don't ship with it enabled.
  • spec/parts/linux/cloud-init/artifacts/measure_tls_bootstrapping_latency_spec.sh: Updates ShellSpec tests to reflect the new "start time file" behavior and additional event emission cases.
  • parts/linux/cloud-init/artifacts/measure-tls-bootstrapping-latency.sh: Adds start-time-file gating and emits completion events even for fast kubelet startups/races.
  • parts/linux/cloud-init/artifacts/kubelet.service: Adds a pre-start wait for containerd socket presence.
  • parts/linux/cloud-init/artifacts/cse_start.sh: Adds a "scriptless mode" signal to the guest agent event message payload.
  • parts/linux/cloud-init/artifacts/cse_main.sh: Defers bridge/localdns/GPU cleanup work until after ensureKubelet.
  • parts/linux/cloud-init/artifacts/cse_install.sh: Activates kube binaries via mv + chmod; ensures containerd is ready before China retagging.
  • parts/linux/cloud-init/artifacts/cse_config.sh: Adds waitForContainerdReady; switches to targeted sysctl -p; writes the TLS start time before kubelet start; starts the measure service after kubelet.
  • pkg/agent/testdata/Flatcar/CustomData: Regenerated snapshot output.
  • pkg/agent/testdata/ACL/CustomData: Regenerated snapshot output.
  • e2e/scenario_test.go: Adds AzureLinuxV3 ARM64 scenario; minor text updates.
  • e2e/scenario_gpu_managed_experience_test.go: Adds GPU tag to dcgm-exporter compatibility scenario.
  • e2e/config/vhd.go: Adds AzureLinuxV3 Gen2 ARM64 VHD entry.

Comment on lines +734 to 735
allMCRImages=($(ctr --namespace k8s.io images list | grep '^mcr.microsoft.com/' | awk '{print $1}'))
if [ -z "${allMCRImages}" ]; then

Copilot AI Mar 20, 2026


allMCRImages is no longer declared local inside retagMCRImagesForChina, so it becomes a global array and can unintentionally leak/override state used elsewhere in the script. Make this a local variable (e.g., local -a allMCRImages=(...)) to avoid side effects between functions.

Suggested change:

  - allMCRImages=($(ctr --namespace k8s.io images list | grep '^mcr.microsoft.com/' | awk '{print $1}'))
  - if [ -z "${allMCRImages}" ]; then
  + local -a allMCRImages=($(ctr --namespace k8s.io images list | grep '^mcr.microsoft.com/' | awk '{print $1}'))
  + if [ ${#allMCRImages[@]} -eq 0 ]; then
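The scoping point generalizes to any bash function; a minimal runnable illustration (variable names are illustrative, not from the script):

```shell
#!/usr/bin/env bash
# Without `local`, an array assigned inside a function lands in the
# global scope and survives the call; `local -a` confines it.
leaky()  { arr=(a b c); }
scoped() { local -a arr=(x y z); }

leaky
echo "after leaky:  ${#arr[@]} elements"   # the array leaked out of the function
unset arr
scoped
echo "after scoped: ${#arr[@]} elements"   # nothing leaked
```

The `[ ${#arr[@]} -eq 0 ]` emptiness check is also more robust than `[ -z "${arr}" ]`, which only inspects the first element.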


5 participants