
feat: optimize gpu installation to < 10s #8450

Draft
awesomenix wants to merge 1 commit into main from nishp/optimize/gpu
@awesomenix
Contributor

As the title says.

Copilot AI review requested due to automatic review settings May 4, 2026 03:44
Contributor

Copilot AI left a comment

Pull request overview

This PR aims to speed up Ubuntu GPU provisioning by avoiding the existing container-image-based NVIDIA CUDA driver install path and instead baking a precompiled local GPU package into the VHD, then using that package during CSE on Ubuntu 22.04 amd64 nodes. It touches both Linux VHD build-time logic and Ubuntu provisioning-time GPU setup, plus related e2e diagnostics and scenario configuration.

Changes:

  • Adds a local GPU tarball download/extract/compile flow to the Linux VHD builder for Ubuntu 22.04 amd64.
  • Updates Ubuntu CSE GPU installation to prefer /opt/gpu/install_package.sh, with asynchronous depmod and extra diagnostics/log collection.
  • Adjusts Ubuntu GPU e2e coverage to force classic CSE execution and collect new NVIDIA installer logs.
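The asynchronous depmod step mentioned above can be illustrated with a minimal sketch. It is not the code from this PR: the helper names `start_background` and `wait_background` and the variable `BACKGROUND_PID` are hypothetical, and `sleep 1` stands in for a real `depmod` invocation.

```shell
# Sketch: start a slow step in the background, keep provisioning, and wait
# for it only at the point where its result is needed.

# start_background runs the given command asynchronously and records its PID
# in BACKGROUND_PID (hypothetical variable name).
start_background() {
    "$@" &
    BACKGROUND_PID=$!
}

# wait_background blocks until the recorded job exits and returns its status.
wait_background() {
    wait "${BACKGROUND_PID}"
}

# On a real node the command might be `depmod -a`; `sleep 1` is a harmless
# stand-in so the sketch can run anywhere.
start_background sleep 1
echo "continuing with other provisioning steps"
wait_background || echo "background step failed with status $?"
```

The wait should happen before any step that depends on the module index, otherwise a later module load could race with the background job.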

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| vhdbuilder/packer/install-dependencies.sh | Adds build-time download and preparation of a local Ubuntu GPU package instead of always pre-pulling the GPU image. |
| parts/linux/cloud-init/artifacts/ubuntu/cse_install_ubuntu.sh | Adds an extra Ubuntu dependency intended to support the new local GPU install path. |
| parts/linux/cloud-init/artifacts/cse_main.sh | Moves depmod off the critical path by running it in the background and waiting later. |
| parts/linux/cloud-init/artifacts/cse_config.sh | Switches Ubuntu 22.04 amd64 GPU installation to prefer a local /opt/gpu installer over the container-based flow. |
| e2e/vmss.go | Collects additional NVIDIA installer and strace logs from Linux VMs. |
| e2e/scenario_test.go | Alters the Ubuntu 22.04 GPU scenario to force classic CSE and skip default validation. |
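The "prefer a local installer, otherwise fall back to the container flow" decision described for cse_config.sh can be sketched as a small selection function. This is an illustration only: `select_gpu_install_path` is a hypothetical name, and the caller passes in the release, architecture, and installer path rather than reading them from the node.

```shell
# select_gpu_install_path prints "local" when the prebuilt installer applies
# (Ubuntu 22.04, amd64, installer script present), otherwise "container".
# The function name and its three arguments are assumptions for this sketch.
select_gpu_install_path() {
    local release="$1"
    local arch="$2"
    local installer="$3"
    if [ "${release}" = "22.04" ] && [ "${arch}" = "amd64" ] && [ -f "${installer}" ]; then
        echo "local"
    else
        echo "container"
    fi
}

# The result depends on whether the installer exists on the machine running this.
select_gpu_install_path "22.04" "amd64" "/opt/gpu/install_package.sh"
```

Keeping the decision in a pure function makes the fallback behavior easy to check without a GPU node.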

FLATCAR_OS_NAME="FLATCAR"
ACL_OS_NAME="AZURECONTAINERLINUX"

curl -fSL "https://abe2etestwestus3.blob.core.windows.net/aksgpu/aks-gpu-cuda-580.126.09-ubuntu-22.04-amd64.tar.gz?sp=r&st=2026-05-03T15:11:28Z&se=2026-05-03T23:26:28Z&skoid=62dcaf3d-7760-493c-bf35-5d8a776d946f&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2026-05-03T15:11:28Z&ske=2026-05-03T23:26:28Z&sks=b&skv=2025-11-05&spr=https&sv=2025-11-05&sr=b&sig=uX6177lEGwI5ZSeUHpRtOUO9ZpXnLrpiDh4bXX3yjHk%3D" -o /home/packer/aks-gpu-cuda-amd64.tar.gz
apt_get_update || exit $ERR_APT_UPDATE_TIMEOUT

pkg_list=(apparmor-utils bind9-dnsutils ca-certificates ceph-common cgroup-lite cifs-utils conntrack cracklib-runtime ebtables ethtool glusterfs-client htop init-system-helpers inotify-tools iotop iproute2 ipset iptables nftables jq libpam-pwquality libpwquality-tools mount nfs-common pigz socat sysfsutils sysstat util-linux xz-utils netcat-openbsd zip rng-tools kmod gcc make dkms initramfs-tools linux-headers-$(uname -r) linux-modules-extra-$(uname -r))
pkg_list=(apparmor-utils bind9-dnsutils ca-certificates ceph-common cgroup-lite cifs-utils conntrack cracklib-runtime ebtables ethtool glusterfs-client htop init-system-helpers inotify-tools iotop iproute2 ipset iptables nftables jq libpam-pwquality libpwquality-tools mount nfs-common pigz socat sysfsutils sysstat util-linux xz-utils netcat-openbsd zip rng-tools kmod gcc make dkms initramfs-tools libc6-dev linux-headers-$(uname -r) linux-modules-extra-$(uname -r))
Collaborator

We need to move the packages required for building the GPU drivers to a separate GPU bootstrapping DevEnv and remove them after building.
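One possible shape for "install build packages, build, then remove them" is sketched below. It is an assumption-laden illustration, not the approach agreed in this thread: `with_build_deps` and the `APT` override are hypothetical, and treating `gcc` and `make` as build-only packages is a guess made for the example.

```shell
# with_build_deps installs the given packages, runs a build command, and
# purges the packages afterwards, even when the build fails.
# APT is a hypothetical override so the flow can be dry-run with `echo`;
# when it is unset the real `apt-get` is used.
with_build_deps() {
    local build_cmd="$1"
    local status=0
    shift
    ${APT:-apt-get} install -y "$@" || return 1
    sh -c "${build_cmd}" || status=$?
    ${APT:-apt-get} purge -y "$@" || return 1
    # Report the build's own exit status after cleanup has run.
    return "${status}"
}

# Dry run: with APT=echo the apt-get calls are only printed, not executed.
APT=echo
with_build_deps 'echo "compiling driver modules"' gcc make
```

Purging unconditionally keeps compilers off the final image even if the build step fails, while the build's exit status is still returned to the caller.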

Comment on lines +953 to +958
if [ "${UBUNTU_RELEASE}" = "22.04" ] && [ "$(getCPUArch)" = "amd64" ] && [ -f /opt/gpu/install_package.sh ]; then
    echo "Using local Ubuntu 22.04 amd64 GPU driver package from /opt/gpu"
    sed -i 's#\./nvidia-installer -s #strace -tt -f -o /var/log/nvidia-installer.strace ./nvidia-installer -s #' /opt/gpu/install_package.sh
    retrycmd_if_failure 5 10 600 bash -c "cd /opt/gpu && bash ./install_package.sh"
    ret=$?
else
Comment thread e2e/scenario_test.go
Cluster: ClusterKubenet,
VHD: config.VHDUbuntu2204Gen2Containerd,
SkipScriptlessNBC: true,
SkipDefaultValidation: true,
3 participants