
cgroup: add BPF probe for v1 fallback and improve GCS error diagnostic #2716

Open: jiechen0826 wants to merge 2 commits into microsoft:main from jiechen0826:cgroup-bpf-probe


jiechen0826 commented May 1, 2026

Problem

On Yocto kernels (e.g. 6.1.153.1-microsoft-standard), the kernel supports the cgroup2 filesystem but lacks CONFIG_CGROUP_BPF. When init mounts cgroup v2 and finds controllers, it commits to v2. Later, runc tries BPF_PROG_QUERY on BPF_CGROUP_DEVICE and gets ENOSYS, causing container creation to fail.

Separately, on ARM kernels with CONFIG_MEMCG_V1 disabled (e.g. 6.18+), the GCS crashes with a generic error when the v1 memory controller is absent, making it hard to diagnose.

On Ubuntu 6.17 kernels, the vsock read() in init_entropy() returns ENOMEM after hv_sock module load when transport buffers are not yet initialized. Since init runs as PID 1, the die() call triggers a kernel panic, preventing the UVM from booting.

Fix

init.c (commit 1): Before committing to cgroup v2, probe BPF cgroup support by issuing BPF_PROG_QUERY with attach_type = BPF_CGROUP_DEVICE on the cgroup root fd. If the kernel returns ENOSYS or EINVAL (no BPF support), unmount cgroup2 and fall back to v1. Also add a mounted counter in init_cgroups_v1() to die cleanly when no v1 controllers are available, and add hcsshim.cgroup=v1 kernel parameter to force v1.
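
The probe and the forced-v1 override described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names bpf_errno_means_unsupported, cgroup_bpf_supported, and cmdline_forces_v1 are hypothetical, the treatment of errors other than ENOSYS and EINVAL as "do not fall back" is an assumption, and how init actually reads the kernel command line is not shown in the description.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Returns 1 if an errno from BPF_PROG_QUERY is one of the two values the
 * description treats as "no cgroup BPF support" (ENOSYS or EINVAL). */
int bpf_errno_means_unsupported(int err) {
    return err == ENOSYS || err == EINVAL;
}

/* Hypothetical helper: issues BPF_PROG_QUERY for BPF_CGROUP_DEVICE on an
 * open cgroup2 root fd. Returns 1 to stay on v2, 0 to fall back to v1.
 * Assumption: any other error (e.g. EPERM) is not treated as "unsupported". */
int cgroup_bpf_supported(int cgroup_fd) {
    union bpf_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.query.target_fd = (__u32)cgroup_fd;
    attr.query.attach_type = BPF_CGROUP_DEVICE;
    /* prog_ids and prog_cnt stay zero: we only care whether the kernel
     * understands the query, not which programs are attached. */
    if (syscall(__NR_bpf, BPF_PROG_QUERY, &attr, sizeof(attr)) == 0)
        return 1;
    return !bpf_errno_means_unsupported(errno);
}

/* Hypothetical helper: returns 1 if the kernel command line contains the
 * exact space-delimited token hcsshim.cgroup=v1. */
int cmdline_forces_v1(const char *cmdline) {
    static const char key[] = "hcsshim.cgroup=v1";
    const size_t klen = sizeof(key) - 1;
    for (const char *p = strstr(cmdline, key); p; p = strstr(p + 1, key)) {
        int starts = (p == cmdline) || p[-1] == ' ';
        int ends = p[klen] == '\0' || p[klen] == ' ' || p[klen] == '\n';
        if (starts && ends)
            return 1;
    }
    return 0;
}
```

Under these assumptions, init would choose v1 when cmdline_forces_v1() returns 1 or when cgroup_bpf_supported() returns 0 for the mounted cgroup2 root, and would unmount cgroup2 before mounting the v1 hierarchies.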

main.go (commit 1): Improve the fatal error message when the v1 memory controller is absent to point at CONFIG_MEMCG_V1 and the cgroup v2 + CONFIG_CGROUP_BPF alternative.

init.c (commit 2): Make entropy seeding non-fatal by replacing die() with warn() + dmesgWarn() + early return/break. The UVM can function without entropy seeding.
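
The shape of the non-fatal read path can be illustrated with a simplified sketch. It is not the PR's init_entropy(): the names warn_nonfatal and read_entropy_nonfatal are hypothetical stand-ins, the vsock connection and the step that feeds the bytes to the kernel are omitted, and the warning goes to stderr rather than through warn() and dmesgWarn().

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for the warn() + dmesgWarn() pair: report the
 * failure with the current errno, but do not exit. */
void warn_nonfatal(const char *what) {
    fprintf(stderr, "warning: %s: %s\n", what, strerror(errno));
}

/* Reads up to len bytes of seed material from fd into buf.
 * On a read error (for example ENOMEM from a vsock whose transport buffers
 * are not ready) it warns and stops early instead of dying, so PID 1 keeps
 * running. Returns the number of bytes actually read, possibly 0. */
size_t read_entropy_nonfatal(int fd, unsigned char *buf, size_t len) {
    size_t total = 0;
    while (total < len) {
        ssize_t n = read(fd, buf + total, len - total);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal: retry */
            warn_nonfatal("entropy read failed, continuing without seeding");
            break;
        }
        if (n == 0)
            break; /* peer closed the connection */
        total += (size_t)n;
    }
    return total;
}
```

The caller can then skip seeding when the function returns 0 and carry on with the rest of boot, which matches the stated goal that the UVM can function without entropy seeding.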

Testing

  • Azure Linux kernel (6.14.0): BPF probe succeeds, cgroup v2 used, LCOW smoke test passes
  • Yocto kernel (6.1.153.1-microsoft-standard): init detects ENOSYS from BPF probe, falls back to v1, LCOW smoke test passes
  • Ubuntu kernel (6.17.0-1008-azure): BPF probe succeeds, cgroup v2 used, entropy warning logged but UVM boots, sandbox creation succeeds
  • Verified that the entropy fix is required: without it, sandbox creation on Ubuntu 6.17 fails with the error 'failed to connect to entropy socket: context deadline exceeded'

jiechen0826 requested a review from a team as a code owner May 1, 2026 05:33
jiechen0826 changed the title from "cgroup: add BPF probe for v1 fallback and fix eventfd double-close" to "cgroup: add BPF probe for v1 fallback and improve GCS error diagnostic" May 4, 2026
jiechen0826 requested a review from helsaawy May 5, 2026 16:50

init.c: Probe BPF_CGROUP_DEVICE before committing to cgroup v2. If the
kernel lacks CONFIG_CGROUP_BPF (e.g. Yocto 6.1), fall back to v1. Also
add has_v1_controllers() guard and hcsshim.cgroup=v1 kernel parameter.

cmd/gcs/main.go: Improve fatal error message when the v1 memory
controller is absent (CONFIG_MEMCG_V1 disabled, e.g. kernel 6.18+) to
point at the kernel config and the v2 + BPF alternative.

Signed-off-by: Jie Chen <jiechen3@microsoft.com>

On Ubuntu 6.17 kernels, the vsock read in init_entropy() returns ENOMEM
after hv_sock module load when transport buffers are not yet initialized.
Since init runs as PID 1, die() triggers a kernel panic.

Change all die() calls in init_entropy() to warn()+dmesgWarn()+return/break
so the UVM can boot without entropy seeding. Entropy improves randomness
quality but is not required for UVM operation.

Tested: Ubuntu 6.17.0-1008-azure kernel boots successfully with this fix.
Without it, sandbox creation fails with 'context deadline exceeded' as the
GCS never starts.

Signed-off-by: Jie Chen <jiechen3@microsoft.com>