vminit: switch root to new tmpfs before serving containers #187

Open
dmcgowan wants to merge 1 commit into containerd:main from dmcgowan:vminitd-switch-root

@dmcgowan
Member

Replace the static initramfs root with a fresh tmpfs using MS_MOVE + chroot (the classic switch_root approach, compatible with any kernel version).

  • switchRoot mounts a new tmpfs, mounts all pseudo-filesystems (proc, sysfs, cgroup2, devtmpfs, /run) directly into it, then replaces / via MS_MOVE + chroot. No intermediate mount-moves are needed since the pseudo-filesystems are mounted into the new root from the start.
  • /run tmpfs is capped at size=64m,mode=0755; root tmpfs at size=128m,mode=0755.
  • /tmp is not mounted — nothing in vminitd uses it; containers get their own per-container tmpfs from the OCI spec.
  • /etc is created inside switchRoot alongside the other directories.
  • crun is copied into the new root then removed from the initramfs to reclaim its pages before the root switch.
  • NoPivot is changed from a hardcoded true to p.NoPivotRoot (defaults to false), enabling pivot_root inside containers now that the VM root is a distinct tmpfs filesystem separate from the container overlay/erofs rootfs.
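
The sequence described above can be sketched roughly as follows. This is a minimal illustration rather than the PR's code: the `mountSpec` type, the `pseudoMounts` helper, the `/newroot` path, the `/dev` target for devtmpfs, and the zero mount flags are all assumptions. Only the filesystem list and the size/mode options come from the description. `switchRoot` needs root privileges and replaces `/`, so it should only ever run as init inside a disposable VM; `main` here just prints the plan.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// mountSpec describes one mount performed under the new root.
// The type and its field names are illustrative, not taken from the PR.
type mountSpec struct {
	source, target, fstype, data string
	flags                        uintptr
}

// pseudoMounts lists the pseudo-filesystems named in the description,
// already prefixed with the new root so nothing has to be moved later.
// sysfs comes before cgroup2 because /sys/fs/cgroup lives inside sysfs.
func pseudoMounts(root string) []mountSpec {
	return []mountSpec{
		{"proc", root + "/proc", "proc", "", 0},
		{"sysfs", root + "/sys", "sysfs", "", 0},
		{"cgroup2", root + "/sys/fs/cgroup", "cgroup2", "", 0},
		{"devtmpfs", root + "/dev", "devtmpfs", "", 0},
		{"tmpfs", root + "/run", "tmpfs", "size=64m,mode=0755", 0},
	}
}

// switchRoot builds a tmpfs at root, mounts the pseudo-filesystems into it,
// then makes it the new / with MS_MOVE + chroot (the switch_root pattern).
func switchRoot(root string) error {
	if err := os.MkdirAll(root, 0o755); err != nil {
		return err
	}
	if err := syscall.Mount("tmpfs", root, "tmpfs", 0, "size=128m,mode=0755"); err != nil {
		return fmt.Errorf("mount new root: %w", err)
	}
	for _, m := range pseudoMounts(root) {
		if err := os.MkdirAll(m.target, 0o755); err != nil {
			return err
		}
		if err := syscall.Mount(m.source, m.target, m.fstype, m.flags, m.data); err != nil {
			return fmt.Errorf("mount %s: %w", m.target, err)
		}
	}
	// Enter the new root, move it over /, then chroot into it.
	if err := os.Chdir(root); err != nil {
		return err
	}
	if err := syscall.Mount(".", "/", "", syscall.MS_MOVE, ""); err != nil {
		return fmt.Errorf("move root: %w", err)
	}
	if err := syscall.Chroot("."); err != nil {
		return fmt.Errorf("chroot: %w", err)
	}
	return os.Chdir("/")
}

func main() {
	// Print the plan only; calling switchRoot here would replace /.
	for _, m := range pseudoMounts("/newroot") {
		fmt.Printf("%-8s -> %s %s\n", m.fstype, m.target, m.data)
	}
}
```

Because every pseudo-filesystem is mounted under the new root before the move, the single `MS_MOVE` of the tmpfs carries them all along, which matches the "no intermediate mount-moves" point in the description.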

Copilot AI review requested due to automatic review settings May 12, 2026 05:42

Copilot AI left a comment

Pull request overview

This PR changes vminitd’s early-boot behavior to replace the initramfs-backed / with a fresh tmpfs “real root” (switch_root via MS_MOVE + chroot) before container services start, enabling a distinct VM root filesystem and allowing container pivot_root behavior to be configurable again.

Changes:

  • Add switchRoot() to build a new tmpfs root, mount pseudo-filesystems into it, copy crun, configure cgroup v2 controllers, and switch / via MS_MOVE + chroot.
  • Refactor systemInit to perform the root switch first, then networking setup, and manage the DHCP renewer goroutine internally.
  • Change runc create options to set NoPivot from p.NoPivotRoot instead of being hardcoded to true.
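
The review summary mentions configuring cgroup v2 controllers without saying how. One common approach, shown here purely as an assumption about what such a step might look like, is to read `cgroup.controllers` and write each name prefixed with `+` to `cgroup.subtree_control`. The helper names `enableLine` and `enableControllers` are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// enableLine converts the contents of cgroup.controllers (for example
// "cpu io memory\n") into the string written to cgroup.subtree_control
// ("+cpu +io +memory").
func enableLine(available string) string {
	fields := strings.Fields(available)
	for i, name := range fields {
		fields[i] = "+" + name
	}
	return strings.Join(fields, " ")
}

// enableControllers delegates every available controller to the children of
// cgroupDir. A single write is all-or-nothing: if the kernel rejects one
// controller, none are enabled.
func enableControllers(cgroupDir string) error {
	data, err := os.ReadFile(filepath.Join(cgroupDir, "cgroup.controllers"))
	if err != nil {
		return err
	}
	line := enableLine(string(data))
	if line == "" {
		return nil // nothing to enable
	}
	return os.WriteFile(filepath.Join(cgroupDir, "cgroup.subtree_control"), []byte(line), 0o644)
}

func main() {
	// Pure string transformation; no cgroup filesystem is touched here.
	fmt.Println(enableLine("cpu io memory pids\n")) // prints "+cpu +io +memory +pids"
}
```

In a real init this would presumably be called with the path where cgroup2 was mounted, such as `/sys/fs/cgroup`, after the root switch.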

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| pkg/vminit/initd/initd.go | Implements tmpfs switch_root flow, updates pseudo-filesystem mounts, and refactors system init/networking startup. |
| internal/vminit/process/init.go | Makes container pivot_root behavior configurable via p.NoPivotRoot. |
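
To illustrate what making pivot_root configurable amounts to, here is a small sketch of how a boolean like `NoPivotRoot` could translate into an OCI runtime command line. The `createConfig` type and `createArgs` function are hypothetical stand-ins and not the project's API; the `--bundle` and `--no-pivot` flags are the ones accepted by `runc create`.

```go
package main

import "fmt"

// createConfig is a hypothetical stand-in for the process settings in the PR;
// only the NoPivotRoot name comes from the description.
type createConfig struct {
	Bundle      string
	ID          string
	NoPivotRoot bool
}

// createArgs builds the arguments for an OCI runtime "create" call.
// --no-pivot is added only when requested, so the zero value of
// NoPivotRoot selects pivot_root.
func createArgs(c createConfig) []string {
	args := []string{"create", "--bundle", c.Bundle}
	if c.NoPivotRoot {
		args = append(args, "--no-pivot")
	}
	return append(args, c.ID)
}

func main() {
	fmt.Println(createArgs(createConfig{Bundle: "/run/bundle", ID: "c1"}))
	fmt.Println(createArgs(createConfig{Bundle: "/run/bundle", ID: "c1", NoPivotRoot: true}))
}
```

With the flag defaulting to false, containers use pivot_root unless a caller opts out, which is the behavior change the description attributes to the VM root no longer being the initramfs.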

Comment thread pkg/vminit/initd/initd.go Outdated
Comment on lines +234 to +241
// avoids the stat walk that os.MkdirAll does per path. /sys/fs and
// /sys/fs/cgroup are created explicitly because sysfs is not yet mounted.
for _, dir := range []string{
	root + "/sbin",
	root + "/proc",
	root + "/sys",
	root + "/sys/fs",
	root + "/sys/fs/cgroup",

Signed-off-by: Derek McGowan <derek@mcg.dev>
@dmcgowan dmcgowan force-pushed the vminitd-switch-root branch from 12678b0 to 4e57ad5 Compare May 12, 2026 06:43