mount: call mount(2)/umount(2) directly when euid==0 #10

Open
Till0196 wants to merge 1 commit into nestybox:master from Till0196:feat/direct-mount-no-fusermount

Conversation

@Till0196

fusermount3 exists to let non-root callers mount FUSE filesystems. When the caller is already root with CAP_SYS_ADMIN -- which is exactly the case for sysbox-fs running as a systemd-managed daemon -- every check fusermount3 does is a no-op (libfuse util/fusermount.c:1163, if (getuid() == 0) return 0;), and we still pay a fork+exec plus an AF_UNIX SCM_RIGHTS round-trip per mount, just to end up at the same mount(2) we could have called directly.

It also forces a runtime dependency on the fusermount3 binary, which is the main reason sysbox-fs has to ship and install the helper on every host (read-only /usr on Flatcar, distroless images, etc.).

This patch adds an early return at the top of mount() and unmount():

if os.Geteuid() == 0 {
    return directMount(dir, conf)
}

directMount opens /dev/fuse, stats the target for rootmode, and calls mount(2) with the kernel options fusermount3 would have produced: fd, rootmode, user_id, group_id, plus an allowlist of allow_other / default_permissions / max_read / blksize. Flags are MS_NOSUID | MS_NODEV. source is the fsname option, type is "fuse". This mirrors libfuse's prepare_mount() in util/fusermount.c.
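
As a rough illustration of the option handling described above, the following sketch builds the mount(2) data string and picks the source. It is not the code from this PR: `buildMountData`, `mountSource`, `kernelOpts`, and the map-based option representation are hypothetical names chosen for the example, and keys are sorted only to make the output deterministic. The real path would additionally open /dev/fuse, stat the target, and pass the result to `syscall.Mount`.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sIFMT masks the file-type bits of st_mode (the value of S_IFMT on Linux).
const sIFMT = 0o170000

// kernelOpts is the allowlist named in the description. Anything else
// (fsname, subtype, ...) is treated as userspace-only and dropped.
var kernelOpts = map[string]bool{
	"allow_other":         true,
	"default_permissions": true,
	"max_read":            true,
	"blksize":             true,
}

// buildMountData is a hypothetical helper that assembles the data argument
// for mount(2): the four required fields, then the allowlisted options.
func buildMountData(fd int, stMode uint32, uid, gid int, opts map[string]string) string {
	parts := []string{
		fmt.Sprintf("fd=%d", fd),
		fmt.Sprintf("rootmode=%o", stMode&sIFMT), // octal, file-type bits only
		fmt.Sprintf("user_id=%d", uid),
		fmt.Sprintf("group_id=%d", gid),
	}
	keys := make([]string, 0, len(opts))
	for k := range opts {
		if kernelOpts[k] {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys) // deterministic order for the example
	for _, k := range keys {
		if v := opts[k]; v != "" {
			parts = append(parts, k+"="+v)
		} else {
			parts = append(parts, k)
		}
	}
	return strings.Join(parts, ",")
}

// mountSource returns the fsname option if set, otherwise "fuse".
func mountSource(opts map[string]string) string {
	if s := opts["fsname"]; s != "" {
		return s
	}
	return "fuse"
}

func main() {
	opts := map[string]string{
		"fsname":              "sysboxfs", // example value, userspace-only
		"allow_other":         "",
		"default_permissions": "",
		"max_read":            "131072",
	}
	fmt.Println(mountSource(opts))
	fmt.Println(buildMountData(3, 0o40755, 0, 0, opts))
}
```

For a directory target the file-type bits are 040000, so `rootmode` comes out as `40000`; the sketch derives it from the stat result instead of hardcoding it.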

unmount on the root path becomes syscall.Unmount(dir, 0). That matches libfuse's umount2(mnt, 0) branch in lib/mount_util.c:307, which is what fusermount3 already takes whenever /etc/mtab is a symlink -- i.e. every modern systemd system.

The non-root mount() and the fusermount3-based unmount() fallback are byte-for-byte unchanged. Diff is +68 lines, no deletions.

Audit against libfuse:

| | libfuse fusermount3 | this patch (root path) | source |
|---|---|---|---|
| flags | MS_NOSUID \| MS_NODEV | same | fusermount.c:999 |
| type | "fuse" | same | |
| source | fsname option first | same | mount_util.c:544 |
| data (required) | fd, rootmode, user_id, group_id | same | fusermount.c:1056 |
| data (kernel-OK opts) | default_permissions, allow_other, max_read=, blksize= | same allowlist | fusermount.c:978 |
| userspace-only opts (fsname, subtype) | not in data string | not in data string | |
| /dev/fuse open | O_RDWR | O_RDWR (via os.OpenFile) | fusermount.c:1300 |

Known gaps:

  • Subtype() is not threaded into the mount(2) type argument. None of the bazil/fuse users that motivated this change set Subtype; if it is needed later, building "fuse."+subtype is trivial.
  • The direct path uses syscall.Stat() which follows symlinks. fusermount3 has TOCTOU defenses at util/fusermount.c:341-414, but those only apply to non-root callers (fusermount3 itself skips them at fusermount.c:1163), so there is no regression vs. the helper path.
  • mtab/utab is not updated on the direct path. On modern systems /etc/mtab is a symlink to /proc/self/mounts and fuse_mnt_add_mount is already a no-op.

Tested on Flatcar Container Linux 4593.2.1 (kernel 6.12.87), RKE2 v1.36.0+rke2r1, containerd 2.2.3-k3s1, Cilium, with no fusermount3 binary present on the host:

  • systemctl start sysbox-fs with fusermount3 absent: active, no errors.
  • alpine 3.20 pod, runtimeClassName: sysbox-runc, hostUsers: false: runs; all six sysboxfs FUSE mounts established and serving reads (/proc/swaps, /proc/sys, /proc/uptime, /sys/devices/virtual, /sys/kernel, /sys/module/nf_conntrack/parameters).
  • mount(2) options observed from inside the pod: rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other.
  • umount on pod teardown: clean, no leaked mounts.
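
The option check in the list above could be automated along these lines. This is a sketch under assumptions, not the procedure used for the tests reported here: `parseMountLine` and `hasAll` are hypothetical helpers, and the sample line is constructed from the options quoted above with an assumed source name of "sysboxfs". It parses one line in /proc/self/mounts format and does not decode octal escapes such as `\040` in paths.

```go
package main

import (
	"fmt"
	"strings"
)

// mountEntry holds the fields of one /proc/self/mounts line used here.
type mountEntry struct {
	Source  string
	Target  string
	FSType  string
	Options map[string]string // flag options map to the empty string
}

// parseMountLine splits a mounts line into its first four fields.
// It returns false when the line has fewer than four fields.
func parseMountLine(line string) (mountEntry, bool) {
	f := strings.Fields(line)
	if len(f) < 4 {
		return mountEntry{}, false
	}
	e := mountEntry{Source: f[0], Target: f[1], FSType: f[2], Options: map[string]string{}}
	for _, o := range strings.Split(f[3], ",") {
		kv := strings.SplitN(o, "=", 2)
		if len(kv) == 2 {
			e.Options[kv[0]] = kv[1]
		} else {
			e.Options[kv[0]] = ""
		}
	}
	return e, true
}

// hasAll reports whether every named option is present on the entry.
func hasAll(e mountEntry, names ...string) bool {
	for _, n := range names {
		if _, ok := e.Options[n]; !ok {
			return false
		}
	}
	return true
}

func main() {
	// Sample line; the source name is an assumption for illustration.
	line := "sysboxfs /proc/uptime fuse rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other 0 0"
	e, ok := parseMountLine(line)
	if !ok {
		fmt.Println("malformed line")
		return
	}
	fmt.Println(e.Target, e.FSType, hasAll(e, "nosuid", "nodev", "default_permissions", "allow_other"))
}
```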

Non-root path is not exercised here because sysbox-fs is always root by design, but the code is unchanged.

This is the prerequisite for dropping the fusermount3 runtime dependency in sysbox-pkgr (Flatcar and distroless artifacts) and for the install_sysbox_deps_flatcar() simplification in nestybox/sysbox#995.

bazil/fuse#195 (open since 2018-02) is an earlier attempt at the same thing. It hardcodes rootmode=40000, never forwards kernel options, and doesn't touch unmount. The discussion there stalled in 2020 on the maintainer's broader objection that "you shouldn't be running as root in the first place" -- Go can't drop privileges cleanly and the maintainer wants a complete story before adding the path. That story is irrelevant to nestybox/fuse: sysbox-fs needs CAP_SYS_ADMIN for its core responsibilities, not just for mounting, so it is privileged for its entire lifetime and there is no privilege to drop.

Copilot AI review requested due to automatic review settings May 19, 2026 09:18
fusermount3 exists to let non-root callers mount FUSE filesystems.
When the caller is already root with CAP_SYS_ADMIN -- which is exactly
the case for sysbox-fs running as a systemd-managed daemon -- every
check fusermount3 does is a no-op (libfuse util/fusermount.c:1163,
"if (getuid() == 0) return 0;"), and we still pay a fork+exec plus an
AF_UNIX SCM_RIGHTS round-trip per mount.

It also forces a runtime dependency on the fusermount3 binary, which
is the main reason sysbox-fs has to ship and install the helper on
every host (read-only /usr on Flatcar, distroless images, etc.).

Add an early return at the top of mount() and unmount() that takes a
direct path when running as root: open /dev/fuse, stat the target for
rootmode, call mount(2) with the kernel option set fusermount3 would
have produced (fd, rootmode, user_id, group_id, plus an allowlist of
allow_other / default_permissions / max_read / blksize). Flags are
MS_NOSUID | MS_NODEV. source is the fsname option, type is "fuse".
unmount on the root path becomes syscall.Unmount(dir, 0).

The non-root mount() and the fusermount3-based unmount() fallback are
byte-for-byte unchanged.

Cross-checked against libfuse util/fusermount.c (prepare_mount) and
lib/mount_util.c (fuse_mnt_umount): mount flags, type, source,
required data, kernel-OK opts, and /dev/fuse open mode all match.

Tested on Flatcar Container Linux 4593.2.1 + kernel 6.12.87 + RKE2
v1.36.0+rke2r1 + containerd 2.2.3-k3s1, with no fusermount3 binary
present on the host: sysbox-fs starts clean, all six sysboxfs FUSE
mounts are established and serve reads, observed mount options match
expectation
(rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other),
unmount on pod teardown leaves no leaked mounts.

Copilot AI left a comment


Pull request overview

This PR adds a root fast-path for Linux FUSE mounting/unmounting that bypasses fusermount3 and calls mount(2) / umount(2) directly to avoid fork/exec overhead and the runtime dependency on the fusermount3 binary for privileged daemons.

Changes:

  • Add an os.Geteuid()==0 fast-path in mount() to call a new directMount() helper that opens /dev/fuse and invokes syscall.Mount.
  • Add an os.Geteuid()==0 fast-path in unmount() to call syscall.Unmount(dir, 0) directly.
  • Introduce a kernel option allowlist to limit which -o options are passed to the kernel via mount(2).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| mount_linux.go | Adds root fast-path and implements directMount() with a kernel option allowlist. |
| unmount_linux.go | Adds root fast-path to unmount via syscall.Unmount instead of fusermount3. |


Comment thread mount_linux.go
source = "fuse"
}

flags := uintptr(syscall.MS_NOSUID | syscall.MS_NODEV)
Comment thread mount_linux.go
Comment on lines +211 to +212
flags := uintptr(syscall.MS_NOSUID | syscall.MS_NODEV)
if err := syscall.Mount(source, dir, "fuse", flags, opts); err != nil {
Comment thread unmount_linux.go
Comment on lines 11 to +14
func unmount(dir string) error {
	if os.Geteuid() == 0 {
		return syscall.Unmount(dir, 0)
	}
