mount: call mount(2)/umount(2) directly when euid==0 #10
Open
Till0196 wants to merge 1 commit into
fusermount3 exists to let non-root callers mount FUSE filesystems. When the caller is already root with CAP_SYS_ADMIN (exactly the case for sysbox-fs running as a systemd-managed daemon), every check fusermount3 performs is a no-op (libfuse util/fusermount.c:1163, "if (getuid() == 0) return 0;"), yet we still pay a fork+exec plus an AF_UNIX SCM_RIGHTS round-trip per mount. It also forces a runtime dependency on the fusermount3 binary, which is the main reason sysbox-fs has to ship and install the helper on every host (read-only /usr on Flatcar, distroless images, etc.).

Add an early return at the top of mount() and unmount() that takes a direct path when running as root: open /dev/fuse, stat the target for rootmode, and call mount(2) with the kernel option set fusermount3 would have produced (fd, rootmode, user_id, group_id, plus an allowlist of allow_other / default_permissions / max_read / blksize). Flags are MS_NOSUID | MS_NODEV, source is the fsname option, and type is "fuse". On the root path, unmount becomes syscall.Unmount(dir, 0). The non-root mount() and the fusermount3-based unmount() fallback are byte-for-byte unchanged.

Cross-checked against libfuse util/fusermount.c (prepare_mount) and lib/mount_util.c (fuse_mnt_umount): mount flags, type, source, required data, kernel-accepted options, and the /dev/fuse open mode all match.

Tested on Flatcar Container Linux 4593.2.1 + kernel 6.12.87 + RKE2 v1.36.0+rke2r1 + containerd 2.2.3-k3s1, with no fusermount3 binary present on the host: sysbox-fs starts cleanly, all six sysboxfs FUSE mounts are established and serve reads, the observed mount options match expectations (rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other), and unmount on pod teardown leaves no leaked mounts.
Pull request overview
This PR adds a root fast-path for Linux FUSE mounting/unmounting that bypasses fusermount3 and calls mount(2) / umount(2) directly to avoid fork/exec overhead and the runtime dependency on the fusermount3 binary for privileged daemons.
Changes:
- Add an `os.Geteuid() == 0` fast-path in `mount()` that calls a new `directMount()` helper, which opens `/dev/fuse` and invokes `syscall.Mount`.
- Add an `os.Geteuid() == 0` fast-path in `unmount()` that calls `syscall.Unmount(dir, 0)` directly.
- Introduce a kernel option allowlist to limit which `-o` options are passed to the kernel via `mount(2)`.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| mount_linux.go | Adds root fast-path and implements directMount() with a kernel option allowlist. |
| unmount_linux.go | Adds root fast-path to unmount via syscall.Unmount instead of fusermount3. |
```go
		source = "fuse"
	}

	flags := uintptr(syscall.MS_NOSUID | syscall.MS_NODEV)
```

Comment on lines +211 to +212

```go
	flags := uintptr(syscall.MS_NOSUID | syscall.MS_NODEV)
	if err := syscall.Mount(source, dir, "fuse", flags, opts); err != nil {
```

Comment on lines 11 to +14

```go
func unmount(dir string) error {
	if os.Geteuid() == 0 {
		return syscall.Unmount(dir, 0)
	}
```
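For context, the unmount fast path above plus a helper-based fallback could look roughly like this. The root branch matches the excerpt; the fallback is a hypothetical stand-in (the PR leaves its real fallback unchanged and it is not shown here), assuming `fusermount3` is on `PATH` and accepts `-u <dir>`.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// unmount: root calls umount2(dir, 0) directly; everyone else goes through
// the setuid helper, which is allowed to unmount on their behalf.
func unmount(dir string) error {
	if os.Geteuid() == 0 {
		return syscall.Unmount(dir, 0)
	}
	// Hypothetical fallback, not the PR's code.
	out, err := exec.Command("fusermount3", "-u", dir).CombinedOutput()
	if err != nil {
		return fmt.Errorf("fusermount3 -u %s: %w: %s", dir, err, out)
	}
	return nil
}
```

Either branch returns an error for a path that is not a mountpoint, so callers see the same failure shape regardless of privilege.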
Branch updated from d1cab16 to 56767d3.
fusermount3 exists to let non-root callers mount FUSE filesystems. When the caller is already root with CAP_SYS_ADMIN -- which is exactly the case for sysbox-fs running as a systemd-managed daemon -- every check fusermount3 does is a no-op (libfuse `util/fusermount.c:1163`, `if (getuid() == 0) return 0;`), and we still pay a fork+exec plus an AF_UNIX SCM_RIGHTS round-trip per mount, just to end up at the same mount(2) we could have called directly.

It also forces a runtime dependency on the fusermount3 binary, which is the main reason sysbox-fs has to ship and install the helper on every host (read-only `/usr` on Flatcar, distroless images, etc.).

This patch adds an early return at the top of `mount()` and `unmount()`:

- `directMount` opens `/dev/fuse`, stats the target for `rootmode`, and calls mount(2) with the kernel options fusermount3 would have produced: `fd`, `rootmode`, `user_id`, `group_id`, plus an allowlist of `allow_other` / `default_permissions` / `max_read` / `blksize`. Flags are `MS_NOSUID | MS_NODEV`. `source` is the fsname option, `type` is `"fuse"`. This mirrors libfuse's `prepare_mount()` in `util/fusermount.c`.
- `unmount` on the root path becomes `syscall.Unmount(dir, 0)`. That matches libfuse's `umount2(mnt, 0)` branch in `lib/mount_util.c:307`, which is what fusermount3 already takes whenever `/etc/mtab` is a symlink -- i.e. every modern systemd system.

The non-root `mount()` and the fusermount3-based `unmount()` fallback are byte-for-byte unchanged. Diff is +68 lines, no deletions.

Audit against libfuse:
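
| Aspect | Value | libfuse reference |
|---|---|---|
| Mount flags | `MS_NOSUID \| MS_NODEV` | `fusermount.c:999` |
| Type | `"fuse"` | `mount_util.c:544` |
| Required data | `fd, rootmode, user_id, group_id` | `fusermount.c:1056` |
| Kernel-accepted options | `default_permissions, allow_other, max_read=, blksize=` | `fusermount.c:978` |
| … | … `fsname`, `subtype`) | |
| `/dev/fuse` open | `O_RDWR` (this PR: `O_RDWR` via `os.OpenFile`) | `fusermount.c:1300` |

Known gaps: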

- `Subtype()` is not threaded into the mount(2) `type` argument. No bazil user that motivated this change sets `Subtype`; if needed later, building `"fuse."+subtype` is trivial.
- `rootmode` is derived with `syscall.Stat()`, which follows symlinks. fusermount3 has TOCTOU defenses at `util/fusermount.c:341-414`, but those only apply to non-root callers (fusermount3 itself skips them at `fusermount.c:1163`), so there is no regression vs. the helper path.
- The direct path does not call `fuse_mnt_add_mount`; on hosts where `/etc/mtab` is a symlink to `/proc/self/mounts`, `fuse_mnt_add_mount` is already a no-op.

Tested on Flatcar Container Linux 4593.2.1 (kernel 6.12.87), RKE2 v1.36.0+rke2r1, containerd 2.2.3-k3s1, Cilium, with no `fusermount3` binary present on the host:

- `systemctl start sysbox-fs` with fusermount3 absent: active, no errors.
- Pod with `runtimeClassName: sysbox-runc`, `hostUsers: false`: runs; all six sysboxfs FUSE mounts established and serving reads (`/proc/swaps`, `/proc/sys`, `/proc/uptime`, `/sys/devices/virtual`, `/sys/kernel`, `/sys/module/nf_conntrack/parameters`).
- Observed mount options: `rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other`.
- `umount` on pod teardown: clean, no leaked mounts.

The non-root path is not exercised here because sysbox-fs is always root by design, but that code is unchanged.
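For the first known gap (threading `Subtype()` into the mount(2) `type` argument), a minimal sketch is below. The helper name `mountTypeAndSource` is hypothetical. It roughly follows libfuse's convention of using `fuse.<subtype>` as the type and preferring fsname, then subtype, as the source; the final `"fuse"` fallback follows the reviewed diff rather than libfuse.

```go
package main

// mountTypeAndSource returns the mount(2) type and source strings.
// type:   "fuse", or "fuse.<subtype>" when a subtype is set.
// source: fsname if set, else subtype if set, else "fuse".
func mountTypeAndSource(fsname, subtype string) (fstype, source string) {
	fstype = "fuse"
	if subtype != "" {
		fstype = "fuse." + subtype
	}
	switch {
	case fsname != "":
		source = fsname
	case subtype != "":
		source = subtype
	default:
		source = "fuse"
	}
	return fstype, source
}
```

With a subtype set, tools like `findmnt` would then report the filesystem type as `fuse.<subtype>` instead of a bare `fuse`.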
This is the prerequisite for dropping the fusermount3 runtime dependency in sysbox-pkgr (Flatcar and distroless artifacts) and for the `install_sysbox_deps_flatcar()` simplification in nestybox/sysbox#995.

bazil/fuse#195 (open since 2018-02) is an earlier attempt at the same thing. It hardcodes `rootmode=40000`, never forwards kernel options, and doesn't touch unmount. The discussion there stalled in 2020 on the maintainer's broader objection that "you shouldn't be running as root in the first place" -- Go can't drop privileges cleanly, and the maintainer wants a complete story before adding the path. That story is irrelevant to nestybox/fuse: sysbox-fs needs CAP_SYS_ADMIN for its core responsibilities, not just for mounting, so it is privileged for its entire lifetime and there is no privilege to drop.