fix(satellite): mount /run/udev so libzfs sees partition uevents (Bug 359) #12
`linstor ps cdp zfs` returned SUCCESS but the resulting StoragePool
stayed at `State=Error` with `pool backing storage missing`. Reproduced
on the e2e3 stand, where the satellite-side `zpool create` failed
deterministically:
    zpool create -f -O compression=off -O atime=off data /dev/sda:
    cannot label 'sda': failed to detect device partitions on
    '/dev/sda1': 19 (ENODEV)
Root cause: kubelet hands every privileged container its own private
devtmpfs instance for /dev. zpool create stamps the GPT on /dev/sda
(kernel creates sda1 + sda9 on the host's devtmpfs), then libzfs
immediately open()s /dev/sda1 to write the ZFS label — the inode is
not in the container's devtmpfs yet, open() returns ENODEV, the pool
is left half-stamped.
Bug 346 attempted `mountPropagation: HostToContainer` to slave-mirror
host /dev events into the container. That didn't help: rslave
propagates mount events, not devtmpfs inode visibility for a
freshly-mknod'd partition node, and kubelet still allocated a separate
devtmpfs.
Fix mirrors piraeus's satellite DaemonSet: declare the volume as a
plain `hostPath: {path: /dev, type: Directory}` and mount it without
mountPropagation. With `type: Directory` kubelet bind-mounts the
host's devtmpfs directory directly into the container — same inode
table, same partition nodes visible immediately after mknod, no
slave-mirror games. Verified against piraeus's working satellite on
the same Talos layout (dev5 cluster).
No Go-code changes are needed for the mount race; the satellite's
exec stays in the container. Bug 359 also surfaces a separate Talos
read-only-rootfs issue with `zpool create`'s implicit mkdir — fixed
in the follow-up commit (`-m none`).
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
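The /dev volume shape described in this commit message could look roughly like the fragment below. This is a minimal sketch, not the actual manifest from the PR: the container name `satellite` and the volume name `dev` are hypothetical, and only the `hostPath` fields, the privileged container, and the absence of `mountPropagation` come from the message above.

```yaml
# Sketch of a DaemonSet pod template fragment.
# Names marked "hypothetical" are illustrative and not taken from the PR.
spec:
  template:
    spec:
      containers:
        - name: satellite              # hypothetical container name
          securityContext:
            privileged: true           # the commit message refers to a privileged container
          volumeMounts:
            - name: dev                # hypothetical volume name
              mountPath: /dev
              # No mountPropagation field here: the HostToContainer
              # setting from the Bug 346 attempt is dropped.
      volumes:
        - name: dev                    # hypothetical volume name
          hostPath:
            path: /dev
            type: Directory            # kubelet checks that /dev exists as a directory on the host
```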
Followup to the Bug 359 mount fix.
`zpool create` tries to mkdir /<pool> as a mountpoint when the new pool
is imported. On Talos the host rootfs is read-only outside of a small
writable allowlist: mkdir fails with EROFS, `zpool create` returns
non-zero, and blockstor rolls back the SP CRD even though the pool is
already on disk + imported. The next reconcile finds the existing pool
and bails with EEXIST, leaving the SP perpetually missing.
blockstor uses `zfs create -V` (zvol) datasets only, so the root pool
mountpoint is never load-bearing. `-m none` tells zpool not to allocate
a mountpoint at all, sidestepping the EROFS without losing any function.
Test prefix assertion in pkg/satellite/attach_test.go is unchanged (it
pins `zpool create -f` only).
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Walkthrough
The PR updates the blockstor-satellite codebase to support ZFS pool creation on read-only Talos rootfs by suppressing automatic mountpoint creation in the
Changes
ZFS Pool Creation and Pod Configuration
Code Review
This pull request addresses issues encountered during ZFS pool creation on Talos environments. It updates the zpool create command to include the -m none flag, preventing failures caused by attempts to create mount points on read-only root filesystems. Furthermore, the Kubernetes DaemonSet configuration is adjusted to use a plain bind mount for /dev, ensuring the container correctly inherits the host's devtmpfs and can access new device nodes generated during partition rescans. I have no feedback to provide as there were no review comments to evaluate.
Two further pieces of the piraeus satellite Pod spec that blockstor's
satellite was missing, both relevant to the `ps cdp zfs` failure
(Bug 359):
1. `hostIPC: true`. LVM userspace tooling and libzfs use host-wide
   SysV/POSIX IPC for whole-host coordination (lvmlockd handshakes,
   zfs.ko's libzpool ↔ /etc/libnvpair shared keys, etc.). Without
   hostIPC the satellite owns its own IPC namespace and can race or
   deadlock against host-side commands that assume the IPC is
   process-global. Mirrors the `linstor-satellite.nodeN` DaemonSet on
   the same Talos layout (cozy-linstor namespace).
2. `/run/udev` (ro, hostPath type=Directory). udev's runtime DB lives
   at /run/udev/data/b<MAJ>:<MIN>. libzfs/libblkid query it to look up
   partition metadata (PARTUUID, fs signatures, holders). Without this
   mount, the satellite sees an empty DB: partition rescan after
   `zpool create`'s GPT stamp returns nothing, libzfs treats it as
   "partition not present" and aborts (matches the ENODEV symptom).
The previous /dev mount fix (commit `e57912c44`) made the partition
node visible; this one makes the udev metadata about it visible too.
Together these complete the parity with piraeus's satellite mount
shape; the only remaining differences (var-lib-drbd, /etc/lvm breakout,
capabilities allow-list) are not load-bearing for ZFS.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
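The two Pod-spec additions from this commit might be expressed along these lines. It is a hedged sketch rather than the manifest shipped in the PR: the container name `satellite` and the volume name `run-udev` are hypothetical, while `hostIPC: true`, the read-only mount, and the `hostPath` type follow the commit message.

```yaml
# Sketch of a DaemonSet pod template fragment.
# Names marked "hypothetical" are illustrative and not taken from the PR.
spec:
  template:
    spec:
      hostIPC: true                    # share the host IPC namespace, as piraeus does
      containers:
        - name: satellite              # hypothetical container name
          volumeMounts:
            - name: run-udev           # hypothetical volume name
              mountPath: /run/udev
              readOnly: true           # the satellite only reads the host udev DB
      volumes:
        - name: run-udev               # hypothetical volume name
          hostPath:
            path: /run/udev
            type: Directory
```

Mounting the directory read-only keeps the host's udevd as the only writer of the runtime DB; the container only needs to read entries such as `/run/udev/data/b<MAJ>:<MIN>`.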
Live diagnostic (empirical, 2026-05-22)
Ran a side-by-side mechanical comparison of blockstor's vs piraeus's satellite on the same Talos worker (
Mountinfo comparison: /dev is identical
Both pods see the host's devtmpfs ( Same major:minor
Additional control test: in the failing blockstor satellite,
Pod-spec diff (only load-bearing differences)
Reproducing the failure
Same Errno
strace nails the root cause
In the blockstor pod,
In the piraeus pod,
Validation:
Summary
Real root cause of Bug 359 (`linstor ps cdp zfs` SUCCESS but SP stays at `State=Error` with `pool backing storage missing`) turned out to be the missing `/run/udev` mount in blockstor's satellite Pod. Validated empirically on `e2e3-worker-1` (see the "Live diagnostic" comment for the side-by-side strace + mountinfo + udev DB capture).
libzfs's `zpool_label_disk_wait()` polls `/run/udev/data/b<MAJ>:<MIN>` to confirm the host's udev daemon finished processing the partition uevent. With no `/run/udev` mount in the container, libudev reports the partition as "not initialized" and libzfs times out → `failed to detect device partitions on '/dev/sda1': 19` → SP rolled back. The `/dev/sda1` inode itself was visible all along (devtmpfs `0:6` is shared via the hostPath bind).
PR #11's `nsenter` "worked" for the same reason: it ran `zpool` in PID 1's mount namespace, which has `/run/udev` available. The piraeus DaemonSet ships `/run/udev` as a ro bind and gets the same outcome without nsenter.
Commits
`e57912c44`: `hostPath: {path: /dev, type: Directory}` + drop `mountPropagation: HostToContainer`
Necessary but not sufficient. Without `type: Directory` kubelet's hostPath validation is lenient and downstream behaviour gets fragile; without dropping `mountPropagation` we still inherit the Bug 346 attempt that misdiagnosed this race. Mirrors piraeus's satellite verbatim.
`77235179d`: `zpool create -m none`
Independent Talos fix: after the pool is stamped + imported, `zpool create` tries `mkdir /<pool>` for the implicit mountpoint. Talos rootfs is RO outside a small allowlist → EROFS → `zpool create` exits non-zero → SP rolled back even though the pool exists on disk. blockstor uses `zfs create -V` zvols only, so the pool mountpoint is never load-bearing.
`925d3cd4a`: `/run/udev` ro mount + `hostIPC: true`
The actual fix. `/run/udev` makes libzfs's libudev poll see the host udevd's partition metadata. `hostIPC: true` mirrors piraeus for LVM userland coordination (lvmlockd uses host-wide IPC).
Why not nsenter?
PR #11 wrapped `zpool create/add` in `nsenter -t 1 -m --` to hop into PID 1's mount namespace. That fixed the symptom but required:
- … `nsenter` to the satellite image
- … `runHostZpool`) + unit tests pinning the nsenter prefix
The piraeus approach is the same outcome with three YAML lines and no Go change. The right diagnosis is "libudev can't see host udev DB", not "wrong mount namespace".
Test plan (verified on e2e3 stand)
- `/run/udev` mount + `-m none` → `zpool create /dev/sda` exits 0, pool ONLINE.
Summary by CodeRabbit
Bug Fixes
Configuration