Skip to content

fix(safety): address production safety-audit findings#67

Merged
jaredLunde merged 6 commits into
mainfrom
fix/safety-audit-findings
Jun 2, 2026
Merged

fix(safety): address production safety-audit findings#67
jaredLunde merged 6 commits into
mainfrom
fix/safety-audit-findings

Conversation

@jaredLunde
Copy link
Copy Markdown
Contributor

  • ublk-core/io.rs build_user_data: widen operands to u64 before shifting. tgt_data << 24 overflowed u32 (truncating the high byte) whenever tgt_data had a bit in positions 8..15 — silently corrupting the packed io_uring userdata. Drop the now-needless #[allow(arithmetic_overflow)] on it and on ublk_user_copy_pos (whose shifts were already u64).
  • ublk-core/helpers.rs IoBuf::new: assert non-zero size before allocating (zero-size alloc is UB) and abort via handle_alloc_error on null instead of storing a null ptr that later derefs crash. Bound the Send/Sync impls on T: Send/T: Sync instead of blanket-impl, and document the invariant.
  • block/handler.rs: read frozen/degraded/readonly with Acquire (were Relaxed) to pair with the AcqRel/Release writes, so flag transitions are visible promptly on weakly-ordered (aarch64) targets.
  • block/block_map.rs SequenceNumber: use AcqRel/Acquire instead of Relaxed so a value advanced via advance_to at handoff is visible to the next next(), preserving WAL replay ordering on aarch64.
  • ublk-core/ctrl.rs: replace exists()-then-remove_file() TOCTOU with an unconditional remove_file_if_present helper that tolerates NotFound; log non-ENOENT errors from stop_dev() instead of swallowing.
  • ext4/writer.rs: surface non-UTF-8 symlink targets as an error rather than corrupting them via from_utf8_lossy; document the three lookup(must_exist=true) invariant unwraps with .expect().

jaredLunde and others added 6 commits June 2, 2026 13:57
Fixes from the safety audit, in roughly descending severity:

- ublk-core/io.rs `build_user_data`: widen operands to u64 *before*
  shifting. `tgt_data << 24` overflowed u32 (truncating the high byte)
  whenever tgt_data had a bit in positions 8..15 — silently corrupting
  the packed io_uring userdata. Drop the now-needless
  `#[allow(arithmetic_overflow)]` on it and on `ublk_user_copy_pos`
  (whose shifts were already u64).
- ublk-core/helpers.rs `IoBuf::new`: assert non-zero size *before*
  allocating (zero-size `alloc` is UB) and abort via
  `handle_alloc_error` on null instead of storing a null ptr that later
  derefs crash. Bound the `Send`/`Sync` impls on `T: Send`/`T: Sync`
  instead of blanket-impl, and document the invariant.
- block/handler.rs: read `frozen`/`degraded`/`readonly` with `Acquire`
  (were `Relaxed`) to pair with the `AcqRel`/`Release` writes, so flag
  transitions are visible promptly on weakly-ordered (aarch64) targets.
- block/block_map.rs `SequenceNumber`: use `AcqRel`/`Acquire` instead of
  `Relaxed` so a value advanced via `advance_to` at handoff is visible to
  the next `next()`, preserving WAL replay ordering on aarch64.
- ublk-core/ctrl.rs: replace `exists()`-then-`remove_file()` TOCTOU with
  an unconditional `remove_file_if_present` helper that tolerates
  NotFound; log non-ENOENT errors from `stop_dev()` instead of swallowing.
- ext4/writer.rs: surface non-UTF-8 symlink targets as an error rather
  than corrupting them via `from_utf8_lossy`; document the three
  `lookup(must_exist=true)` invariant unwraps with `.expect()`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- buffer_pool: drop unsound `unsafe impl Send` for WorkerBufferPool. It
  contains RefCell + raw ptr and is only ever held via Rc (already !Send)
  on a single-threaded executor, so the impl was both wrong and unneeded.
- zc/io: add compile-time size guards + SAFETY comments for the
  ublksrv_io_cmd→[u8;16] and squeue::Entry→RawSqe reinterprets. The
  RawSqe guard caught that it is intentionally a *prefix* of the 64-byte
  SQE, so the invariant is `<=`, not equality.
- handoff fdpass/predecessor/successor: document SAFETY invariants on the
  SCM_RIGHTS cmsg pointer arithmetic and raw socket fd ownership transfers.
- write_cache/flush: replace blocking std::fs remove_file calls on the
  async flush path with tokio::fs to avoid stalling the executor on ext4
  journal commits (lock-held sync renames left as-is).
- handoff_cmd: propagate stream.flush() error instead of swallowing it.
- server: log panics from server tasks during shutdown instead of
  discarding the JoinError.
- ctrl: close the symlink-swap window on the device metadata dir by
  fchmod-ing an O_NOFOLLOW fd instead of a path-based set_permissions.
- ctrl: make the queue-thread fan-in channel an explicitly bounded
  sync_channel(nr_queues).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The previous commit converted this remove_file to tokio::fs, but it sits in
the same lexical scope as a just-dropped data_file write guard, tripping
clippy::await_holding_lock (-D suspicious). This is the cold takeover-recovery
path (runs once after a flush failure), not the hot/periodic flush path, so a
synchronous unlink is fine and avoids the await-in-guarded-scope lint. The
hot-path conversions to tokio::fs from the prior commit are retained.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The patched ublk_drv.ko is rejected with EKEYREJECTED ("Key was rejected by
service") on the 6.17.0-1015-azure runner kernel, so all ublk tests silently
skip. Add a non-fatal diagnostic step to the kernel-devices job to determine
whether enforcement comes from CONFIG_MODULE_SIG_FORCE (unfixable in
userspace) or Secure Boot lockdown / sig_enforce — dumping cmdline, secure
boot state, lockdown level, MODULE_SIG/LOCKDOWN config, a verbose modprobe,
and the relevant dmesg lines. To be reverted once the mechanism is confirmed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Root cause of the silent ublk-test skips: the 6.17.0-1015-azure runner has
module-signature enforcement OFF (sig_enforce=N, no CONFIG_MODULE_SIG_FORCE,
lockdown=none, SecureBoot disabled) — yet modprobe rejected the patched module
with EKEYREJECTED. The kernel's module_sig_check() tolerates a *missing*
signature when enforcement is off, but a signature that is *present but
invalid* hits the unconditional `default: return err` path. Binary-patching the
module body left the stale appended signature attached, making it present-but-
invalid → always rejected.

Fix: strip the appended signature (sig + struct module_signature + magic) after
patching, in all three ublk jobs (blktests, ublk-transports, kernel-devices).
With enforcement off the kernel then loads the unsigned patched module
(tainted, fine for CI). No re-signing, no reboot, full 6.17 coverage retained.
Strip arithmetic validated against real signed .ko modules.

Also harden the kernel-devices "Load kernel modules" step: make modprobe
ublk_drv fatal and assert /dev/ublk-control exists, so a future load failure
fails loudly instead of silently skipping every ublk test. Remove the
temporary signing diagnostic step.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- IoBuf<T>: Sync soundness — Cell<bool> raced via &self mlock/munlock;
  switch mlocked to AtomicBool so the unsafe Sync impl is actually sound
- UblkCtrl: internal ctrl-ring macros return Result<_, CtrlRingUninit>
  instead of panicking; command paths propagate, public accessors keep
  their documented panic contract via expect()
- buffer-registration mutex: recover from poison rather than unwrap-panic
- circuit_breaker: drop genuinely-dead threshold(); justify retained
  dead_code (Windowed policy, reset, force_state); same for fault::inject
  and oci::ingest_tar

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jaredLunde jaredLunde merged commit 4f96181 into main Jun 2, 2026
24 checks passed
@jaredLunde jaredLunde deleted the fix/safety-audit-findings branch June 2, 2026 23:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant