fix(safety): address production safety-audit findings#67
Merged
Conversation
Fixes from the safety audit, in roughly descending severity: - ublk-core/io.rs `build_user_data`: widen operands to u64 *before* shifting. `tgt_data << 24` overflowed u32 (truncating the high byte) whenever tgt_data had a bit in positions 8..15 — silently corrupting the packed io_uring userdata. Drop the now-needless `#[allow(arithmetic_overflow)]` on it and on `ublk_user_copy_pos` (whose shifts were already u64). - ublk-core/helpers.rs `IoBuf::new`: assert non-zero size *before* allocating (zero-size `alloc` is UB) and abort via `handle_alloc_error` on null instead of storing a null ptr that later derefs crash. Bound the `Send`/`Sync` impls on `T: Send`/`T: Sync` instead of blanket-impl, and document the invariant. - block/handler.rs: read `frozen`/`degraded`/`readonly` with `Acquire` (were `Relaxed`) to pair with the `AcqRel`/`Release` writes, so flag transitions are visible promptly on weakly-ordered (aarch64) targets. - block/block_map.rs `SequenceNumber`: use `AcqRel`/`Acquire` instead of `Relaxed` so a value advanced via `advance_to` at handoff is visible to the next `next()`, preserving WAL replay ordering on aarch64. - ublk-core/ctrl.rs: replace `exists()`-then-`remove_file()` TOCTOU with an unconditional `remove_file_if_present` helper that tolerates NotFound; log non-ENOENT errors from `stop_dev()` instead of swallowing. - ext4/writer.rs: surface non-UTF-8 symlink targets as an error rather than corrupting them via `from_utf8_lossy`; document the three `lookup(must_exist=true)` invariant unwraps with `.expect()`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- buffer_pool: drop unsound `unsafe impl Send` for WorkerBufferPool. It contains RefCell + raw ptr and is only ever held via Rc (already !Send) on a single-threaded executor, so the impl was both wrong and unneeded. - zc/io: add compile-time size guards + SAFETY comments for the ublksrv_io_cmd→[u8;16] and squeue::Entry→RawSqe reinterprets. The RawSqe guard caught that it is intentionally a *prefix* of the 64-byte SQE, so the invariant is `<=`, not equality. - handoff fdpass/predecessor/successor: document SAFETY invariants on the SCM_RIGHTS cmsg pointer arithmetic and raw socket fd ownership transfers. - write_cache/flush: replace blocking std::fs remove_file calls on the async flush path with tokio::fs to avoid stalling the executor on ext4 journal commits (lock-held sync renames left as-is). - handoff_cmd: propagate stream.flush() error instead of swallowing it. - server: log panics from server tasks during shutdown instead of discarding the JoinError. - ctrl: close the symlink-swap window on the device metadata dir by fchmod-ing an O_NOFOLLOW fd instead of a path-based set_permissions. - ctrl: make the queue-thread fan-in channel an explicitly bounded sync_channel(nr_queues). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The previous commit converted this remove_file to tokio::fs, but it sits in the same lexical scope as a just-dropped data_file write guard, tripping clippy::await_holding_lock (-D suspicious). This is the cold takeover-recovery path (runs once after a flush failure), not the hot/periodic flush path, so a synchronous unlink is fine and avoids the await-in-guarded-scope lint. The hot-path conversions to tokio::fs from the prior commit are retained. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The patched ublk_drv.ko is rejected with EKEYREJECTED ("Key was rejected by
service") on the 6.17.0-1015-azure runner kernel, so all ublk tests silently
skip. Add a non-fatal diagnostic step to the kernel-devices job to determine
whether enforcement comes from CONFIG_MODULE_SIG_FORCE (unfixable in
userspace) or Secure Boot lockdown / sig_enforce — dumping cmdline, secure
boot state, lockdown level, MODULE_SIG/LOCKDOWN config, a verbose modprobe,
and the relevant dmesg lines. To be reverted once the mechanism is confirmed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Root cause of the silent ublk-test skips: the 6.17.0-1015-azure runner has module-signature enforcement OFF (sig_enforce=N, no CONFIG_MODULE_SIG_FORCE, lockdown=none, SecureBoot disabled) — yet modprobe rejected the patched module with EKEYREJECTED. The kernel's module_sig_check() tolerates a *missing* signature when enforcement is off, but a signature that is *present but invalid* hits the unconditional `default: return err` path. Binary-patching the module body left the stale appended signature attached, making it present-but- invalid → always rejected. Fix: strip the appended signature (sig + struct module_signature + magic) after patching, in all three ublk jobs (blktests, ublk-transports, kernel-devices). With enforcement off the kernel then loads the unsigned patched module (tainted, fine for CI). No re-signing, no reboot, full 6.17 coverage retained. Strip arithmetic validated against real signed .ko modules. Also harden the kernel-devices "Load kernel modules" step: make modprobe ublk_drv fatal and assert /dev/ublk-control exists, so a future load failure fails loudly instead of silently skipping every ublk test. Remove the temporary signing diagnostic step. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- IoBuf<T>: Sync soundness — Cell<bool> raced via &self mlock/munlock; switch mlocked to AtomicBool so the unsafe Sync impl is actually sound - UblkCtrl: internal ctrl-ring macros return Result<_, CtrlRingUninit> instead of panicking; command paths propagate, public accessors keep their documented panic contract via expect() - buffer-registration mutex: recover from poison rather than unwrap-panic - circuit_breaker: drop genuinely-dead threshold(); justify retained dead_code (Windowed policy, reset, force_state); same for fault::inject and oci::ingest_tar Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
build_user_data: widen operands to u64 before shifting.tgt_data << 24overflowed u32 (truncating the high byte) whenever tgt_data had a bit in positions 8..15 — silently corrupting the packed io_uring userdata. Drop the now-needless#[allow(arithmetic_overflow)]on it and onublk_user_copy_pos(whose shifts were already u64).IoBuf::new: assert non-zero size before allocating (zero-sizeallocis UB) and abort viahandle_alloc_erroron null instead of storing a null ptr that later derefs crash. Bound theSend/Syncimpls onT: Send/T: Syncinstead of blanket-impl, and document the invariant.frozen/degraded/readonlywithAcquire(wereRelaxed) to pair with theAcqRel/Releasewrites, so flag transitions are visible promptly on weakly-ordered (aarch64) targets.SequenceNumber: useAcqRel/Acquireinstead ofRelaxedso a value advanced viaadvance_toat handoff is visible to the nextnext(), preserving WAL replay ordering on aarch64.exists()-then-remove_file()TOCTOU with an unconditionalremove_file_if_presenthelper that tolerates NotFound; log non-ENOENT errors fromstop_dev()instead of swallowing.from_utf8_lossy; document the threelookup(must_exist=true)invariant unwraps with.expect().