lkl: per-thread irqs_enabled fixes IRQ-leak hangs under real drivers #638

Open
josephnef wants to merge 3 commits into lkl:master from
josephnef:irqs-enabled-per-thread

josephnef commented May 27, 2026

Summary

There are two related correctness bugs in arch/lkl/kernel/irq.c's
handling of IRQ-enable state. Both manifest as silent timer-IRQ
stalls (jiffies stops advancing, msleep-based kthreads freeze)
under workloads that the existing test suite doesn't exercise but
that real Linux drivers running in LKL hit immediately.

Bug 1: global flag, no per-thread save/restore. irqs_enabled is
a single static bool global; __switch_to doesn't save/restore it
across context switches. Any kernel path that does
spin_lock_irqsave and schedules before the matching _irqrestore
(the canonical pattern around wait_event_*) leaks the DISABLED
value to whatever thread runs next. lkl_trigger_irq sees the
leaked DISABLED and silently pends every IRQ including the timer.
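
The leak described in Bug 1 can be reproduced in a few lines of ordinary C. The block below is a user-space model only, not the code in arch/lkl/kernel/irq.c: every `leak_`-prefixed name is hypothetical, the "context switch" is just a pointer assignment, and spin_lock_irqsave is reduced to "save the flag, then clear it".

```c
#include <assert.h>
#include <stdbool.h>

#define ARCH_IRQ_DISABLED 0UL
#define ARCH_IRQ_ENABLED  1UL

/* Model of the pre-patch state: one flag shared by every thread. */
bool leak_irqs_enabled = true;
int leak_delivered, leak_pended;

struct leak_thread { const char *name; };
struct leak_thread *leak_current;

unsigned long leak_save_flags(void)
{
	return leak_irqs_enabled ? ARCH_IRQ_ENABLED : ARCH_IRQ_DISABLED;
}

void leak_irq_restore(unsigned long flags)
{
	leak_irqs_enabled = (flags == ARCH_IRQ_ENABLED);
}

/* Like the pre-patch __switch_to: the flag is neither saved nor restored. */
void leak_switch_to(struct leak_thread *next)
{
	leak_current = next;
}

/* An IRQ is delivered only if the (global) flag says IRQs are enabled. */
void leak_trigger_irq(void)
{
	if (leak_irqs_enabled)
		leak_delivered++;
	else
		leak_pended++;
}

/* Returns the flags the observer thread sees while the holder sleeps. */
unsigned long leak_scenario(void)
{
	static struct leak_thread holder = { "holder" };
	static struct leak_thread observer = { "observer" };
	unsigned long saved, seen;

	leak_switch_to(&holder);
	saved = leak_save_flags();            /* spin_lock_irqsave: save ... */
	assert(saved == ARCH_IRQ_ENABLED);
	leak_irq_restore(ARCH_IRQ_DISABLED);  /* ... then disable */

	leak_switch_to(&observer);            /* holder sleeps, lock still held */
	seen = leak_save_flags();             /* observer never disabled IRQs */
	leak_trigger_irq();                   /* e.g. the timer tick */

	leak_switch_to(&holder);
	leak_irq_restore(saved);              /* the matching irqrestore */
	return seen;
}
```

In this model the observer reads ARCH_IRQ_DISABLED although it never disabled anything, and the tick that arrives while it runs is pended rather than delivered.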

Bug 2: host-thread callers honoring stale thread_info.
lkl_trigger_irq is documented as callable "from arbitrary host
threads." Host pthreads (libusb completion callbacks, glibc
SIGEV_THREAD timer callbacks) acquire the LKL CPU via
lkl_cpu_get but never go through __switch_to, so
_current_thread_info still points at whichever kernel task last
ran (often the idle task, with IRQs DISABLED). Checking that stale
field for host callers silently pends every host-injected IRQ.
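
A small model may make the host-caller case easier to see. It is a sketch under stated assumptions, not the LKL source: thread ids are plain integers, `model_ops` is a hypothetical two-entry stand-in for the host operations table, and `honor_stale_flag` is an illustrative switch between the behaviour described above and the ownership comparison that commit 3 of this series describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ARCH_IRQ_DISABLED 0UL
#define ARCH_IRQ_ENABLED  1UL

typedef unsigned long model_tid_t;      /* stand-in for a host thread id */

struct model_thread_info {
	model_tid_t tid;                /* host thread backing this kernel task */
	unsigned long irqs_enabled;
};

/* Hypothetical miniature of the host ops table: only the two calls used. */
struct model_host_ops {
	model_tid_t (*thread_self)(void);
	bool (*thread_equal)(model_tid_t a, model_tid_t b);
};

model_tid_t model_running_tid;          /* which host thread is "executing" */
static model_tid_t model_thread_self(void) { return model_running_tid; }
static bool model_thread_equal(model_tid_t a, model_tid_t b) { return a == b; }
struct model_host_ops model_ops = { model_thread_self, model_thread_equal };

/* Only a context switch updates this; a host thread calling in does not. */
struct model_thread_info *model_current_ti;

/* True when the caller is the thread that owns the current thread_info. */
bool model_caller_owns_current_ti(void)
{
	return model_ops.thread_equal(model_current_ti->tid,
				      model_ops.thread_self());
}

/* Returns true if the IRQ is delivered now, false if it is only pended. */
bool model_trigger_irq(bool honor_stale_flag)
{
	assert(model_current_ti != NULL);
	if (!honor_stale_flag && !model_caller_owns_current_ti())
		return true;            /* host caller: stale flag is ignored */
	return model_current_ti->irqs_enabled == ARCH_IRQ_ENABLED;
}

/* A host thread (tid 42) calls in after the idle task (tid 1) ran last. */
bool model_host_call(bool honor_stale_flag)
{
	static struct model_thread_info idle = { 1UL, ARCH_IRQ_DISABLED };

	model_current_ti = &idle;
	model_running_tid = 42UL;
	return model_trigger_irq(honor_stale_flag);
}

/* The idle task itself calls in: its own DISABLED flag still applies. */
bool model_kernel_call(void)
{
	static struct model_thread_info idle = { 1UL, ARCH_IRQ_DISABLED };

	model_current_ti = &idle;
	model_running_tid = 1UL;
	return model_trigger_irq(false);
}
```

With `honor_stale_flag` set, the host call is pended because of a flag that belongs to the idle task; with it cleared, the host call is delivered while a call from the owning kernel thread is still pended.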

Context where this was hit

The bugs were hit in a proof-of-concept that runs the mainline
rtw88_8812au USB Wi-Fi driver entirely in userspace via LKL. The host
program links liblkl.a, registers a virtual USB host controller into
the in-process kernel (a ~470-line HCD shim, in the style of
drivers/usb/host/, that translates struct urb to a flat view and
back), and forwards URBs to libusb running in the host process. The
kernel side sees a normal USB Wi-Fi device: rtw88_8812au probes
verbatim, downloads firmware, and brings wlan0 up; the host then calls
lkl_if_up(wlan0) and opens an AF_PACKET socket to capture 802.11
frames with radiotap headers. No kernel module on the host and no
CAP_NET_ADMIN are needed; the kernel driver source is reused as-is.

The setup is heavily threaded: libusb runs its own event thread that
posts URB completions back into the LKL kernel (Bug 2's path),
kernel-side polling kthreads drain those completions, and the rtw88
driver uses the standard spin_lock_irqsave + wait_event_* / msleep
patterns throughout firmware download and chip init (Bug 1's path).
Both bugs reproduce reliably: on master the probe hangs within ~50 USB
control transfers; with this series applied, the same probe runs
~8000 control transfers and wlan0 comes up cleanly.

What the series does

Three commits, in order:

  1. lkl: add KUnit test for per-thread irqs_enabled isolation
    New arch/lkl/kernel/irq_test.c KUnit suite, gated on a new
    CONFIG_LKL_IRQ_KUNIT_TEST, wired into the existing kunit=yes
    build config in tools/lkl/Makefile.autoconf and into
    tools/lkl/tests/boot.c as kunit_irq (mirrors the existing
    kunit_pci / kunit_mmu). The test takes spin_lock_irqsave in
    one kthread, yields via schedule_timeout while the lock is held,
    and from a sibling kthread observes arch_local_save_flags.
    Fails on master; passes after commit 2.

  2. lkl: make irqs_enabled per-thread via current_thread_info()
    Move irqs_enabled from a static bool global in irq.c into a
    field on struct thread_info; access it via current_thread_info()
    from arch_local_save_flags / arch_local_irq_restore /
    lkl_trigger_irq. No explicit save/restore code is added:
    the existing _current_thread_info = task_thread_info(next) line
    in __switch_to is the entire mechanism. IRQ-enable state moves
    with its thread, the same way a real CPU's register file does.

  3. lkl: deliver IRQs from host-thread callers regardless of
    irqs_enabled
    In lkl_trigger_irq, treat the caller as a host thread when
    lkl_ops->thread_equal(ti->tid, lkl_ops->thread_self()) reports
    that the two differ, and deliver unconditionally; kernel-thread
    callers continue to honor their own per-thread irqs_enabled.

The split is for review tractability: commit 2 is a mechanical change
that turns a global into a per-thread field (which is what fixes
Bug 1); commit 3 is the semantic change for host callers (commented
inline as the root cause of the host-thread mode).
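
To illustrate why commit 2 needs no explicit save/restore, here is a user-space sketch of the per-thread layout. It is an assumption-laden model rather than the patch itself: the `pt_` names are hypothetical, `pt_switch_to` stands in for the single pointer assignment in __switch_to, and spin_lock_irqsave is again reduced to "save the flag, then clear it".

```c
#include <assert.h>

#define ARCH_IRQ_DISABLED 0UL
#define ARCH_IRQ_ENABLED  1UL

/* Model of the post-patch layout: the flag lives in each thread's info. */
struct pt_thread_info {
	unsigned long irqs_enabled;
};

/* Stand-in for _current_thread_info. */
struct pt_thread_info *pt_current_ti;

unsigned long pt_save_flags(void)
{
	return pt_current_ti->irqs_enabled;
}

void pt_irq_restore(unsigned long flags)
{
	pt_current_ti->irqs_enabled = flags;
}

/* Switching the pointer is the whole mechanism: the flag follows it. */
void pt_switch_to(struct pt_thread_info *next)
{
	pt_current_ti = next;
}

/* Returns the flags the observer sees while the holder sleeps. */
unsigned long pt_isolation_scenario(void)
{
	static struct pt_thread_info holder, observer;
	unsigned long saved, seen;

	/* New threads start with IRQs enabled, as the series does in init_ti. */
	holder.irqs_enabled = ARCH_IRQ_ENABLED;
	observer.irqs_enabled = ARCH_IRQ_ENABLED;

	pt_switch_to(&holder);
	saved = pt_save_flags();              /* spin_lock_irqsave: save ... */
	pt_irq_restore(ARCH_IRQ_DISABLED);    /* ... then disable */

	pt_switch_to(&observer);              /* holder sleeps, lock still held */
	seen = pt_save_flags();               /* observer reads its own field */

	pt_switch_to(&holder);
	assert(pt_save_flags() == ARCH_IRQ_DISABLED); /* holder's state kept */
	pt_irq_restore(saved);                /* the matching irqrestore */
	return seen;
}
```

In the model the observer reads ARCH_IRQ_ENABLED, and the holder still finds its own DISABLED state when it is scheduled back in.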

Test plan

  • CI kunit lane (tools/lkl/Makefile.autoconf defines
    kunit_test_enable — this series extends it to also set
    LKL_IRQ_KUNIT_TEST=y). On the patched branch, the
    lkl_test_kunit_irq entry in boot.c's tests[] should pass via
    the standard ok N lkl_irq line in the boot log.
  • CI linux / windows-2022 / clang-build / mmu_kasan lanes
    — no behavioural changes from this series with kunit=no. The
    current_thread_info()->irqs_enabled field is initialized in
    INIT_THREAD_INFO and init_ti; pre-existing tests should be
    unaffected.
  • Negative control: revert commit 2 only and rerun the kunit
    lane — the kunit_irq test should report not ok (proves the
    test exercises the bug, not just a tautology against the fix).
  • End-to-end driver bring-up under the libusb-backed HCD shim
    described under "Context where this was hit": rtw88-family
    driver (aircrack-ng's 88XXau covers all three chips) validated
    locally on RTL8812AU (0bda:8812), RTL8821AU (2357:0120), and
    RTL8814AU (0bda:8813); wlan0 appears in 30-60 s and AF_PACKET
    capture returns frames. All three hang on the unpatched kernel.
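
The pass/fail check on the boot log comes down to finding an "ok N lkl_irq" result line that is not a "not ok N lkl_irq" line. The helper below is a hypothetical stand-alone sketch of that matching, not the code in tools/lkl/tests/boot.c; the function name and the exact matching rules are assumptions.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if `log` contains a result of the form "ok <digits> <suite>"
 * that is not part of a "not ok <digits> <suite>" failure line.
 */
bool kunit_suite_passed(const char *log, const char *suite)
{
	size_t suite_len;
	const char *p = log;

	assert(log != NULL && suite != NULL);
	suite_len = strlen(suite);

	while ((p = strstr(p, "ok ")) != NULL) {
		const char *q = p + 3;
		/* "ok" must start a word, e.g. after a log timestamp. */
		bool at_word_start = (p == log) || isspace((unsigned char)p[-1]);
		/* "not ok ..." reports a failure, so it must not count. */
		bool negated = (p - log >= 4) && strncmp(p - 4, "not ", 4) == 0;

		p += 3;                 /* resume the search after this match */
		if (!at_word_start || negated || !isdigit((unsigned char)*q))
			continue;
		while (isdigit((unsigned char)*q))
			q++;            /* skip the test number */
		if (*q != ' ' || strncmp(q + 1, suite, suite_len) != 0)
			continue;
		q += 1 + suite_len;
		/* Require the suite name to end here, not merely be a prefix. */
		if (*q == '\0' || isspace((unsigned char)*q))
			return true;
	}
	return false;
}
```

Rejecting the "not ok" form explicitly matters for the negative control above, because "ok 1 lkl_irq" is a substring of "not ok 1 lkl_irq".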

Why no automated test for commit 3

The KUnit suite in commit 1 reproduces the kernel-thread mode of the
bug. The host-thread mode (fixed by commit 3) requires a real host
pthread to acquire the LKL CPU and invoke lkl_trigger_irq. That is
easily exposed by a libusb-style backend that posts URB completions
from its event thread, but such a backend is not appropriate to wire
into an in-tree test. If maintainers want a KUnit-level test for
commit 3, one could be added that uses lkl_host_ops->thread_create to
spawn a host pthread, has it call lkl_cpu_get and then
lkl_trigger_irq, and checks that an IRQ handler runs synchronously.
Happy to add this if requested.

Notes

The patch series is small (~70 lines net of code, plus the KUnit
test and its wiring). Commit messages on each patch are self-contained
and explain the rationale. Four save/restore variants were tried
before settling on the current_thread_info() shape used here — v1
(save outgoing + load incoming), v2 (load incoming only), v3 (force-
enable on switch), v4 (this one, no explicit save/restore — the
thread_info field handles it). Variants v1-v3 either regressed the
existing test suite or only partially fixed the bug. v4 is the
shape that is both minimal and complete.

Joseph added 3 commits May 27, 2026 15:34

lkl: add KUnit test for per-thread irqs_enabled isolation

arch/lkl/kernel/irq.c keeps `irqs_enabled` as a single `static bool`
global, with no save/restore in __switch_to. Any kernel path that does
spin_lock_irqsave and schedules before the matching restore leaks the
DISABLED value to whatever thread runs next. The concrete consequence
is that lkl_trigger_irq, observing the leaked DISABLED, silently pends
every IRQ — including the timer tick — and jiffies stops advancing.

Real drivers (e.g. rtw88_8812au's chip-init code path) trip this
within ~50 USB control transfers and hang the kernel forever; the
problem isn't theoretical.

Add a small KUnit suite, gated on a new CONFIG_LKL_IRQ_KUNIT_TEST,
that reproduces the kernel-side leg of the bug without needing a
real driver: take a spin_lock_irqsave in one kernel thread, yield via
schedule_timeout while holding the lock, and from a sibling kernel
thread that runs during the yield, observe its own arch_local_save_flags.
With the current global, the observer reads ARCH_IRQ_DISABLED — the
lock holder's leaked state. With a per-thread irqs_enabled (the
subsequent patch in this series), the observer reads its own
ARCH_IRQ_ENABLED.

This commit only adds the test. It fails on master; it passes after
the per-thread irqs_enabled patch that follows.

Wire-up:
 - arch/lkl/kernel/irq_test.c — the KUnit suite (.name = "lkl_irq")
 - arch/lkl/kernel/Makefile — build it when CONFIG_LKL_IRQ_KUNIT_TEST=y
 - arch/lkl/Kconfig — new boolean depends on KUNIT
 - tools/lkl/Makefile.autoconf — kunit=yes enables it alongside the
   existing LKL_PCI_KUNIT_TEST
 - tools/lkl/tests/boot.c — lkl_test_kunit_irq parses the boot log
   for "ok N lkl_irq", mirroring lkl_test_kunit_pci

Signed-off-by: Joseph <joseph@josephnef.dev>

lkl: make irqs_enabled per-thread via current_thread_info()

irqs_enabled lived in arch/lkl/kernel/irq.c as a single `static bool`
global, with no save/restore in __switch_to. Any kernel path that did
spin_lock_irqsave and scheduled before the matching restore (the
canonical pattern around wait_event_*) leaked the DISABLED value to
whichever thread ran next. lkl_trigger_irq saw the leaked DISABLED and
silently pended every IRQ — including the timer tick. jiffies stopped
advancing; every msleep-based kthread hung.

The irqs_enabled KUnit test added in the previous commit
reproduces the failure on this commit's parent: an observer kthread,
scheduled in while a sibling holds spin_lock_irqsave, reads the
spilled DISABLED state.

This commit moves irqs_enabled into struct thread_info and accesses
it via current_thread_info(). The fix doesn't add any explicit
per-thread save/restore code: the existing
  _current_thread_info = task_thread_info(next);
in __switch_to (arch/lkl/kernel/threads.c) is the entire mechanism.
Each thread's irqs_enabled travels with its thread_info, the same way
a real CPU's register file follows the thread.

 - arch/lkl/include/asm/thread_info.h: add `unsigned long irqs_enabled`
   field; INIT_THREAD_INFO sets it to 1 (ARCH_IRQ_ENABLED) so the init
   task starts with IRQs enabled.
 - arch/lkl/kernel/irq.c: drop the `static bool irqs_enabled` global.
   arch_local_save_flags, arch_local_irq_restore, and the
   lkl_trigger_irq pending-check all go through current_thread_info().
 - arch/lkl/kernel/threads.c: init_ti sets ti->irqs_enabled =
   ARCH_IRQ_ENABLED for freshly-allocated kernel threads.

With this commit applied on top of the previous one, the
LKL_IRQ_KUNIT_TEST suite passes (ok 1 lkl_irq).

Tested against:
 - The lkl-wifi-poc project (https://github.com/josephnef/lkl-wifi-poc),
   where the rtw88_8812au USB Wi-Fi driver running under LKL hung
   within ~50 control transfers on the global-flag kernel; under this
   patch it completes ~8000 transfers and brings wlan0 up cleanly.
 - The new KUnit suite (CONFIG_LKL_IRQ_KUNIT_TEST=y), which is what
   the lkl_irq test in boot.c covers.

Signed-off-by: Joseph <joseph@josephnef.dev>

lkl: deliver IRQs from host-thread callers regardless of irqs_enabled

lkl_trigger_irq is documented as callable "from arbitrary host threads"
(the comment block above the function). True host pthreads — libusb
completion threads, glibc SIGEV_THREAD timer callbacks, anything
created by a backend library that ends up posting an IRQ into the
LKL kernel — acquire the LKL CPU via lkl_cpu_get but never go through
__switch_to. _current_thread_info therefore still points at whichever
kernel task last ran (often the idle task), and that task's
irqs_enabled field may be ARCH_IRQ_DISABLED.

Honoring that stale flag for host-thread callers caused a silent hang:
the IRQ was marked pending, but nothing on the kernel side would
notice until the matching irqrestore in the original kernel context,
which often never came because the kernel had already moved on.
Drivers that post IRQs from libusb's event thread (rtw88's USB
completion path is the original report; any virtio backend with a
thread-based notification scheme has the same exposure) hung the
kernel within tens of operations.

Detect host-thread callers by comparing thread_self() to the
thread_info owner's tid via lkl_ops->thread_equal. When they differ,
the caller is not the kernel thread that owns this thread_info; the
stale irqs_enabled field has no claim on us, and we should deliver
the IRQ. Kernel-thread callers — including the recursive
"lkl_trigger_irq -> IRQ -> softirq -> lkl_trigger_irq" path called
out in the original comment — continue to honor their own per-thread
irqs_enabled (set by the previous patch).

Tested against the lkl-wifi-poc project: rtw88_8812au's libusb-fed
URB completion path ran ~50 transfers before this fix and ~8000
after, with no observed regression in the existing test suite.

Signed-off-by: Joseph <joseph@josephnef.dev>