rutabaga_gfx/cross_domain: fix audio glitches by dropping write lock by aford173 · Pull Request #671 · containers/libkrun

aford173 · 2026-05-07T19:09:20Z

The previous write() held item_state.lock() across the write_volatile call that performs the actual syscall, serializing every cross-domain CMD_WRITE behind any other operation on the items table — including add_item from process_receive, which fires for every new fd received via SCM_RIGHTS as guests open additional channels.

For per-period eventfd wakeups (e.g. PipeWire audio streams signaling host playback every audio period), this means the write completes only after any in-flight item_state operation finishes. Under stream-create churn — many guest applications opening streams concurrently, each delivering new fds via SCM_RIGHTS that hit add_item under the same lock — the wait can exceed the audio period budget and produce missed-deadline glitches at the host's audio output.

This change shrinks the critical section to a brief fd dup() under the lock, performs the syscall lock-free, and only re-acquires the lock if hang_up indicates the item should be removed. In the common case (hang_up == 0, e.g. repeated eventfd wakeups for an active stream) the table is no longer touched per write, eliminating both the contention and the previous "remove + conditional re-insert" churn.

Behavioral changes vs the old code:

Common case (hang_up == 0): item stays in the table; we hand out a dup'd fd for the write. Net behavior identical, lock hold time bounded by dup().
hang_up == 1: item removed in a separate phase after the write, instead of "removed unconditionally then re-inserted on hang_up == 0". Same observed end state.
Concurrent writes to the same id (no longer serialized): the kernel guarantees atomicity for eventfd writes (8 bytes) and pipe writes <= PIPE_BUF, which are the only two CrossDomainItem variants this branch handles. Each caller dup's its own fd and writes through it independently.

Verified with a synthetic reproducer: a sustained 1 kHz sine playing through a guest PipeWire stream while ~200 short-lived guest streams open and close concurrently (each issuing SCM_RIGHTS for new eventfds, contending with the sustained stream's per-period writes for item_state). Capturing the host sink monitor and counting sample-to-sample deltas exceeding a clean-sine threshold, the stress workload produced ~10 distinct glitch bursts in an 8-second capture before this change, and zero across five consecutive runs after.

The previous write() held item_state.lock() across the write_volatile call that performs the actual syscall, serializing every cross-domain CMD_WRITE behind any other operation on the items table — including add_item from process_receive, which fires for every new fd received via SCM_RIGHTS as guests open additional channels. For per-period eventfd wakeups (e.g. PipeWire audio streams signaling host playback every audio period), this means the write completes only after any in-flight item_state operation finishes. Under stream-create churn — many guest applications opening streams concurrently, each delivering new fds via SCM_RIGHTS that hit add_item under the same lock — the wait can exceed the audio period budget and produce missed-deadline glitches at the host's audio output. This change shrinks the critical section to a brief fd dup() under the lock, performs the syscall lock-free, and only re-acquires the lock if hang_up indicates the item should be removed. In the common case (hang_up == 0, e.g. repeated eventfd wakeups for an active stream) the table is no longer touched per write, eliminating both the contention and the previous "remove + conditional re-insert" churn. Behavioral changes vs the old code: - Common case (hang_up == 0): item stays in the table; we hand out a dup'd fd for the write. Net behavior identical, lock hold time bounded by dup(). - hang_up == 1: item removed in a separate phase after the write, instead of "removed unconditionally then re-inserted on hang_up == 0". Same observed end state. - Concurrent writes to the same id (no longer serialized): the kernel guarantees atomicity for eventfd writes (8 bytes) and pipe writes <= PIPE_BUF, which are the only two CrossDomainItem variants this branch handles. Each caller dup's its own fd and writes through it independently. Verified with a synthetic reproducer: a sustained 1 kHz sine playing through a guest PipeWire stream while ~200 short-lived guest streams open and close concurrently (each issuing SCM_RIGHTS for new eventfds, contending with the sustained stream's per-period writes for item_state). Capturing the host sink monitor and counting sample-to-sample deltas exceeding a clean-sine threshold, the stress workload produced ~10 distinct glitch bursts in an 8-second capture before this change, and zero across five consecutive runs after. Signed-off-by: Adam Ford <adam.ford@anodize.com>

valpackett · 2026-05-08T02:35:45Z

Woah, that's quite the torture test scenario, did you ever actually experience stream-create churn in realistic regular desktop usage? 👀

I agree that current code is locking too much but I'm wary of the dup(), turning write() into dup()+write()+close() is quite some overhead. If only this were all async code, the forwarding of each fd would've just been an independent task… sigh.

Maybe we could use a refcount instead? Would only add a pointer chase for derefing the Rc stored in the table, better than two context-switches churning kernel state.

BTW, keep in mind that for libkrun 2.0 the plan is to switch to upstream rutabaga (#560) so changes should be proposed there as well (or ideally, first)

aford173 · 2026-05-08T14:31:10Z

@valpackett - I never saw it in real life, but I was playing with Claude. My end-game is run multiple instances of muvm on the same machine and have them share an audio pipeline, so I asked it to help craft a worst-case scenario.

aford173 requested review from MatiasVara, dorindabassey, jakecorrenti, mtjhrc, slp and tylerfanelli as code owners May 7, 2026 19:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rutabaga_gfx/cross_domain: fix audio glitches by dropping write lock#671

rutabaga_gfx/cross_domain: fix audio glitches by dropping write lock#671
aford173 wants to merge 1 commit intocontainers:mainfrom
aford173:fix-audio-glitches

aford173 commented May 7, 2026

Uh oh!

valpackett commented May 8, 2026

Uh oh!

aford173 commented May 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

aford173 commented May 7, 2026

Uh oh!

valpackett commented May 8, 2026

Uh oh!

aford173 commented May 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants