rutabaga_gfx/cross_domain: fix audio glitches by dropping write lock#671
rutabaga_gfx/cross_domain: fix audio glitches by dropping write lock#671aford173 wants to merge 1 commit intocontainers:mainfrom
Conversation
The previous write() held item_state.lock() across the write_volatile call that performs the actual syscall, serializing every cross-domain CMD_WRITE behind any other operation on the items table — including add_item from process_receive, which fires for every new fd received via SCM_RIGHTS as guests open additional channels. For per-period eventfd wakeups (e.g. PipeWire audio streams signaling host playback every audio period), this means the write completes only after any in-flight item_state operation finishes. Under stream-create churn — many guest applications opening streams concurrently, each delivering new fds via SCM_RIGHTS that hit add_item under the same lock — the wait can exceed the audio period budget and produce missed-deadline glitches at the host's audio output. This change shrinks the critical section to a brief fd dup() under the lock, performs the syscall lock-free, and only re-acquires the lock if hang_up indicates the item should be removed. In the common case (hang_up == 0, e.g. repeated eventfd wakeups for an active stream) the table is no longer touched per write, eliminating both the contention and the previous "remove + conditional re-insert" churn. Behavioral changes vs the old code: - Common case (hang_up == 0): item stays in the table; we hand out a dup'd fd for the write. Net behavior identical, lock hold time bounded by dup(). - hang_up == 1: item removed in a separate phase after the write, instead of "removed unconditionally then re-inserted on hang_up == 0". Same observed end state. - Concurrent writes to the same id (no longer serialized): the kernel guarantees atomicity for eventfd writes (8 bytes) and pipe writes <= PIPE_BUF, which are the only two CrossDomainItem variants this branch handles. Each caller dup's its own fd and writes through it independently. Verified with a synthetic reproducer: a sustained 1 kHz sine playing through a guest PipeWire stream while ~200 short-lived guest streams open and close concurrently (each issuing SCM_RIGHTS for new eventfds, contending with the sustained stream's per-period writes for item_state). Capturing the host sink monitor and counting sample-to-sample deltas exceeding a clean-sine threshold, the stress workload produced ~10 distinct glitch bursts in an 8-second capture before this change, and zero across five consecutive runs after. Signed-off-by: Adam Ford <adam.ford@anodize.com>
|
Woah, that's quite the torture test scenario, did you ever actually experience stream-create churn in realistic regular desktop usage? 👀 I agree that current code is locking too much but I'm wary of the Maybe we could use a refcount instead? Would only add a pointer chase for derefing the BTW, keep in mind that for libkrun 2.0 the plan is to switch to upstream rutabaga (#560) so changes should be proposed there as well (or ideally, first) |
|
@valpackett - I never saw it in real life, but I was playing with Claude. My end-game is run multiple instances of muvm on the same machine and have them share an audio pipeline, so I asked it to help craft a worst-case scenario. |
The previous write() held item_state.lock() across the write_volatile call that performs the actual syscall, serializing every cross-domain CMD_WRITE behind any other operation on the items table — including add_item from process_receive, which fires for every new fd received via SCM_RIGHTS as guests open additional channels.
For per-period eventfd wakeups (e.g. PipeWire audio streams signaling host playback every audio period), this means the write completes only after any in-flight item_state operation finishes. Under stream-create churn — many guest applications opening streams concurrently, each delivering new fds via SCM_RIGHTS that hit add_item under the same lock — the wait can exceed the audio period budget and produce missed-deadline glitches at the host's audio output.
This change shrinks the critical section to a brief fd dup() under the lock, performs the syscall lock-free, and only re-acquires the lock if hang_up indicates the item should be removed. In the common case (hang_up == 0, e.g. repeated eventfd wakeups for an active stream) the table is no longer touched per write, eliminating both the contention and the previous "remove + conditional re-insert" churn.
Behavioral changes vs the old code:
Verified with a synthetic reproducer: a sustained 1 kHz sine playing through a guest PipeWire stream while ~200 short-lived guest streams open and close concurrently (each issuing SCM_RIGHTS for new eventfds, contending with the sustained stream's per-period writes for item_state). Capturing the host sink monitor and counting sample-to-sample deltas exceeding a clean-sine threshold, the stress workload produced ~10 distinct glitch bursts in an 8-second capture before this change, and zero across five consecutive runs after.