fix(system_tray): dispatch tray updates asynchronously to avoid blocking callers (#4199) #5045

Closed
moimart wants to merge 1 commit into LizardByte:master from moimart:fix/4199-tray-deadlock

moimart commented Apr 23, 2026

Summary

update_tray_*() functions in src/system_tray.cpp call tray_update(), which on Linux blocks the caller inside pthread_cond_wait until the tray library's GTK main-loop callback runs. That callback calls libayatana-appindicator and libnotify synchronously. If the active desktop notification daemon is unresponsive — a common failure mode on Wayland compositors (swaync/dunst/Quickshell) during desktop transitions, lock screens, or daemon restarts — the callback blocks indefinitely and every caller of update_tray_* blocks with it.

One of those callers is stream::session::join(), which arms a 10-second NVENC-deadlock watchdog. When the tray update doesn't return in time, the watchdog fires debug_trap() and the entire sunshine process dies with SIGTRAP. Reported as #4199.

Before: session::join() → update_tray_pausing() → tray_update() → pthread_cond_wait → tray_update_internal → notify_notification_show → GDBus blocked on a hung notification daemon → 10 s → debug_trap() → process dies.

After: session::join() → update_tray_pausing() → spawn detached worker → return. Session teardown completes. The worker stays blocked until the daemon recovers; if the process exits first, the thread is cleaned up by the OS.
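The difference between the two paths can be illustrated with a simulated hung daemon. This is a sketch under stated assumptions: `fake_tray_update`, `update_tray_blocking`, `update_tray_detached`, and `completed_updates` are hypothetical stand-ins, and a `std::shared_future` plays the role of the notification daemon becoming responsive.

```cpp
#include <atomic>
#include <future>
#include <thread>

// Counts updates that actually reached the (simulated) daemon.
std::atomic<int> completed_updates {0};

// Stand-in for tray_update(): blocks until the simulated daemon responds.
void fake_tray_update(const std::shared_future<void> &daemon_ready) {
  daemon_ready.wait();
  ++completed_updates;
}

// "Before": the caller is blocked for as long as the daemon is hung.
void update_tray_blocking(std::shared_future<void> daemon_ready) {
  fake_tray_update(daemon_ready);
}

// "After": the caller hands the update to a detached worker and returns.
void update_tray_detached(std::shared_future<void> daemon_ready) {
  std::thread([daemon_ready] { fake_tray_update(daemon_ready); }).detach();
}
```

With an unfulfilled future, `update_tray_detached` returns immediately while its worker waits; `update_tray_blocking` would not return until the promise is set.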

Change

  • New helper run_tray_async(fn) that spawns a detached std::thread, takes tray_async_mutex, and runs fn. Catches exceptions so a runaway tray/notify library call can't kill the worker. No-ops cleanly if tray_initialized is false at post time or at worker start time (race window).
  • tray_async_mutex serializes mutation of the shared tray struct across workers.
  • Persistent file-scope std::string buffers (playing_msg_buf, pausing_msg_buf, stopped_msg_buf) replace the static std::string msg = std::format(...) pattern. The old pattern only evaluated the format expression the first time each function ran, permanently capturing the first app_name; this also fixes that latent bug.
  • Each update_tray_* body moves verbatim into a lambda passed to run_tray_async.

What stays the same

  • Public API (src/system_tray.h). No call-site changes in src/stream.cpp, src/main.cpp, src/nvhttp.cpp, src/process.cpp.
  • Tray event loop (tray_thread_worker + process_tray_events using tray_loop(1)) is untouched, so the power optimization from #4457 (fix(tray): use the blocking event loop to avoid wasting power) is preserved.
  • The session::join() NVENC-deadlock watchdog is untouched — it exists for a separate, legitimate NVIDIA-driver issue, and remains the intended failsafe. The fix just prevents a non-critical tray subsystem from tripping it.
  • Windows and macOS tray paths work unchanged. tray_update() on those platforms isn't known to block, but routing through the async worker costs essentially nothing and eliminates any possibility of blocking the caller.

Trade-offs / caveats

  • Each update spawns a transient thread (~one per streaming state change — a few per session). If the notification daemon is hung and never recovers, the workers stay blocked on tray_async_mutex + tray_update's internal cond var. Memory impact is bounded by stream state transitions per session.
  • If the notification daemon is permanently wedged, tray state stops updating but sunshine otherwise runs normally. Before this patch, sunshine would SIGTRAP instead.
  • std::thread construction can throw on resource exhaustion; this is caught and logged (warning) and the update is dropped. Dropping a tray icon update is preferable to surfacing the error on session teardown.

Test plan

  • Reproduced #4199 (SIGTRAP crash in stream::session::join due to deadlock with system tray/notification thread on Wayland) locally on CachyOS / Hyprland (Wayland, Quickshell-based notification daemon) with the upstream v2025.924.154138 tag: a cluster of SIGTRAP crashes on Moonlight disconnect, where coredumpctl showed raise and pthread_once in the tray handler path.
  • Built sunshine with this patch applied (v2025.924 tag port; this PR is forward-ported to master); ran 10+ connect/disconnect cycles from Moonlight — no SIGTRAP, session teardown completes cleanly each time.
  • Verified tray icon and notifications still work when the notification daemon is healthy.
  • CI: I have only tested on Linux. Windows and macOS reviewers: please confirm tray behavior is unchanged (the only user-visible change is that updates run on a short-lived worker thread).

Fixes #4199

@moimart moimart force-pushed the fix/4199-tray-deadlock branch 3 times, most recently from de87207 to 33c85ec Compare April 23, 2026 12:12
…ing callers (LizardByte#4199)

update_tray_*() functions call tray_update(), which on Linux blocks the
caller inside pthread_cond_wait until the tray library's GTK main-loop
callback runs. That callback calls libayatana-appindicator and libnotify
synchronously. If the desktop notification daemon is unresponsive — a
common failure mode on Wayland compositors (swaync/dunst/Quickshell)
during desktop transitions, lock screens, or daemon restarts — the
callback blocks indefinitely and every caller of update_tray_* blocks
with it.

One of those callers is stream::session::join(), which arms a 10-second
NVENC-deadlock watchdog. When the tray update doesn't return in time,
the watchdog fires debug_trap() and the entire sunshine process dies
with SIGTRAP. Reported as LizardByte#4199.

Fix: spawn a detached worker thread in each update_tray_* entry point.
The worker serializes on tray_async_mutex and runs the original update
body. The caller returns immediately. If the notification daemon is
hung, the worker stays blocked until the daemon recovers or the process
exits, but session teardown — and all other callers of update_tray_* —
complete promptly.

Also replaces the 'static std::string msg = std::format(...)' pattern
with persistent string buffers living in a Meyers-singleton state
object; the prior static-local-initialization idiom captured only the
first app_name on first call and then reused it forever, a latent bug
unrelated to the SIGTRAP but worth fixing while we're here.
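
The latent bug is easy to reproduce in isolation. The sketch below uses hypothetical names (`stale_message`, `fresh_message`, `tray_state`) and plain string concatenation instead of std::format so it builds without C++20; only the buffer name playing_msg_buf is taken from the PR description.

```cpp
#include <string>

// Old idiom: the initializer of a function-local static runs exactly once,
// so every later call returns the text built from the first app_name.
const std::string &stale_message(const std::string &app_name) {
  static std::string msg = "Streaming started for " + app_name;
  return msg;
}

// Hypothetical Meyers-singleton state object holding a persistent buffer.
struct tray_state {
  std::string playing_msg_buf;
};

tray_state &state() {
  static tray_state instance;
  return instance;
}

// Replacement idea: reassign the persistent buffer on every call.
const std::string &fresh_message(const std::string &app_name) {
  state().playing_msg_buf = "Streaming started for " + app_name;
  return state().playing_msg_buf;
}
```

Calling `stale_message("Steam")` and then `stale_message("Desktop")` yields the Steam text both times, while `fresh_message("Desktop")` reflects the current name. In both variants the string outlives the call, so a pointer obtained from it remains valid until the next reassignment.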

Fixes LizardByte#4199
@moimart moimart force-pushed the fix/4199-tray-deadlock branch from 33c85ec to 57af5ff Compare April 23, 2026 12:17
@sonarqubecloud
❌ The last analysis has failed.

See analysis details on SonarQube Cloud

@ReenigneArcher ReenigneArcher added the ai PR has signs of heavy ai usage (either indicated by user or assumed) label Apr 23, 2026
@ReenigneArcher
Member

Did you test the current branch to see if it's fixed? There were significant changes to the tray library, and others reported it's now working for them.

Second, tray is being migrated in #4907

@moimart
Author

moimart commented Apr 23, 2026

> Second, tray is being migrated in #4907

Ohh, I wasn't aware of that one. Then please ignore this. Thanks!
