Skip to content

async: always defer task wakes via ngx_post_event#295

Open
CVanF5 wants to merge 2 commits into
nginx:mainfrom
CVanF5:fix/schedule-defer-wakes
Open

async: always defer task wakes via ngx_post_event#295
CVanF5 wants to merge 2 commits into
nginx:mainfrom
CVanF5:fix/schedule-defer-wakes

Conversation

@CVanF5
Copy link
Copy Markdown

@CVanF5 CVanF5 commented May 29, 2026

Fixes #294.

schedule() ran runnable.run() synchronously when a task was woken from
outside its own poll, violating the Waker contract (wakes must be non-blocking
and non-re-entrant). When a wake fires from a Drop holding a lock the task also
needs — e.g. h2's Streams::drop — the re-poll re-enters and deadlocks. This
always defers the wake via ngx_post_event (one event-loop tick).

  • fix: drop the synchronous runnable.run(); always SCHEDULER.schedule().
  • test: freestanding reproducer (only async_task) — synchronous re-poll
    reproduces the held-lock deadlock signature, deferred re-poll avoids it; no
    NGINX event loop needed.

Verified on Linux + macOS aarch64: cargo test / clippy --all-targets -Dwarnings
/ fmt --check all clean.

CVanF5 and others added 2 commits May 29, 2026 10:34
`schedule()` ran `runnable.run()` synchronously when a task was woken
from outside its own poll (`woken_while_running == false`). That violates
the `Waker::wake()` contract (wakes must be non-blocking and
non-re-entrant): when a wake fires from a `Drop` that holds a lock the
woken task also needs — e.g. h2's `Streams::drop` waking its `Connection`
task while holding `Arc<Mutex<Inner>>` — the synchronous re-poll re-enters
and deadlocks on that lock.

Always defer the wake via `ngx_post_event` instead; the runnable is
re-polled on the next event-loop tick by `ngx_event_process_posted`. On
the single-threaded event loop that is one worker-local list insert — one
tick of latency.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a freestanding test in `async_::spawn` (no deps beyond `async_task`)
reproducing the deadlock fixed by the previous commit: a `Drop` impl
wakes a parked task while holding a lock. With synchronous re-poll the
re-poll finds the lock still held — the deadlock signature, surfaced via
`Mutex::try_lock` returning `WouldBlock` so the test cannot hang; with
deferred wakes the re-poll acquires the lock cleanly. The test supplies
its own `schedule` functions, so no NGINX event loop is required.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@avahahn
Copy link
Copy Markdown

avahahn commented May 29, 2026

I was asked to review this to identify if it could cause any headaches within NGINX WASM.
Without commenting on anything else I can say that all tests pass in NGINX WASM with this fix and that I dont think this design will cause any issues for the WASM module.

I appreciate that this change will also make the interleaving of WASM guest execution (or any rust async code) with other components in the NGINX worker process easier to reason about.

@pschyska
Copy link
Copy Markdown

What are the chances 😲?
I had the same issue, but in ngx-tickle, and with hyper, not h2. We used to inherit this inline-run design as well, and came to the same conclusion (that we must always enqueue). I wanted to report it here because I suspected it would apply, and was trying to make a reproducer, but now that you have reported it independently I guess it's clear.

I can't do an official review here but: lgtm, and thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

schedule() re-polls synchronously on wake, violating the Waker contract

3 participants