Skip to content

#118: fix NTS test failures (getaddrinfo leak + macOS curl progress exception)#120

Merged
EdmondDantes merged 3 commits into
mainfrom
118-nts-surfaced-test-failures
May 15, 2026
Merged

#118: fix NTS test failures (getaddrinfo leak + macOS curl progress exception)#120
EdmondDantes merged 3 commits into
mainfrom
118-nts-surfaced-test-failures

Conversation

@EdmondDantes
Copy link
Copy Markdown
Contributor

Summary

Closes the remaining NTS-only test failures tracked in #118.

  • getaddrinfo struct leak on reactor shutdown (NTS). The
    `async_dns_addrinfo_t` (288 B) was never freed when a coroutine
    cancelled a DNS resolve and the reactor shut down before libuv's
    threadpool worker finished the blocking `getaddrinfo()` syscall.
    Per libuv docs `uv_cancel(uv_getaddrinfo_t*)` returns `EBUSY` for
    in-flight requests — we cannot preempt the worker, only wait for it.
    Fix: two-phase drain in `libuv_reactor_shutdown` — bounded
    `UV_RUN_NOWAIT` for ready callbacks, then bounded `UV_RUN_ONCE` for
    threadpool cancel-completions. Bounded fallback leaks struct rather
    than UAF if budget is exhausted (OS reclaims at process exit).
    (Commit `52604b6`.)

  • curl `XFERINFOFUNCTION` / `PROGRESSFUNCTION` exception leak (macOS).
    Two of four async-aware curl callbacks in php-src
    `ext/curl/interface.c` were missing the exception-handoff pattern.
    When the user callback threw, `EG(exception)` stayed set across
    libcurl iterations; `zend_call_known_fcc` on subsequent ticks
    short-circuited without clearing it. Eventually the transfer ended
    and the dangling exception surfaced outside the coroutine — on Linux
    it happened to land on a frame the awaiter unwound, on macOS the
    libuv/kqueue reentry path delivered it as uncaught at top-level.
    Fix lives in php-src `true-async` (`ae2bd4fba38`): both callbacks
    now hand the exception off to `curl_async_event_set_callback_exception()`,
    clear it, and return 1 to abort the transfer.

  • CI infra. Added `libiconv` to the macOS brew install line in
    `debug-bugfix.yml` so `./configure` finds it via
    `--with-iconv=$(brew --prefix)/opt/libiconv`.

Verification

`Debug Bugfix (manual)` workflow run `25917156027` — all three jobs green:

  • `MACOS_NTS` ✓ (3m26s) — `tests/curl/035-progress_exception.phpt` + `056`
  • `LINUX_NTS_ASAN` ✓ (3m44s) — `cancel_during_io` (getaddrinfo leak regression)
  • `LINUX_ZTS_JIT` ✓ (4m4s) — `thread_pool/cancel` JIT SEGV regression

Companion change

This PR depends on php-src commit `ae2bd4fba38` on the `true-async`
branch — the actual curl-callback fix lives there. Mergeable
independently because the ext/async CHANGELOG entry just documents
that companion change.

Test plan

  • MACOS_NTS green on `Debug Bugfix`
  • LINUX_NTS_ASAN green
  • LINUX_ZTS_JIT green
  • Full `build` matrix on `main` after merge (post-merge verification)

The first half of #118 (NTS-only): async_dns_addrinfo_t (288 B) leaked
when a coroutine cancelled a DNS lookup and the reactor shut down before
the libuv threadpool worker finished its blocking getaddrinfo() syscall.

The dispose path sets LIBUV_DNS_F_DISPOSE_PENDING and relies on
on_addrinfo_event firing (re-entering dispose) to reach the pefree
branch. Our shutdown drain used UV_RUN_NOWAIT only — non-blocking peeks
that don't wait for the threadpool cancel-completion to surface via
libuv's internal pipe. After uv_loop_close() the callback can't fire,
the struct leaks.

Per libuv docs (and issue libuv/libuv#2481), uv_cancel() on an in-flight
uv_getaddrinfo_t returns EBUSY: workers cannot be preempted, only
waited on.

Two-phase drain in libuv_reactor_shutdown:

  Phase 1 — bounded UV_RUN_NOWAIT (sync callbacks already on loop).
  Phase 2 — bounded UV_RUN_ONCE  (block until threadpool cancel-
            completions arrive; cap so a wedged DNS server can't hang
            shutdown forever).

If a worker is still wedged past the budget, leave the loop open:
pefree-ing pending structs would race the still-running worker (UAF,
much worse than a leak). The OS reclaims memory at process exit. A
ZEND_DEBUG-only warning is printed for visibility.

Local impact: ~15 ms shutdown for a normal cancel test (Phase 1 closes
the loop immediately; Phase 2 not entered). No functional regressions
in ext/async/tests.
…S fix)

Companion entry for the php-src fix in ext/curl/interface.c. Documents
why the bug only surfaced on macOS (libuv/kqueue reentry path) and that
the same pattern was already applied to curl_prereqfunction and
curl_debug — these two callbacks were the gap.
MACOS_NTS job was failing at the Configure PHP step with
'Please specify the install prefix of iconv with --with-iconv=<DIR>'
because the workflow passes --with-iconv=$(brew --prefix)/opt/libiconv
but libiconv was not in the brew install list.

LINUX_NTS_ASAN and LINUX_ZTS_JIT are green; this unblocks verification
of the curl xferinfo/progress macOS fix.
@EdmondDantes EdmondDantes linked an issue May 15, 2026 that may be closed by this pull request
@EdmondDantes EdmondDantes merged commit 228740f into main May 15, 2026
9 checks passed
@EdmondDantes EdmondDantes deleted the 118-nts-surfaced-test-failures branch May 15, 2026 12:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

NTS-surfaced test failures

1 participant