worker: write fatal-error diagnostics syncly so stderr isn't dropped#4906
worker: write fatal-error diagnostics syncly so stderr isn't dropped#4906backspace wants to merge 3 commits into
Conversation
The worker's `fatalExit` handler already exists (uncaughtException / unhandledRejection backstop with a finalize-reservation race) — but it reports the error via `log.error(...)` immediately before `process.exit(1)`. `worker-manager.ts` spawns the child with `stdio: ['pipe', 'pipe', 'pipe', 'ipc']`, so the child's stderr is a libuv-async pipe; the final stream chunk gets discarded when the process disappears, and the captured server log shows the child as having silently exited `code=1, signal=null` with no clue why. worker.ts already uses `writeSync(2, ...)` for exactly this reason on the STARTUP / SIGINT / SIGTERM / disconnect stamps (see the comment above the STARTUP block at the top of the file). Apply the same pattern to the three fatal-exit paths: the uncaughtException / unhandledRejection handler, its inner finalize-failed fallback, and the outer startup-error `.catch`. Route each through a new helper that serializes the error with its full stack and walks `error.cause` (where Node fetch / undici / TLS errors stash the real reason). Discovered while debugging the action-demo on #4897 (CS-11180): every `_publish-realm` of a fresh source realm enqueues a copy-index job that throws *something* inside the worker; the worker exited silently; pg-queue retried, hit the 2-reservation cap, abandoned the job; the realm-server returned HTTP 500 `Job abandoned after 2 failed attempts (max=2)` to the publish endpoint caller. Without this fix the underlying job-processing error is unobservable. The bundled `serialize-fatal-reason` helper is in its own module because the FD-level write behavior can't be unit-tested in-process (it requires a real child_process.spawn + libuv-piped stderr to reproduce the bug being fixed) — but the serialization can. Tests cover: stack preservation, cause-chain walking, non-Error values, self-referential cause cycles (depth-capped), and Node fetch's typical `TypeError: fetch failed` + ECONNRESET-on-cause shape. Closes CS-11200. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2ef1f358e6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Host Test Results 1 files ± 0 1 suites ±0 1h 34m 43s ⏱️ + 52m 17s Results for commit 6e89715. ± Comparison against earlier commit 82d4e66. Realm Server Test Results 1 files ± 0 1 suites ±0 11m 30s ⏱️ + 6m 1s Results for commit 6e89715. ± Comparison against earlier commit 82d4e66. |
Per code review on #4906 (P1): `String(Object.create(null))` throws `TypeError: Cannot convert object to primitive value` because the prototype-less object has neither `@@toPrimitive`, `toString`, nor `valueOf` for `OrdinaryToPrimitive` to call. Any object whose own `toString` throws hits the same path. `unhandledRejection` reasons are `any` and libraries do occasionally reject with such values. If `serializeFatalReason` had thrown inside `fatalExit` after `isFatalHandlerRunning = true`, the throw would have become an uncaughtException, re-entered `fatalExit`, hit the early-return guard, and returned silently — stranding the worker with its pg-queue reservation held and no FATAL stamp written. The fatal path cannot tolerate that. Two layers of defense: 1. `serialize-fatal-reason.ts`: wrap the entire body in try/catch and route every `String(...)` call through a `safeString` helper that catches and returns `[unstringifiable value]`. 2. `worker.ts`: wrap each `serializeFatalReason` call at its three call sites (uncaughtException/unhandledRejection handler, the finalize-failed fallback, and the outer startup-error catch) in a `safeSerialize` helper. The helper's own try/catch is the primary defense; the call-site wrap is belt-and-suspenders against a future regression in the helper. Tests added for the three triggering shapes: a prototype-less rejection value, a value whose own `toString` throws, and an Error whose `cause` has a hostile `toString`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Prettier collapsed two ternary/call-expression forms onto a single line where the result fits the width budget. No semantic changes.
Another problem I encountered while working on #4897, explained by Claude:
For example, here: