We got a "Worker terminated" error today when a user started an HTTP request that depended on a workerpool worker, interrupted the request, then immediately retried.
Error: Worker terminated
    at WorkerHandler.terminate (/home/app/deploy/node_modules/workerpool/src/WorkerHandler.js:516:45)
    at WorkerHandler.terminateAndNotify (/home/app/deploy/node_modules/workerpool/src/WorkerHandler.js:617:8)
    at ? (/home/app/deploy/node_modules/workerpool/src/WorkerHandler.js:456:26)
    at ? (/home/app/deploy/node_modules/workerpool/src/Promise.js:179:17)
    at ? (/home/app/deploy/node_modules/workerpool/src/Promise.js:109:7)
    at Array.forEach (<anonymous>)
    at _reject (/home/app/deploy/node_modules/workerpool/src/Promise.js:108:13)
    at Object.reject (/app/app/deploy/node_modules/workerpool/src/Promise.js:164:5)
    at Timeout._onTimeout (/home/app/deploy/node_modules/workerpool/src/WorkerHandler.js:484:36)
    at listOnTimeout (node:internal/timers:605:17)
The second task was started within the first task's workerTerminateTimeout and was apparently killed by it.
The following analysis is from Claude Code. I've done my best to review it and take responsibility for any mistakes.
Summary
When a task is cancelled (via Promise.cancel()) or times out, WorkerHandler schedules a forced termination of the worker after workerTerminateTimeout (default 1000ms) to give the worker a chance to clean up gracefully. During that grace window, the pool can dispatch a new, unrelated task onto the same worker. When the timer fires (or the worker is otherwise force-terminated as part of the cancellation cleanup), every entry in processing is rejected with Error: Worker terminated — including the unrelated task that was assigned during the window.
The result: cancelling task A can cause task B, submitted milliseconds later, to fail with Worker terminated even though B has nothing to do with A's cancellation.
Reproduction
const workerpool = require('workerpool');

const pool = workerpool.pool({ maxWorkers: 1, workerTerminateTimeout: 100 });

// CPU-bound task that never yields, so it cannot be cancelled cooperatively.
const longTask = pool.exec(() => { while (true) {} });
longTask.catch(() => {}); // ignore the cancellation rejection

setTimeout(() => {
  longTask.cancel();

  // Submit an unrelated task immediately. With maxWorkers: 1, the only worker
  // is currently in cleanup-after-cancel. `busy()` reports it as idle,
  // so the pool dispatches the new task onto it.
  pool.exec((a, b) => a + b, [3, 4])
    .then(r => console.log('result:', r)) // never logs
    .catch(err => console.log('FAIL:', err.message)); // FAIL: Worker terminated
}, 50);
Output:
FAIL: Worker terminated
Root cause
A worker can be torn down by two paths, but WorkerHandler.busy() only knows about one of them:
- WorkerHandler.terminate() sets this.cleaning = true (WorkerHandler.js:583). busy() correctly reports the worker as busy and Pool._getWorker() skips it.
- Promise.cancel() / timeout in exec is caught in the resolver.promise.catch block at WorkerHandler.js:433 and adds the task to tracking with a scheduled terminateAndNotify(true) after workerTerminateTimeout. busy() does not consider tracking:
WorkerHandler.prototype.busy = function () {
  return this.cleaning || Object.keys(this.processing).length > 0;
};
So Pool._getWorker() (Pool.js:273) sees the worker as available and dispatches the next queued task. When the cleanup timer fires, terminate(true) rejects every entry in processing with Error: Worker terminated — sweeping up the newly-assigned task as collateral damage.
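To make the sequence easier to follow, here is a minimal toy model of the race. It is a sketch, not workerpool code: `ToyHandler`, `forceTerminate`, and `simulateRace` are hypothetical names, timers are replaced by explicit calls, and the model assumes (as the description above implies) that a cancelled task leaves `processing` when it enters `tracking`.

```javascript
// Toy model only: ToyHandler is hypothetical and is NOT workerpool's WorkerHandler.
class ToyHandler {
  constructor() {
    this.cleaning = false;
    this.processing = {}; // id -> task currently assigned to the worker
    this.tracking = {};   // id -> cancelled task waiting for cleanup
    this.rejected = [];   // ids rejected with "Worker terminated"
  }

  // Same predicate as the busy() quoted above: tracking is ignored.
  busy() {
    return this.cleaning || Object.keys(this.processing).length > 0;
  }

  exec(id) {
    this.processing[id] = { id };
  }

  // Assumption: cancelling moves the task from processing to tracking.
  cancel(id) {
    this.tracking[id] = this.processing[id];
    delete this.processing[id];
  }

  // Stands in for the workerTerminateTimeout timer firing:
  // every task still in processing is rejected.
  forceTerminate() {
    for (const id of Object.keys(this.processing)) {
      this.rejected.push(id);
    }
    this.processing = {};
    this.tracking = {};
  }
}

function simulateRace() {
  const handler = new ToyHandler();
  handler.exec('A');
  handler.cancel('A');                       // A is now in the cleanup window
  const busyDuringCleanup = handler.busy();  // false: worker looks idle
  if (!busyDuringCleanup) {
    handler.exec('B');                       // pool dispatches B onto it
  }
  handler.forceTerminate();                  // A's timer fires, B is rejected
  return { busyDuringCleanup, rejected: handler.rejected };
}

console.log(simulateRace());
```

In this model the only task rejected by the forced termination is B, which matches what the reproduction shows against the real library.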
Proposed fix
busy() should also report the worker as busy while it has entries in tracking:
WorkerHandler.prototype.busy = function () {
  return this.cleaning
    || Object.keys(this.processing).length > 0
    || Object.keys(this.tracking).length > 0;
};
Once the worker responds to the cleanup message, the tracking entry is removed at WorkerHandler.js:324 and the worker becomes idle again — so workers that successfully run their abort listener stay in the pool, just as they do today. Workers whose cleanup times out (i.e. the workerTerminateTimeout setTimeout fires) get force-terminated and removed from the pool — also as today. The only behavior change is that newly arriving tasks are queued through the cleanup window instead of being dispatched onto a worker that's about to be killed.
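The intended behavior can be sketched with a small toy scheduler. This is an illustration under assumptions rather than a patch: `ToyPool`, `submit`, `cleanupDone`, and `simulateFixed` are hypothetical names, there is a single simulated worker, and only the path where the worker acknowledges cleanup is modelled (the forced-termination path is left out).

```javascript
// Toy model only: ToyPool is hypothetical and is NOT workerpool's Pool or WorkerHandler.
class ToyPool {
  constructor() {
    this.cleaning = false;
    this.processing = {}; // id -> task running on the single simulated worker
    this.tracking = {};   // id -> cancelled task waiting for cleanup
    this.queue = [];      // ids waiting for the worker
    this.log = [];        // ordered record of what happened
  }

  // Proposed predicate: entries in tracking also make the worker busy.
  busy() {
    return this.cleaning
      || Object.keys(this.processing).length > 0
      || Object.keys(this.tracking).length > 0;
  }

  submit(id) {
    this.queue.push(id);
    this.next();
  }

  // Dispatch the next queued task only if the worker is not busy.
  next() {
    if (this.queue.length > 0 && !this.busy()) {
      const id = this.queue.shift();
      this.processing[id] = true;
      this.log.push(`dispatch ${id}`);
    }
  }

  // Assumption: cancelling moves the task from processing to tracking.
  cancel(id) {
    delete this.processing[id];
    this.tracking[id] = true;
    this.log.push(`cancel ${id}`);
    this.next();
  }

  // Stands in for the worker responding to the cleanup message.
  cleanupDone(id) {
    delete this.tracking[id];
    this.log.push(`cleanup ${id}`);
    this.next();
  }
}

function simulateFixed() {
  const pool = new ToyPool();
  pool.submit('A');
  pool.cancel('A');
  pool.submit('B');                        // arrives during the cleanup window
  const queuedDuringCleanup = pool.queue.slice();
  pool.cleanupDone('A');                   // worker becomes idle, B is dispatched
  return { queuedDuringCleanup, log: pool.log };
}

console.log(simulateFixed());
```

Here B stays queued while A is in `tracking` and is dispatched only after cleanup completes, so a forced termination during the window would have nothing unrelated to reject.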