Worker process memory leak: gradual heap growth leads to OOM crash after 3–12 hours #365

@dahlia

Summary

The worker process (NODE_TYPE=worker) leaks memory gradually and crashes with an out-of-memory (OOM) error (exit code 134) after 3–12 hours of normal operation. No specific request triggers it; the leak occurs during routine inbox/outbox message queue processing.

Environment

  • Hollo version: 0.8.0-dev.290
  • Runtime: Docker (linux/arm64)
  • Node.js options: --max-old-space-size=1536
  • Container memory limit: 2 GB
  • Replicas: 2 (both exhibit the same behavior)

Symptoms

The V8 heap grows steadily over time until it reaches the --max-old-space-size limit, at which point Mark-Compact GC fails to reclaim enough memory and the process is killed with SIGABRT:

<--- Last few GCs --->

[78:0xffff79760000] 29322263 ms: Mark-Compact 1493.0 (1555.0) -> 1477.9 (1552.1) MB, pooled: 2 MB, 2819.41 / 0.34 ms  (average mu = 0.251, current mu = 0.186) task; scavenge might not succeed
[78:0xffff79760000] 29324463 ms: Mark-Compact 1491.8 (1552.9) -> 1478.1 (1552.9) MB, pooled: 1 MB, 1695.06 / 0.12 ms  (average mu = 0.243, current mu = 0.229) task; scavenge might not succeed

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 ELIFECYCLE  Command failed with exit code 134.
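
To make the GC lines above easier to compare, here is a minimal TypeScript sketch that extracts the numbers from a Mark-Compact line and computes how much each cycle reclaimed. The names `MarkCompact` and `parseMarkCompact` are hypothetical and not part of Hollo or Node.js. The field interpretation (size before and after collection, with "mu" read as mutator utilization) is my understanding of V8's trace output rather than something stated in this report.

```typescript
// Hypothetical helper: parse one V8 Mark-Compact line from a "Last few GCs" dump.
interface MarkCompact {
  timestampMs: number;        // milliseconds since process start
  beforeMb: number;           // heap size before the collection
  afterMb: number;            // heap size after the collection
  reclaimedMb: number;        // beforeMb - afterMb, rounded to 0.1 MB
  mutatorUtilization: number; // the "current mu" value
}

const MARK_COMPACT =
  /(\d+) ms: Mark-Compact ([\d.]+) \([\d.]+\) -> ([\d.]+) \([\d.]+\) MB.*current mu = ([\d.]+)/;

function parseMarkCompact(line: string): MarkCompact | null {
  const m = MARK_COMPACT.exec(line);
  if (m === null) return null; // not a Mark-Compact line
  const beforeMb = Number(m[2]);
  const afterMb = Number(m[3]);
  return {
    timestampMs: Number(m[1]),
    beforeMb,
    afterMb,
    reclaimedMb: Math.round((beforeMb - afterMb) * 10) / 10,
    mutatorUtilization: Number(m[4]),
  };
}

// The two lines quoted in this report.
const gcLines = [
  "[78:0xffff79760000] 29322263 ms: Mark-Compact 1493.0 (1555.0) -> 1477.9 (1552.1) MB, pooled: 2 MB, 2819.41 / 0.34 ms  (average mu = 0.251, current mu = 0.186) task; scavenge might not succeed",
  "[78:0xffff79760000] 29324463 ms: Mark-Compact 1491.8 (1552.9) -> 1478.1 (1552.9) MB, pooled: 1 MB, 1695.06 / 0.12 ms  (average mu = 0.243, current mu = 0.229) task; scavenge might not succeed",
];

const parsedGcs = gcLines.map(parseMarkCompact);
// reclaimedMb is 15.1 for the first line and 13.7 for the second.
```

Each full collection frees only about 14–15 MB out of roughly 1490 MB, which is why V8 reports the mark-compacts as ineffective. The first timestamp, 29322263 ms, is about 8.1 hours of uptime, consistent with the ~8.1 hour worker-2 entry under "Observed crash times".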

Observed crash times

| Worker | Uptime at crash | Heap usage at crash |
| --- | --- | --- |
| worker-2 | ~11.9 hours | 1481 MB / 1536 MB |
| worker-1 | ~2.7 hours | 1442 MB / 1536 MB |
| worker-1 | ~7 hours | 1307 MB / 1536 MB |
| worker-2 | ~8.1 hours | 1478 MB / 1536 MB |
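
The crash figures above can be turned into two rough numbers per run: how close the heap was to the limit, and an upper bound on the average growth rate. The sketch below is only an illustration. The baseline heap size right after startup is not given in this report, so dividing the heap at crash by the uptime assumes a baseline of zero and therefore overestimates the real rate. The names `Crash` and `summarize` are made up for this example.

```typescript
// Hypothetical summary of the observed crashes; data copied from the report.
interface Crash {
  worker: string;
  uptimeHours: number;
  heapMb: number;
}

const HEAP_LIMIT_MB = 1536; // --max-old-space-size from the Environment section

const crashes: Crash[] = [
  { worker: "worker-2", uptimeHours: 11.9, heapMb: 1481 },
  { worker: "worker-1", uptimeHours: 2.7, heapMb: 1442 },
  { worker: "worker-1", uptimeHours: 7, heapMb: 1307 },
  { worker: "worker-2", uptimeHours: 8.1, heapMb: 1478 },
];

function summarize(c: Crash) {
  return {
    worker: c.worker,
    // Heap at crash as a percentage of the limit, rounded to 0.1.
    percentOfLimit: Math.round((c.heapMb / HEAP_LIMIT_MB) * 1000) / 10,
    // Upper bound on average growth (assumes the heap started at 0 MB).
    maxGrowthMbPerHour: Math.round(c.heapMb / c.uptimeHours),
  };
}

const summaries = crashes.map(summarize);
for (const s of summaries) {
  console.log(`${s.worker}: ${s.percentOfLimit}% of limit, <= ${s.maxGrowthMbPerHour} MB/h`);
}
```

Under that assumption the bound ranges from about 124 MB/h to about 534 MB/h across the four runs, so the growth rate does not look constant between runs.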

Notes

  • This is distinct from #207 ("OOM when accessing my profile page"), which was a sudden OOM caused by the search API (fixed in v0.6.8). This issue is a gradual leak during normal worker operation.
  • Increasing --max-old-space-size only delays the inevitable crash; it does not prevent it.
  • As a workaround, we have configured the restart policy to always restart the worker on failure, so it recovers automatically within seconds after each OOM crash.
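
The point about raising --max-old-space-size can be illustrated with a small projection. The sketch below fits a straight line to heap samples and estimates the time left until a given limit. It assumes roughly linear growth, which this report does not confirm. The sample values are invented for illustration and are not measurements from the affected workers. `HeapSample`, `growthRate`, and `hoursUntilLimit` are hypothetical names.

```typescript
// Hypothetical projection of time-to-limit from periodic heap samples.
interface HeapSample {
  hours: number;  // uptime when the sample was taken
  heapMb: number; // heap in use at that time
}

// Ordinary least-squares slope of heapMb over hours, in MB per hour.
function growthRate(samples: HeapSample[]): number {
  const n = samples.length;
  if (n < 2) throw new Error("need at least two samples");
  const meanX = samples.reduce((sum, p) => sum + p.hours, 0) / n;
  const meanY = samples.reduce((sum, p) => sum + p.heapMb, 0) / n;
  let num = 0;
  let den = 0;
  for (const p of samples) {
    const dx = p.hours - meanX;
    num += dx * (p.heapMb - meanY);
    den += dx * dx;
  }
  if (den === 0) throw new Error("samples must cover more than one point in time");
  return num / den;
}

// Hours remaining after the last sample until the heap reaches limitMb.
function hoursUntilLimit(samples: HeapSample[], limitMb: number): number {
  const rate = growthRate(samples);
  if (rate <= 0) return Infinity; // heap is not growing
  const last = samples[samples.length - 1]!;
  return Math.max(0, (limitMb - last.heapMb) / rate);
}

// Invented samples, NOT measured data: 150 MB/h of steady growth.
const exampleSamples: HeapSample[] = [
  { hours: 0, heapMb: 300 },
  { hours: 1, heapMb: 450 },
  { hours: 2, heapMb: 600 },
  { hours: 3, heapMb: 750 },
];

const etaAtCurrentLimit = hoursUntilLimit(exampleSamples, 1536);
const etaAtDoubledLimit = hoursUntilLimit(exampleSamples, 3072);
```

With these invented samples, doubling the limit moves the projected crash from about 5.2 hours away to about 15.5 hours away. The crash still happens, only later. In a real worker the samples could come from `process.memoryUsage().heapUsed` read on a timer.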

Labels: bug