[Bug] Excessive Thread Growth in Doris FE (ScheduledThreadPoolExecutor) #60108

@beauli

Search before asking

  • I had searched in the issues and found no similar issues.

Version

4.0.2

What's Wrong?

This bug appeared after upgrading Doris from 4.0.0 to 4.0.2.
We observed abnormal thread growth in the Doris FE process. The thread count increases continuously until it exceeds 130,000, which eventually exhausts system memory and causes os::commit_memory failed errors.

Most of these threads are named sdk-ScheduledExecutor-* and are in the WAITING (parking) state. They are worker threads of ScheduledThreadPoolExecutor instances and stay idle, blocked in DelayedWorkQueue.take().
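For illustration only, here is a minimal Java sketch of one general way this symptom can arise: code that builds a new ScheduledThreadPoolExecutor per caller and never shuts it down. The root cause in Doris FE is not confirmed by this report. The class name, the helper methods, and the thread factory below are hypothetical, and the name prefix is chosen only to mimic the names in the dump.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class LeakedSchedulerDemo {
    // Assumption: prefix copied from the report; the real thread factory is unknown.
    static final String PREFIX = "sdk-ScheduledExecutor-";
    private static final AtomicInteger POOL_SEQ = new AtomicInteger();

    /** Creates a one-thread scheduler whose threads are named PREFIX + pool + "-" + thread. */
    static ScheduledThreadPoolExecutor newPool() {
        int poolId = POOL_SEQ.getAndIncrement();
        AtomicInteger threadSeq = new AtomicInteger();
        ThreadFactory factory = r -> {
            Thread t = new Thread(r, PREFIX + poolId + "-" + threadSeq.getAndIncrement());
            t.setDaemon(true);
            return t;
        };
        return new ScheduledThreadPoolExecutor(1, factory);
    }

    /** Simulates n callers that each build a private pool and never call shutdown(). */
    static List<ScheduledThreadPoolExecutor> leakPools(int n) {
        List<ScheduledThreadPoolExecutor> pools = new ArrayList<>();
        Runnable noop = () -> { };
        for (int i = 0; i < n; i++) {
            ScheduledThreadPoolExecutor pool = newPool();
            // Scheduling a task starts a core worker thread. Once the task is done,
            // that worker parks in DelayedWorkQueue.take() and stays alive.
            pool.schedule(noop, 1, TimeUnit.MILLISECONDS);
            pools.add(pool);
        }
        return pools;
    }

    /** Counts live threads in this JVM whose name starts with PREFIX. */
    static long countThreads() {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().startsWith(PREFIX))
                .count();
    }

    public static void main(String[] args) throws InterruptedException {
        List<ScheduledThreadPoolExecutor> pools = leakPools(100);
        Thread.sleep(200); // give the trivial tasks time to finish
        System.out.println("live scheduler threads: " + countThreads());
        // Calling shutdown() is what lets each idle worker exit.
        for (ScheduledThreadPoolExecutor pool : pools) {
            pool.shutdown();
        }
    }
}
```

In this sketch each pool keeps one parked worker until shutdown() is called, so the thread count grows with the number of pools created rather than with the amount of work.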

What You Expected?

Thread pools should be reused and limited in size.

Idle ScheduledThreadPoolExecutor threads should not grow indefinitely.
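The expected behaviour can be sketched in Java as a single shared scheduler with a fixed size. This is a hedged illustration, not the Doris implementation: SharedScheduler, its prefix, and the bound of 4 threads are all hypothetical names and values.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SharedScheduler {
    static final String PREFIX = "shared-scheduler-"; // hypothetical name
    static final int MAX_THREADS = 4;                 // hypothetical bound

    private static final ScheduledThreadPoolExecutor SHARED = create();

    private static ScheduledThreadPoolExecutor create() {
        AtomicInteger seq = new AtomicInteger();
        ThreadFactory factory = r -> {
            Thread t = new Thread(r, PREFIX + seq.getAndIncrement());
            t.setDaemon(true);
            return t;
        };
        // A ScheduledThreadPoolExecutor acts as a fixed-size pool of corePoolSize
        // threads with an unbounded queue, so it never grows past MAX_THREADS.
        return new ScheduledThreadPoolExecutor(MAX_THREADS, factory);
    }

    /** Every caller receives the same instance instead of building its own pool. */
    static ScheduledExecutorService get() {
        return SHARED;
    }

    /** Counts live threads in this JVM whose name starts with PREFIX. */
    static long countThreads() {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().startsWith(PREFIX))
                .count();
    }

    /** Schedules n short tasks on the shared pool and returns the thread count afterwards. */
    static long simulateCallers(int n) {
        CountDownLatch done = new CountDownLatch(n);
        Runnable task = done::countDown;
        for (int i = 0; i < n; i++) {
            get().schedule(task, 1, TimeUnit.MILLISECONDS);
        }
        try {
            if (!done.await(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return countThreads();
    }

    public static void main(String[] args) {
        System.out.println("threads after 1000 callers: " + simulateCallers(1000));
    }
}
```

With this design the number of scheduler threads depends on the configured pool size, not on how many callers schedule work.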

How to Reproduce?

Start Doris FE with JDK 17.

Monitor thread count using jstack or jcmd.

Observe continuous growth of threads named sdk-ScheduledExecutor-*.

Eventually, memory usage exceeds physical RAM and Doris FE crashes.
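To make the growth easier to track between dumps, a small Java helper can summarize a saved thread dump by thread-name family. This is a sketch under assumptions: ThreadDumpSummary and countByFamily are hypothetical names, and the parser only relies on each thread header line starting with the quoted thread name, as in the example dump in this report.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThreadDumpSummary {
    // A thread header line starts with the thread name in double quotes.
    private static final Pattern HEADER = Pattern.compile("^\"([^\"]+)\"", Pattern.MULTILINE);

    /** Groups thread names by family, replacing every run of digits with "N". */
    static Map<String, Integer> countByFamily(String dump) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = HEADER.matcher(dump);
        while (m.find()) {
            // "sdk-ScheduledExecutor-3296-3" becomes "sdk-ScheduledExecutor-N-N"
            String family = m.group(1).replaceAll("\\d+", "N");
            counts.merge(family, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ThreadDumpSummary.java <thread-dump-file>");
            return;
        }
        String dump = Files.readString(Path.of(args[0]));
        // One line per family: count, a tab, then the normalized name.
        countByFamily(dump).forEach((family, n) -> System.out.println(n + "\t" + family));
    }
}
```

A possible workflow is to save a dump with `jstack <pid> > fe.dump` (or `jcmd <pid> Thread.print > fe.dump`), run the helper on the file, and repeat a few minutes later to compare the count for the sdk-ScheduledExecutor family.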

Anything Else?

Example thread dump:
"sdk-ScheduledExecutor-3296-3" #13890 daemon prio=5 os_prio=0 cpu=0.12ms elapsed=278.37s tid=0x00007fa6055380b0 nid=0x3e51 waiting on condition [0x00007fa2959d2000]
   java.lang.Thread.State: WAITING (parking)
        at jdk.internal.misc.Unsafe.park(java.base@17.0.2/Native Method)
        - parking to wait for <0x0000000213a14f60> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(java.base@17.0.2/LockSupport.java:341)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(java.base@17.0.2/AbstractQueuedSynchronizer.java:506)
        at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@17.0.2/ForkJoinPool.java:3463)
        at java.util.concurrent.ForkJoinPool.managedBlock(java.base@17.0.2/ForkJoinPool.java:3434)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@17.0.2/AbstractQueuedSynchronizer.java:1623)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(java.base@17.0.2/ScheduledThreadPoolExecutor.java:1177)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(java.base@17.0.2/ScheduledThreadPoolExecutor.java:899)
        at java.util.concurrent.ThreadPoolExecutor.getTask(java.base@17.0.2/ThreadPoolExecutor.java:1062)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@17.0.2/ThreadPoolExecutor.java:1122)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@17.0.2/ThreadPoolExecutor.java:635)
        at java.lang.Thread.run(java.base@17.0.2/Thread.java:833)

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
