
bug: Thread leak in searcher._retrieve_paths causes container thread exhaustion #1273

@brentkearney

Description

Pre-submission checklist

  • I have searched existing issues and this hasn't been mentioned before
  • I have read the project documentation and confirmed this issue doesn't already exist
  • This issue is specific to MemOS and not a general software issue

Bug Description

The search pipeline creates a new ContextThreadPoolExecutor on every request in four methods within searcher.py:

  • _retrieve_paths (line 357, max_workers=5)
  • _retrieve_from_long_term_and_user (line 638, max_workers=3)
  • _retrieve_from_tool_memory (line 791, max_workers=2)
  • _deduplicate_rawfile_results (line 1153, up to max_workers=10)

Each executor is used within a with block, which calls shutdown(wait=True) on exit. If any submitted task hangs (e.g., a slow Neo4j query, an unresponsive embedding API, or a network timeout), shutdown(wait=True) never returns: the worker threads stay alive, and the next request creates yet another pool.
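A minimal sketch of the mechanism, using the stdlib ThreadPoolExecutor in place of ContextThreadPoolExecutor and an Event in place of a hung backend call (shutdown(wait=False) is used only so the demo can continue past the point where the real with block would hang):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a downstream call (Neo4j query, embedding API) that hangs.
hang = threading.Event()

def hung_task():
    hang.wait()

baseline = threading.active_count()

# Each "request" creates its own pool, as searcher.py does today.
for _ in range(5):
    executor = ThreadPoolExecutor(max_workers=1)
    executor.submit(hung_task)
    # The real code's `with` block calls shutdown(wait=True) here and would
    # block forever; wait=False only lets this demo move on to the next loop.
    executor.shutdown(wait=False)

leaked = threading.active_count() - baseline
print(f"{leaked} worker threads still alive")  # one leaked thread per request

hang.set()  # release the workers so the interpreter can exit
```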

Over time, this causes unbounded thread accumulation. In our deployment we observed 8,744 threads in the memos-api container, at which point:

  • /search returns HTTP 200 with empty results (the "can't start new thread" error is caught silently)
  • /chat returns HTTP 503 (it calls search internally but doesn't handle the thread error gracefully)
  • Even docker exec fails — OpenBLAS cannot create pthreads

Note: Adding a timeout to future.result() alone does not fix this. The timeout only stops the caller from waiting for the result; the worker thread itself keeps running, and shutdown(wait=True) still blocks until it finishes, so threads still accumulate.
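A small demonstration of that behavior (stdlib ThreadPoolExecutor standing in for ContextThreadPoolExecutor, time.sleep for the slow backend call):

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

start = time.monotonic()
timed_out = False

with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(time.sleep, 0.5)  # stand-in for a slow Neo4j call
    try:
        future.result(timeout=0.05)  # stop waiting after 50 ms...
    except FutureTimeout:
        timed_out = True  # ...but the worker thread keeps running

# Exiting the `with` block ran shutdown(wait=True), which blocked until the
# task finished anyway: the timeout freed the caller, not the thread.
elapsed = time.monotonic() - start
print(f"timed out: {timed_out}, blocked for {elapsed:.2f}s")
```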

How to Reproduce

  1. Deploy MemOS with Neo4j backend in Docker
  2. Send sustained /search traffic over hours/days
  3. If any downstream dependency (Neo4j, embedding API) experiences intermittent slowness, threads accumulate
  4. Monitor with: docker exec <container> cat /proc/1/status | grep Threads

Environment

  • Python version: 3.11
  • Operating System: Linux (Raspberry Pi / aarch64)
  • MemOS version: v2.0.8 (also present in v2.0.9 — searcher.py unchanged)
  • Backend: Neo4j
  • Deployment: Docker

Additional Context

Suggested Fix:

Use a shared, class-level ContextThreadPoolExecutor instead of creating a new one per request. The Searcher class already follows this pattern with _usage_executor (line 73). Adding a second shared pool for search operations would bound thread count regardless of request volume or downstream latency:

```python
# In __init__:
self._search_executor = ContextThreadPoolExecutor(max_workers=10, thread_name_prefix="search")

# In each method, replace:
#   with ContextThreadPoolExecutor(max_workers=N) as executor:
# with:
#   executor = self._search_executor
```

All future.result() calls should also include a timeout (e.g., 30 s) as a safety measure, so a hung task delays at most one request instead of pinning its caller indefinitely.
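Putting both pieces together, a rough sketch of the pattern (the class and method names follow the issue's description, but the Searcher internals and worker callables here are placeholders, not the real code):

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

class Searcher:
    """Skeleton illustrating the shared-pool pattern; not the real class."""

    def __init__(self):
        # One bounded pool shared across all requests, mirroring the
        # existing _usage_executor pattern.
        self._search_executor = ThreadPoolExecutor(
            max_workers=10, thread_name_prefix="search"
        )

    def _retrieve_paths(self, query, workers, timeout=30):
        # No per-request pool: submit to the shared executor instead.
        futures = [self._search_executor.submit(w, query) for w in workers]
        results = []
        for future in futures:
            try:
                results.append(future.result(timeout=timeout))
            except FutureTimeout:
                # A hung task still occupies one worker until it returns,
                # but the total thread count stays bounded at max_workers.
                pass
        return results

def fast(q):
    return f"hit:{q}"

def slow(q):
    time.sleep(0.5)  # stand-in for a slow backend call
    return f"late:{q}"

searcher = Searcher()
hits = searcher._retrieve_paths("q", [fast, slow], timeout=0.1)
print(hits)  # the slow path times out; only the fast result comes back
```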

Workarounds:

  • Set pids_limit in docker-compose.yml to fail fast instead of consuming all system resources
  • Set OPENBLAS_NUM_THREADS=1 to reduce per-thread overhead from numpy
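For reference, both workarounds in docker-compose.yml might look like this (service name and limit value are illustrative, not taken from the project's compose file):

```yaml
services:
  memos-api:          # illustrative service name
    pids_limit: 512   # fail fast instead of exhausting the host
    environment:
      - OPENBLAS_NUM_THREADS=1   # cap numpy/OpenBLAS per-process threads
```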

Willingness to Implement

  • I'm willing to implement this myself
  • I would like someone else to implement this
