Description
Pre-submission checklist
- I have searched existing issues and this hasn't been mentioned before
- I have read the project documentation and confirmed this issue doesn't already exist
- This issue is specific to MemOS and not a general software issue
Bug Description
The search pipeline creates a new `ContextThreadPoolExecutor` on every request in four methods within `searcher.py`:
- `_retrieve_paths` (line 357, `max_workers=5`)
- `_retrieve_from_long_term_and_user` (line 638, `max_workers=3`)
- `_retrieve_from_tool_memory` (line 791, `max_workers=2`)
- `_deduplicate_rawfile_results` (line 1153, up to `max_workers=10`)
Each executor is used within a with block, which calls shutdown(wait=True) on exit. If any submitted task hangs — e.g., a slow Neo4j query, an unresponsive embedding API, or a network timeout — the executor never shuts down. The threads remain alive, and the next request creates another pool.
Over time, this causes unbounded thread accumulation. In our deployment we observed 8,744 threads in the memos-api container, at which point:
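The leak can be reproduced in isolation. A minimal sketch, using the stdlib `ThreadPoolExecutor` as a stand-in for `ContextThreadPoolExecutor` and an `Event` as a stand-in for a hung downstream call:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Simulate the leak: each "request" builds its own pool and submits a task
# that never finishes (standing in for a hung Neo4j/embedding call).
stop = threading.Event()

before = threading.active_count()
executors = []
for _ in range(5):  # five requests
    ex = ThreadPoolExecutor(max_workers=2)
    ex.submit(stop.wait)  # hung task keeps one worker thread alive
    executors.append(ex)

leaked = threading.active_count() - before
print(leaked)  # 5: one live worker per request, never reclaimed

# Cleanup so the demo itself can exit; in production nothing unblocks the tasks.
stop.set()
for ex in executors:
    ex.shutdown(wait=True)
```

In the real pipeline the `with` block's `shutdown(wait=True)` additionally blocks the request thread, so both the worker threads and the request handlers pile up.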
- `/search` returns HTTP 200 with empty results (the `can't start new thread` error is caught silently)
- `/chat` returns HTTP 503 (it calls search internally but doesn't handle the thread error gracefully)
- Even `docker exec` fails, because OpenBLAS cannot create pthreads
Note: Adding timeout to future.result() alone does not fix this. The timeout only skips waiting for the result — the thread itself keeps running, and shutdown(wait=True) still blocks until it finishes. The threads still accumulate.
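This can be verified with a small sketch (stdlib `ThreadPoolExecutor`, with an `Event` standing in for the hung call):

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FuturesTimeout

release = threading.Event()
ex = ThreadPoolExecutor(max_workers=1)
fut = ex.submit(release.wait)  # stands in for a hung downstream call

try:
    fut.result(timeout=0.1)  # stop waiting after 100 ms...
    timed_out = False
except FuturesTimeout:
    timed_out = True

still_running = fut.running()  # ...but the task (and its thread) lives on
print(timed_out, still_running)  # True True

# Only unblocking the task itself lets shutdown(wait=True) return;
# in production nothing ever does, so the thread leaks.
release.set()
ex.shutdown(wait=True)
```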
How to Reproduce
- Deploy MemOS with the Neo4j backend in Docker
- Send sustained `/search` traffic over hours/days
- If any downstream dependency (Neo4j, embedding API) experiences intermittent slowness, threads accumulate
- Monitor with: `docker exec <container> cat /proc/1/status | grep Threads`
Environment
- Python version: 3.11
- Operating System: Linux (Raspberry Pi / aarch64)
- MemOS version: v2.0.8 (also present in v2.0.9; `searcher.py` unchanged)
- Backend: Neo4j
- Deployment: Docker
Additional Context
Suggested Fix:
Use a shared, class-level `ContextThreadPoolExecutor` instead of creating a new one per request. The `Searcher` class already follows this pattern with `_usage_executor` (line 73). Adding a second shared pool for search operations would bound the thread count regardless of request volume or downstream latency:
```python
# In __init__:
self._search_executor = ContextThreadPoolExecutor(
    max_workers=10, thread_name_prefix="search"
)

# In each method, replace:
#     with ContextThreadPoolExecutor(max_workers=N) as executor:
# with:
#     executor = self._search_executor
```

All `future.result()` calls should also include a timeout (e.g., 30s) as a safety measure.
Workarounds:
- Set `pids_limit` in `docker-compose.yml` to fail fast instead of consuming all system resources
- Set `OPENBLAS_NUM_THREADS=1` to reduce per-thread overhead from numpy
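Both workarounds fit in the compose file; a sketch, with an illustrative service name and limit value:

```yaml
services:
  memos-api:                    # illustrative service name
    pids_limit: 512             # cap threads/processes so a leak fails fast
    environment:
      - OPENBLAS_NUM_THREADS=1  # keep numpy/OpenBLAS from spawning extra threads
```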
Willingness to Implement
- I'm willing to implement this myself
- I would like someone else to implement this