Description
System Info
Environment
- Node.js Version: [Specify version, e.g., v18.x.x]
- OS: [macOS/Windows/Linux]
- Dependencies:
- @xenova/transformers
- sharp
- fs (built-in)
- Configuration:
- Batch size: 50
- Concurrency: 20
- Pre-resize: Disabled
- Cache directory: .cache
Additional Context
The POC script includes comprehensive memory monitoring with:
- Per-batch memory snapshots (before/after processing)
- Automatic leak detection based on growth thresholds
- Detailed logging of heap usage, RSS, external memory, and array buffers
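The monitoring described above could look roughly like the following. This is a minimal sketch, not the code from the POC script: the helper names (`takeSnapshot`, `diffSnapshots`, `exceedsThreshold`) and the 50 MB threshold are assumptions made for illustration. Only `process.memoryUsage()` is a real Node.js API here.

```javascript
// Hypothetical helpers; only process.memoryUsage() comes from Node.js itself.
const MB = 1024 * 1024;

// Capture heap usage, RSS, external memory, and array buffers in megabytes.
function takeSnapshot(label) {
  const { heapUsed, rss, external, arrayBuffers } = process.memoryUsage();
  return {
    label,
    heapUsedMB: heapUsed / MB,
    rssMB: rss / MB,
    externalMB: external / MB,
    arrayBuffersMB: arrayBuffers / MB,
  };
}

// Field-by-field difference between a "before" and an "after" snapshot.
function diffSnapshots(before, after) {
  return {
    heapUsedMB: after.heapUsedMB - before.heapUsedMB,
    rssMB: after.rssMB - before.rssMB,
    externalMB: after.externalMB - before.externalMB,
    arrayBuffersMB: after.arrayBuffersMB - before.arrayBuffersMB,
  };
}

// Flag a batch when any metric grew by more than thresholdMB (assumed value).
function exceedsThreshold(delta, thresholdMB = 50) {
  return Object.values(delta).some((value) => value > thresholdMB);
}
```

Taking one snapshot before and one after each batch, then passing the difference to `exceedsThreshold`, gives a per-batch leak signal. Watching `externalMB` and `arrayBuffersMB` alongside the heap matters here because image buffers and tensor data typically live outside the V8 heap.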
Key findings from the leak analysis:
- Memory growth appears to accumulate with each batch
- Even with pre-resizing enabled, memory is not being fully reclaimed
- The issue may be related to:
- CLIP model caching
- Sharp image processing buffers
- Tensor/vector data retention in @xenova/transformers
Environment/Platform
- Website/web-app
- Browser extension
- Server-side (e.g., Node.js, Deno, Bun)
- Desktop app (e.g., Electron)
- Other (e.g., VSCode extension)
Description
Memory Leak in Image Embedding POC
Description
During testing of the image embedding POC script (poc-embedding-memory-test.mjs), a significant memory leak has been detected. The script processes batches of images using CLIP embeddings and monitors memory usage throughout the process. Analysis shows consistent memory growth that exceeds acceptable thresholds, indicating a potential memory leak in the image processing pipeline.
Steps to Reproduce
- Clone the repository and navigate to the image-grouping directory
- Ensure Node.js is installed with --expose-gc flag support
- Run the POC script:
node --expose-gc poc-embedding-memory-test.mjs
- Monitor the output for memory usage logs and the final leak analysis section
Expected Behavior
Memory usage should remain relatively stable or grow minimally between batches, with garbage collection effectively reclaiming memory from processed images and embeddings.
Actual Behavior
The script reports significant memory growth:
- Total heap growth across batches exceeds expected levels
- Average heap growth per batch keeps increasing instead of leveling off
- Memory snapshots show consistent upward trends in heap usage without adequate reclamation
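To make the trend above concrete, the per-batch heap readings can be reduced to a few summary numbers. The sketch below is an assumption about how such an analysis might be done, not the POC's actual leak analysis. The function name `analyzeHeapTrend` and its input (heap usage in MB recorded after each batch) are hypothetical.

```javascript
// Hypothetical analysis helper: takes heapUsed (in MB) recorded after each batch.
function analyzeHeapTrend(heapAfterBatchMB) {
  const empty = { totalGrowthMB: 0, avgGrowthPerBatchMB: 0, growingSteps: 0, steps: 0 };
  if (heapAfterBatchMB.length < 2) return empty;

  // Growth between each pair of consecutive batches.
  const deltas = [];
  for (let i = 1; i < heapAfterBatchMB.length; i++) {
    deltas.push(heapAfterBatchMB[i] - heapAfterBatchMB[i - 1]);
  }

  const totalGrowthMB =
    heapAfterBatchMB[heapAfterBatchMB.length - 1] - heapAfterBatchMB[0];

  return {
    totalGrowthMB,
    avgGrowthPerBatchMB: totalGrowthMB / deltas.length,
    // Number of batch-to-batch steps where the heap grew instead of shrinking.
    growingSteps: deltas.filter((d) => d > 0).length,
    steps: deltas.length,
  };
}
```

A run where `growingSteps` equals `steps` and `avgGrowthPerBatchMB` stays well above zero matches the "consistent upward trend" described above. A healthy run would show some negative steps as garbage collection reclaims memory.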
Possible Solutions
- Force garbage collection between batches (currently disabled in POC)
- Clear model cache more aggressively
- Optimize image processing to reduce buffer retention
- Implement streaming processing instead of batch accumulation
- Profile with heap snapshots to identify specific leak sources
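As an illustration of the first and fourth suggestions, a batch loop can hand each batch to a handler, avoid accumulating results across batches, and request a garbage collection before the next batch starts. This is a sketch under stated assumptions: `processInBatches`, `collectGarbage`, and the `handleBatch` callback are hypothetical names, and the embedding work itself is left to the caller. `globalThis.gc` is only defined when Node.js is started with `--expose-gc`.

```javascript
// Request a GC cycle if Node.js was started with --expose-gc; otherwise do nothing.
function collectGarbage() {
  if (typeof globalThis.gc === 'function') {
    globalThis.gc();
    return true;
  }
  return false;
}

// Process items batch by batch without accumulating per-batch results.
// handleBatch is a hypothetical async callback that embeds and persists one batch.
async function processInBatches(items, batchSize, handleBatch) {
  let processed = 0;
  for (let start = 0; start < items.length; start += batchSize) {
    const batch = items.slice(start, start + batchSize);
    await handleBatch(batch);
    processed += batch.length;

    // Nothing from this batch is referenced past this point, so a forced
    // collection can reclaim whatever the handler did not retain.
    const collected = collectGarbage();
    const heapMB = process.memoryUsage().heapUsed / (1024 * 1024);
    console.log(
      `processed=${processed} heapUsed=${heapMB.toFixed(1)}MB forcedGc=${collected}`
    );
  }
  return processed;
}
```

If memory still climbs with forced collection in place, the retained data is most likely reachable from somewhere (a cache, a closure, native buffers), which is where heap snapshots become the more useful tool.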
Files
- poc-embedding-memory-test.mjs: The POC script demonstrating the memory leak
- Memory logs and analysis output included in script execution
Labels
- bug
- memory-leak
- performance
- investigation-needed
Reproduction
POC:
ENABLE_RESIZE=false BATCH_SIZE=50 CONCURRENCY=10 node --expose-gc poc-embedding-memory-test.mjs 2>&1