Add buffer caching to no_gpu CPU allocator by dhiltgen · Pull Request #3554 · ml-explore/mlx

dhiltgen · 2026-05-16T15:58:54Z

Proposed changes

Split out from #3019

Integrate BufferCache into the CPU allocator to enable memory reuse for CPU-only builds. Previously the no_gpu allocator called malloc/free on every allocation with no caching, while the Metal and CUDA backends had buffer caching for better performance.

Track cached buffers by their physical capacity when they are reused so get_cache_memory(), active memory, and cache limit enforcement continue to reflect retained memory. Add a regression test for reusing a larger cached block for a smaller request.
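To illustrate the capacity accounting described above, here is a minimal sketch of a size-keyed cache in front of malloc/free. It is not the code in this PR: `ToyBlock`, `ToyCachingAllocator`, and every method name are hypothetical, and MLX's actual `BufferCache` interface is not reproduced. The sketch also simplifies two things: it reuses any cached block that is large enough, and when the limit would be exceeded it frees the block being released rather than evicting older entries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>

// Hypothetical toy types for illustration only; not MLX's BufferCache.
struct ToyBlock {
  void* ptr;
  std::size_t capacity;  // bytes actually obtained from malloc
};

class ToyCachingAllocator {
 public:
  explicit ToyCachingAllocator(std::size_t cache_limit)
      : cache_limit_(cache_limit) {}
  ToyCachingAllocator(const ToyCachingAllocator&) = delete;
  ToyCachingAllocator& operator=(const ToyCachingAllocator&) = delete;
  ~ToyCachingAllocator() { clear_cache(); }

  ToyBlock allocate(std::size_t size) {
    if (size == 0) {
      return ToyBlock{nullptr, 0};  // zero-size requests never touch the cache
    }
    // Cache-first path: smallest cached block whose capacity fits the request.
    auto it = cache_.lower_bound(size);
    if (it != cache_.end()) {
      ToyBlock block{it->second, it->first};
      cache_.erase(it);
      cached_bytes_ -= block.capacity;
      active_bytes_ += block.capacity;  // account the capacity, not `size`
      return block;
    }
    // Cache miss: fall back to the OS allocator.
    void* p = std::malloc(size);
    if (p == nullptr) {
      return ToyBlock{nullptr, 0};
    }
    active_bytes_ += size;
    return ToyBlock{p, size};
  }

  void release(ToyBlock block) {
    if (block.ptr == nullptr) {
      return;
    }
    assert(active_bytes_ >= block.capacity);
    active_bytes_ -= block.capacity;
    if (cached_bytes_ + block.capacity > cache_limit_) {
      std::free(block.ptr);  // caching would exceed the limit
      return;
    }
    cache_.emplace(block.capacity, block.ptr);
    cached_bytes_ += block.capacity;
  }

  void clear_cache() {
    for (auto& entry : cache_) {
      std::free(entry.second);
    }
    cache_.clear();
    cached_bytes_ = 0;
  }

  std::size_t cache_memory() const { return cached_bytes_; }
  std::size_t active_memory() const { return active_bytes_; }

 private:
  std::multimap<std::size_t, void*> cache_;  // capacity -> pointer
  std::size_t cache_limit_;
  std::size_t cached_bytes_ = 0;
  std::size_t active_bytes_ = 0;
};
```

In this sketch, allocating 256 bytes, releasing them, and then requesting 100 bytes hands back the same pointer with its capacity still recorded as 256. Releasing it again returns 256 bytes to the cache total rather than 100, which is the kind of drift the regression test mentioned above is meant to catch.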

Changes:

- Add CpuCachedBuffer struct with intrusive freelist for object pooling
- Use BufferCache to recycle freed buffers with a 32MB default cache limit
- Preserve cached block capacity across reuse and avoid caching zero-size allocations
- Implement get_cache_memory(), set_cache_limit(), clear_cache() (were no-ops)
- Cache-first allocation path with fallback to OS malloc on cache miss
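The "intrusive freelist for object pooling" item can be pictured with the following sketch. It is only an assumption about the general technique, not the PR's `CpuCachedBuffer`: `PooledHeader`, `HeaderPool`, and their members are hypothetical names. The idea is that each bookkeeping object carries its own `next_free` link, so recycled objects form a singly linked list without any separate container or extra heap allocation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical bookkeeping record; the link field lives inside the object
// itself, which is what makes the freelist "intrusive".
struct PooledHeader {
  void* data;
  std::size_t capacity;
  PooledHeader* next_free;  // only meaningful while the header is pooled
};

class HeaderPool {
 public:
  HeaderPool() = default;
  HeaderPool(const HeaderPool&) = delete;
  HeaderPool& operator=(const HeaderPool&) = delete;

  // Frees only headers currently on the freelist; headers still held by
  // callers must be recycled before the pool is destroyed.
  ~HeaderPool() {
    while (free_head_ != nullptr) {
      PooledHeader* next = free_head_->next_free;
      delete free_head_;
      free_head_ = next;
    }
  }

  PooledHeader* acquire(void* data, std::size_t capacity) {
    PooledHeader* h = free_head_;
    if (h != nullptr) {
      free_head_ = h->next_free;  // pop a recycled header
      --free_count_;
    } else {
      h = new PooledHeader;  // pool empty: allocate a fresh header
      ++heap_allocations_;
    }
    h->data = data;
    h->capacity = capacity;
    h->next_free = nullptr;
    return h;
  }

  void recycle(PooledHeader* h) {
    assert(h != nullptr);
    h->data = nullptr;
    h->capacity = 0;
    h->next_free = free_head_;  // push onto the freelist
    free_head_ = h;
    ++free_count_;
  }

  std::size_t free_count() const { return free_count_; }
  std::size_t heap_allocations() const { return heap_allocations_; }

 private:
  PooledHeader* free_head_ = nullptr;
  std::size_t free_count_ = 0;
  std::size_t heap_allocations_ = 0;
};
```

With this layout, acquiring a header, recycling it, and acquiring again returns the same object and performs only one heap allocation in total, which is the saving an object pool is after on a hot allocation path.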

Checklist

Put an x in the boxes that apply.

I have read the CONTRIBUTING document
I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
I have added tests that prove my fix is effective or that my feature works
I have updated the necessary documentation (if needed)

zcbenz

Looks good to me, thanks!

zcbenz approved these changes May 17, 2026

dhiltgen wants to merge 1 commit into ml-explore:main from dhiltgen:pr/allocator-cache
