Add buffer caching to no_gpu CPU allocator #3554

Open
dhiltgen wants to merge 1 commit into ml-explore:main from dhiltgen:pr/allocator-cache

@dhiltgen
Contributor

Proposed changes

Split out from #3019

Integrate BufferCache into the CPU allocator to enable memory reuse in CPU-only builds. Previously, the no_gpu allocator called malloc/free for every allocation with no caching, whereas the Metal and CUDA backends already cache buffers for better performance.

Track cached buffers by their physical capacity when they are reused so get_cache_memory(), active memory, and cache limit enforcement continue to reflect retained memory. Add a regression test for reusing a larger cached block for a smaller request.
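To illustrate why the counters are keyed on physical capacity, here is a minimal bookkeeping sketch. The `Block` and `Accounting` types are hypothetical and are not taken from the MLX sources; they only show that moving a block between the cached and active totals by its capacity keeps both figures consistent, whatever size was requested.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical bookkeeping types for illustration only (not MLX's own).
struct Block {
  std::size_t capacity; // bytes originally obtained from malloc
};

struct Accounting {
  std::size_t active = 0; // bytes held by live buffers
  std::size_t cached = 0; // bytes retained in the cache

  // A cached block is handed out for a request of any size <= capacity.
  // The full capacity moves from "cached" to "active".
  void on_reuse(const Block& b) {
    assert(cached >= b.capacity);
    cached -= b.capacity;
    active += b.capacity;
  }

  // A live block goes back to the cache with the same capacity.
  void on_free(const Block& b) {
    assert(active >= b.capacity);
    active -= b.capacity;
    cached += b.capacity;
  }
};
```

If `on_reuse` moved only the requested size (say 1024 bytes of a 4096-byte block) while `on_free` moved the capacity, the two totals would drift apart after each reuse, and limit enforcement would act on the wrong figure.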

Changes:

  • Add CpuCachedBuffer struct with intrusive freelist for object pooling
  • Use BufferCache to recycle freed buffers with a 32MB default cache limit
  • Preserve cached block capacity across reuse and avoid caching zero-size allocations
  • Implement get_cache_memory(), set_cache_limit(), and clear_cache(), which were previously no-ops
  • Cache-first allocation path with fallback to OS malloc on cache miss
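The changes listed above can be pictured with a rough, self-contained sketch of a cache-first CPU allocator. This is not the code from this PR: the class name `CachingCpuAllocator`, the `Buffer` struct, the `std::multimap` storage, and the largest-first eviction in `trim` are all assumptions made for illustration, and the real `BufferCache` may choose blocks and evict them differently. Only the 32MB default limit, the zero-size rule, and the three cache functions come from the description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <iterator>
#include <map>

// Hypothetical sketch; names and policies are assumptions, not MLX's API.
class CachingCpuAllocator {
 public:
  struct Buffer {
    void* ptr = nullptr;
    std::size_t capacity = 0; // physical size of the underlying block
  };

  explicit CachingCpuAllocator(std::size_t cache_limit = std::size_t{32} << 20)
      : cache_limit_(cache_limit) {}
  CachingCpuAllocator(const CachingCpuAllocator&) = delete;
  CachingCpuAllocator& operator=(const CachingCpuAllocator&) = delete;
  ~CachingCpuAllocator() { clear_cache(); }

  Buffer allocate(std::size_t size) {
    if (size == 0) return {}; // zero-size requests never touch the cache
    // Cache-first: take the smallest cached block that is large enough.
    auto it = cache_.lower_bound(size);
    if (it != cache_.end()) {
      Buffer b{it->second, it->first}; // keep the block's real capacity
      cache_.erase(it);
      cached_bytes_ -= b.capacity;
      active_bytes_ += b.capacity;
      return b;
    }
    // Cache miss: fall back to the OS allocator.
    void* p = std::malloc(size);
    if (p == nullptr) return {};
    active_bytes_ += size;
    return {p, size};
  }

  void release(Buffer b) {
    if (b.ptr == nullptr) return;
    assert(active_bytes_ >= b.capacity);
    active_bytes_ -= b.capacity;
    cache_.emplace(b.capacity, b.ptr);
    cached_bytes_ += b.capacity;
    trim(cache_limit_); // enforce the limit on retained memory
  }

  std::size_t get_cache_memory() const { return cached_bytes_; }
  std::size_t get_active_memory() const { return active_bytes_; }
  void set_cache_limit(std::size_t limit) {
    cache_limit_ = limit;
    trim(limit);
  }
  void clear_cache() { trim(0); }

 private:
  // Free cached blocks, largest first, until at most `limit` bytes remain.
  void trim(std::size_t limit) {
    while (cached_bytes_ > limit && !cache_.empty()) {
      auto it = std::prev(cache_.end());
      std::free(it->second);
      cached_bytes_ -= it->first;
      cache_.erase(it);
    }
  }

  std::multimap<std::size_t, void*> cache_; // capacity -> block pointer
  std::size_t cache_limit_;
  std::size_t cached_bytes_ = 0;
  std::size_t active_bytes_ = 0;
};
```

In this sketch, releasing a 4096-byte buffer and then requesting 1024 bytes returns the same block with `capacity == 4096`, which is the scenario the regression test in this PR is described as covering. A production cache would likely also bound how much larger than the request a reused block may be; that detail is omitted here.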

Checklist

Put an x in the boxes that apply.

  • I have read the CONTRIBUTING document
  • I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the necessary documentation (if needed)

Collaborator

@zcbenz left a comment

Looks good to me, thanks!
