Add native multi-GPU device_map support to TensorDeserializer#201
Draft
…ading

Add native multi-GPU support to TensorDeserializer via a new device_map keyword argument. Supports explicit per-tensor placement via a Mapping and automatic greedy largest-first balancing via a Sequence of devices.

- Greedy balancer uses deserialized_length (bytes) with a min-heap
- Per-device CUDA streams cached in copy threads
- CPU tensors in mixed maps get dedicated buffers to avoid corruption
- Underspecified CUDA devices resolved up front
- read_numpy_arrays guarded against non-CPU device_map targets
- Default path (device_map=None) avoids per-tensor overhead in hot loop
- 19 new unit tests covering placement, balancing, fallback, edge cases
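For reviewers unfamiliar with greedy largest-first balancing, here is a minimal standalone sketch of the idea: sort tensors by byte size in descending order and always assign the next one to the currently least-loaded device, tracked with a min-heap. The function name `balance_largest_first`, the plain `sizes` dict standing in for each tensor's deserialized_length, and the tie-breaking rules are assumptions for illustration; this is not the code in this PR.

```python
import heapq
from typing import Dict, Mapping, Sequence


def balance_largest_first(
    sizes: Mapping[str, int], devices: Sequence[str]
) -> Dict[str, str]:
    """Assign each tensor name to a device, largest tensors first.

    `sizes` maps tensor name -> size in bytes (a stand-in for
    deserialized_length). Hypothetical helper, for illustration only.
    """
    if not devices:
        raise ValueError("at least one device is required")

    # Heap entries are (bytes_assigned, device_index). The index breaks
    # ties so that results are deterministic.
    heap = [(0, i) for i in range(len(devices))]
    heapq.heapify(heap)

    placement: Dict[str, str] = {}
    # Largest first; equal sizes are ordered by name for determinism.
    for name, size in sorted(sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)  # least-loaded device so far
        placement[name] = devices[idx]
        heapq.heappush(heap, (load + size, idx))
    return placement


if __name__ == "__main__":
    example = {"a": 8, "b": 5, "c": 4, "d": 3}
    print(balance_largest_first(example, ["cuda:0", "cuda:1"]))
```

Each assignment costs O(log D) for D devices, after an O(N log N) sort over N tensors. Greedy largest-first is a heuristic and does not guarantee an optimal split.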
This change is part of the following stack: Change managed by git-spice.
feat(serialization): add device_map parameter for multi-GPU tensor loading
Add native multi-GPU support to TensorDeserializer via a new device_map
keyword argument. Supports explicit per-tensor placement via a Mapping
and automatic greedy largest-first balancing via a Sequence of devices.
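The two accepted shapes of device_map can be illustrated with a small, torch-free sketch that normalizes either form into one per-tensor placement dict. Everything here is hypothetical: the helper name `resolve_device_map`, the use of plain strings for devices, and in particular the `default` device used for tensors missing from a Mapping are assumptions, since the commit message does not spell out the fallback behavior.

```python
from __future__ import annotations

import heapq
from collections.abc import Mapping, Sequence


def resolve_device_map(
    device_map: Mapping[str, str] | Sequence[str] | None,
    sizes: Mapping[str, int],
    default: str = "cpu",
) -> dict[str, str]:
    """Normalize a device_map into {tensor_name: device}.

    Hypothetical helper; `default` is an assumed fallback, not
    behavior confirmed by the PR.
    """
    if device_map is None:
        # Assumed default path: everything goes to one device.
        return {name: default for name in sizes}

    if isinstance(device_map, Mapping):
        # Explicit per-tensor placement.
        return {name: device_map.get(name, default) for name in sizes}

    # A str is technically a Sequence, so reject it explicitly.
    if isinstance(device_map, str) or not isinstance(device_map, Sequence):
        raise TypeError("device_map must be a Mapping, a Sequence, or None")
    if not device_map:
        raise ValueError("device_map sequence must not be empty")

    # Sequence of devices: greedy largest-first balancing by byte size.
    heap = [(0, i) for i in range(len(device_map))]
    heapq.heapify(heap)
    placement: dict[str, str] = {}
    for name, size in sorted(sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)
        placement[name] = device_map[idx]
        heapq.heappush(heap, (load + size, idx))
    return placement


if __name__ == "__main__":
    sizes = {"w": 4, "b": 1}
    print(resolve_device_map({"w": "cuda:1"}, sizes))
    print(resolve_device_map(["cuda:0", "cuda:1"], sizes))
```

Resolving the whole map once, before any data is read, keeps type checks and balancing out of the per-tensor loop.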