Improve DLPack support for external tensor consumption #6261
JanuszL wants to merge 1 commit into NVIDIA:main
Conversation
!build

CI MESSAGE: [46421794]: BUILD STARTED
Greptile Summary
This PR improves DLPack support in DALI's experimental dynamic API by adding three fast paths and one fallback: (1) a C++ bulk DLPack constructor for TensorListGPU, (2) a native DALI fast path in Batch.__init__ that reuses the _storage objects of already-evaluated ndd.Tensor objects, (3) a GPU DLPack fast path in Batch.__init__ that invokes the bulk constructor, plus a BufferError fallback to the __array__ interface for read-only CPU arrays.

Key points:
Confidence Score: 3/5
Important Files Changed
Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Batch.__init__ with list of tensors] --> B{Materialise to list}
B --> C{dtype is None AND first element\nis ndd.Tensor with _storage?}
C -- Yes --> D[Native DALI fast path\nBuild TensorList from _storage objects]
D --> E{Construction\nsucceeded?}
E -- Yes --> Z[fast_path_used = True]
E -- No: TypeError or RuntimeError --> F[Fall through]
C -- No --> F
F --> G{dtype is None AND first element\nhas __dlpack_device__ AND is GPU type 2?}
G -- Yes --> H[DLPack GPU fast path\nTensorListGPU C++ bulk constructor]
H --> I{TypeError?}
I -- No --> Z
I -- Yes --> J[Fall through]
G -- No --> J
J --> K[Slow path: wrap each element\nin Tensor one-by-one]
K --> Z
Z --> L[Batch object ready]
subgraph CPP [C++ TensorListFromListOfDLPackObjects]
M[For each object] --> N{Has __dlpack__?}
N -- No --> O[Raise TypeError]
N -- Yes --> P[Call __dlpack__ with stream keyword arg]
P --> Q[FillTensorFromDlPack]
Q --> R{i==0?}
R -- Yes --> S[SetupLike, record copy_order, set expected_device_id]
R -- No --> T{device_id matches?}
T -- No --> U[Raise ValueError]
T -- Yes --> V{type matches?}
V -- No --> W[Raise TypeError]
V -- Yes --> X[SetSample i]
S --> X
end
H -.->|invokes| CPP
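The dispatch order in the flowchart above can be sketched as plain Python. This is a hypothetical illustration of the control flow only: the three constructor callables are injected as parameters so the logic is testable in isolation, whereas real DALI would call TensorListGPU/CPU constructors directly; `build_batch` itself is not a DALI API.

```python
def build_batch(elements, dtype=None, *,
                native_ctor, dlpack_gpu_ctor, slow_ctor):
    """Sketch of Batch.__init__ dispatch: two fast paths, then a slow path."""
    elements = list(elements)  # materialise to a list first

    # Fast path 1: already-evaluated DALI tensors expose a _storage object,
    # which preserves DALI metadata (layout, enum types) without DLPack.
    if dtype is None and elements and hasattr(elements[0], "_storage"):
        try:
            return native_ctor([e._storage for e in elements]), "native"
        except (TypeError, RuntimeError):
            pass  # fall through to the next path

    # Fast path 2: external GPU tensors speaking the DLPack protocol;
    # __dlpack_device__() returns (device_type, device_id), with 2 = kDLCUDA.
    if dtype is None and elements and hasattr(elements[0], "__dlpack_device__"):
        if elements[0].__dlpack_device__()[0] == 2:
            try:
                return dlpack_gpu_ctor(elements), "dlpack_gpu"
            except TypeError:
                pass  # fall through

    # Slow path: wrap each element in a Tensor one by one.
    return slow_ctor(elements, dtype), "slow"
```

Injecting the constructors keeps the sketch runnable without DALI; the ordering (native storage first, bulk DLPack second, per-element wrapping last) is the substance being illustrated.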
Last reviewed commit: "Improve DLPack suppo..."
@greptileai - can you re-review?
!build

CI MESSAGE: [46425910]: BUILD STARTED
@greptileai - can you re-review?

CI MESSAGE: [46425910]: BUILD PASSED
- Adds C++ bulk DLPack constructor for TensorListGPU: accepts a Python
list of DLPack-compatible objects (e.g. PyTorch GPU tensors) and
builds the TensorList in a single pass, recording a CUDA event on
the provided stream. Passes `stream` as a keyword argument to
`__dlpack__()` for compatibility with NumPy ≥ 1.22 and JAX which
define `def __dlpack__(self, *, stream=None)`.
- Adds native DALI fast path in Batch.__init__: when given a list of
already-evaluated ndd.Tensor objects, pass their _storage objects
directly to TensorListGPU/CPU constructors, preserving all DALI
metadata (layout, enum types) without going through DLPack.
- Adds GPU DLPack fast path in Batch.__init__: when given a list of
external GPU tensors (e.g. PyTorch) that support DLPack, use the
new C++ bulk constructor to avoid per-tensor Python overhead.
- Adds DLPack fallback for CPU read-only arrays in Tensor.__init__:
catch BufferError from __dlpack__() and fall back to __array__
interface. This makes the following work:
arr = np.array([1, 2, 3])
arr.flags.writeable = False
ndd.as_tensor(arr) # previously raised BufferError
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
!build

CI MESSAGE: [46436927]: BUILD STARTED

@greptileai - can you re-review?

CI MESSAGE: [46436927]: BUILD PASSED
Category:
Description:
Adds C++ bulk DLPack constructor for TensorListGPU: accepts a Python
list of DLPack-compatible objects (e.g. PyTorch GPU tensors) and
builds the TensorList in a single pass, recording a CUDA event on
the provided stream. Passes `stream` as a keyword argument to
`__dlpack__()` for compatibility with NumPy ≥ 1.22 and JAX, which
define `def __dlpack__(self, *, stream=None)`.

Adds native DALI fast path in Batch.__init__: when given a list of
already-evaluated ndd.Tensor objects, passes their _storage objects
directly to TensorListGPU/CPU constructors, preserving all DALI
metadata (layout, enum types) without going through DLPack.

Adds GPU DLPack fast path in Batch.__init__: when given a list of
external GPU tensors (e.g. PyTorch) that support DLPack, uses the
new C++ bulk constructor to avoid per-tensor Python overhead.

Adds DLPack fallback for CPU read-only arrays in Tensor.__init__:
catches BufferError from __dlpack__() and falls back to the __array__
interface. This makes the following work:

arr = np.array([1, 2, 3])
arr.flags.writeable = False
ndd.as_tensor(arr)  # previously raised BufferError
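The stream-as-keyword point above can be demonstrated in a few lines. The Array API standard specifies `__dlpack__` with keyword-only parameters, so producers such as NumPy ≥ 1.22 and JAX reject a positional stream argument; `Producer` below is a hypothetical stand-in class, not a real DALI or PyTorch type.

```python
class Producer:
    def __dlpack__(self, *, stream=None):
        # A real implementation would return a PyCapsule wrapping a DLTensor;
        # a tuple stands in here so the call is observable.
        return ("capsule", stream)

p = Producer()
p.__dlpack__(stream=7)   # keyword call works

try:
    p.__dlpack__(7)      # positional call is rejected
except TypeError:
    pass
```

This is why the C++ bulk constructor invokes `__dlpack__(stream=...)` rather than passing the stream positionally.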
Additional information:
Affected modules and functionalities:
Key points relevant for the review:
Tests:
- extended test_tensor.py
Checklist
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: DALI-4580