Defer DLTensor deletion when CUDA graph capture is active.#6259
Defer DLTensor deletion when CUDA graph capture is active.#6259JanuszL wants to merge 2 commits intoNVIDIA:mainfrom
Conversation
- cudaDeviceSynchronize returns cudaErrorStreamCaptureUnsupported when any stream on the device is being captured. DLTensorGraveyard now puts pending deletions back in the queue and retries after a short wait instead of propagating the error. Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
|
!build |
Greptile SummaryThis PR fixes a crash/error propagation that occurred when
Confidence Score: 4/5
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Worker thread wakes\nlock held] --> B["cv_.wait: pending_ non-empty?"]
B -->|yes| C["list_t tmp = move(pending_)\nlock.unlock()"]
C --> D[cudaDeviceSynchronize]
D -->|cudaErrorCudartUnloading| E[break — process shutting down]
D -->|cudaErrorStreamCaptureUnsupported| F["lock.lock()\nsplice tmp → pending_.begin()"]
F --> G["cv_.wait_for 1ms\nexit_requested_ OR !pending_.empty()"]
G --> H["continue → top of loop\n(pending_ is non-empty, wait returns immediately)"]
H --> B
D -->|other error| I["CUDA_CALL — throws"]
D -->|success| J["tmp.clear()\nlock.lock()\nnext iteration"]
J --> B
Last reviewed commit: ca3397e |
| cv_.wait_for(lock, std::chrono::milliseconds(1), [&]() { | ||
| return exit_requested_; | ||
| }); |
There was a problem hiding this comment.
Retry predicate misses new-enqueue wake-ups
The cv_.wait_for predicate only checks exit_requested_, so a cv_.notify_one() fired from enqueue while the worker is sleeping here will wake it up, check the predicate (false), and then go back to sleep for the remainder of the 1 ms window. The practical impact is negligible (1 ms is very short), but if new tensors arrive just after the capture starts and the predicate included !pending_.empty(), the retry could also serve as an ordinary processing wake-up point. Consider:
| cv_.wait_for(lock, std::chrono::milliseconds(1), [&]() { | |
| return exit_requested_; | |
| }); | |
| cv_.wait_for(lock, std::chrono::milliseconds(1), [&]() { | |
| return exit_requested_ || !pending_.empty(); | |
| }); |
This is a minor quality-of-life improvement, not a bug.
|
CI MESSAGE: [46341167]: BUILD STARTED |
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
|
!build |
|
CI MESSAGE: [46341668]: BUILD STARTED |
|
CI MESSAGE: [46341668]: BUILD PASSED |
|
CI MESSAGE: [46351401]: BUILD STARTED |
|
CI MESSAGE: [46351401]: BUILD PASSED |
stream on the device is being captured. DLTensorGraveyard now puts pending
deletions back in the queue and retries after a short wait instead of
propagating the error.
Category:
Bug fix (non-breaking change which fixes an issue)
Description:
stream on the device is being captured. DLTensorGraveyard now puts pending
deletions back in the queue and retries after a short wait instead of
propagating the error.
Additional information:
Affected modules and functionalities:
Key points relevant for the review:
Tests:
Checklist
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: N/A