[https://nvbugs/5829097][fix] Re-init TRTLLM sampler to use sample stream in multi-stream cases. (cherry-pick from https://github.com/NVIDIA/TensorRT-LLM/pull/10918) #10971
base: release/1.2
Conversation
…ream in multi-stream cases. Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
/bot run --disable-fail-fast --add-multi-gpu-test

PR_Github #33436 [ run ] triggered by Bot. Commit:
📝 Walkthrough

This change adds multi-stream sampling support to PyExecutor by introducing the `pp_multi_stream_sample` configuration flag, importing `TRTLLMSampler`, and creating a dedicated CUDA stream with event synchronization for asynchronous sampling operations when pipeline parallelism is enabled (see the sketch after the diagram below).

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Forward as Forward Pass<br/>(Main Stream)
    participant Events as Event<br/>Synchronization
    participant SampleStream as Sample Stream
    participant Sampler as TRTLLMSampler

    Note over Forward,Sampler: Multi-Stream Sampling Flow (PP mode, enabled)
    Forward->>Forward: Execute forward computation
    Forward->>Events: Record start_sample_event
    Forward->>Events: Signal sampling can begin
    par Forward and Sampling in Parallel
        Forward->>Forward: Continue next iteration
    and
        SampleStream->>SampleStream: Wait for start_sample_event
        SampleStream->>SampleStream: Clone batch_outputs
        SampleStream->>Sampler: Sample on dedicated stream
        Sampler->>Sampler: Generate next tokens
        SampleStream->>Events: Record finish_sample_event
    end
    Forward->>Events: Wait for finish_sample_event
    Forward->>Forward: Use sampled outputs
```
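The diagram compresses the core pattern: record an event on the main stream after the forward pass, have a second stream wait on it, sample there, and have the main stream wait on a completion event before consuming the tokens. Below is a minimal PyTorch sketch of that pattern; the names (`sample_stream`, `start_sample_event`, `batch_outputs`) mirror the walkthrough but are illustrative, not the PR's actual code.

```python
import torch

# Hedged sketch of the multi-stream sampling flow described above; names
# follow the walkthrough, not the real py_executor.py implementation.
main_stream = torch.cuda.current_stream()
sample_stream = torch.cuda.Stream()


def forward_and_sample(model, sampler, inputs):
    # Forward pass runs on the main stream.
    batch_outputs = model(inputs)

    # Mark the point after which the sampler may safely read the outputs.
    start_sample_event = torch.cuda.Event()
    start_sample_event.record(main_stream)

    with torch.cuda.stream(sample_stream):
        # Do not start sampling before the forward pass has produced outputs.
        sample_stream.wait_event(start_sample_event)

        # Clone so the main stream can overwrite the originals next iteration.
        batch_outputs_copy = {
            name: t.clone() if torch.is_tensor(t) else t
            for name, t in batch_outputs.items()
        }
        new_tokens = sampler(batch_outputs_copy)

        finish_sample_event = torch.cuda.Event()
        finish_sample_event.record(sample_stream)

    # The main stream only blocks when it actually needs the sampled tokens,
    # so the next forward iteration can overlap with sampling.
    main_stream.wait_event(finish_sample_event)
    return new_tokens
```

Cloning before sampling is what enables the overlap: the main stream is free to reuse its output buffers while the sample stream works on the copy.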
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
tensorrt_llm/_torch/pyexecutor/py_executor.py (3)
1-1: Add NVIDIA copyright header (latest year).
This file lacks the required NVIDIA copyright header for TensorRT-LLM sources; please add/update it to reflect the latest meaningful modification year. As per coding guidelines, please add the standard header at the top of the file.
1107-1124: Gate multi-stream sampling to TRTLLMSampler for consistency with initialization.

The sampler re-initialization at line 260 is correctly gated to `isinstance(self.sampler, TRTLLMSampler)`, but the multi-stream sampling execution at line 1107 is only checked against `pp_multi_stream_sample`. This creates a mismatch: if a non-async sampler (e.g., `EarlyStopSampler`) is used with `pp_multi_stream_sample=true`, the code will attempt stream operations without confirming the sampler is stream-safe. Add the isinstance check:

```diff
- if self.pp_multi_stream_sample:
+ if self.pp_multi_stream_sample and isinstance(self.sampler, TRTLLMSampler):
```
1111-1115: Guard `batch_outputs_copy` cloning for non-tensor values.

The output dict from `model_engine.forward()` can contain non-tensor values: `None` (e.g., `'logits': None` in mm_encoder_only mode), lists of tensors, or lists of scalars. Calling `.clone()` on these will raise `AttributeError`. Clone only tensor values.

🛠️ Suggested fix

```diff
- batch_outputs_copy = {
-     name: tensor.clone()
-     for name, tensor in batch_outputs.items()
- }
+ batch_outputs_copy = {
+     name: tensor.clone() if torch.is_tensor(tensor) else tensor
+     for name, tensor in batch_outputs.items()
+ }
```
🧹 Nitpick comments (1)
tensorrt_llm/_torch/pyexecutor/py_executor.py (1)
56-57: Use module namespace for sampler imports.
Project guidelines require preserving module namespaces (avoid `from x import y`). Consider importing the module and referencing `sampler.TRTLLMSampler` / `sampler.Sampler`, etc. As per coding guidelines, please keep namespace imports; a sketch of this style follows.
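A minimal sketch of the namespace-preserving style the guideline asks for; the module path `tensorrt_llm._torch.pyexecutor.sampler` is an assumption inferred from the file under review, not verified:

```python
# Instead of: from tensorrt_llm._torch.pyexecutor.sampler import TRTLLMSampler
# import the module itself (path assumed for illustration):
from tensorrt_llm._torch.pyexecutor import sampler


def uses_trtllm_sampler(executor_sampler) -> bool:
    # The module prefix keeps the class's origin visible at the call site.
    return isinstance(executor_sampler, sampler.TRTLLMSampler)
```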
PR_Github #33436 [ run ] completed with state
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update the tava architecture diagram if there is a significant design change in the PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
`/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...`

Provide a user-friendly way for developers to interact with a Jenkins server.

Run `/bot [-h|--help]` to print this help message. See details below for each supported subcommand.
Details
`run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug (experimental)]`

Launch build/test pipelines. All previously running jobs will be killed.
- `--reuse-test (optional)pipeline-id` (OPTIONAL): Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
- `--disable-reuse-test` (OPTIONAL): Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensures that all builds and tests are run regardless of previous successes.
- `--disable-fail-fast` (OPTIONAL): Disable fail-fast on build/tests/infra failures.
- `--skip-test` (OPTIONAL): Skip all test stages, but still run build stages, package stages, and sanity check stages. Note: does NOT update GitHub check status.
- `--stage-list "A10-PyTorch-1, xxx"` (OPTIONAL): Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: does NOT update GitHub check status.
- `--gpu-type "A30, H100_PCIe"` (OPTIONAL): Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: does NOT update GitHub check status.
- `--test-backend "pytorch, cpp"` (OPTIONAL): Skip test stages that don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with the tensorrt or triton backend). Note: does NOT update GitHub pipeline status.
- `--only-multi-gpu-test` (OPTIONAL): Only run the multi-GPU tests. Note: does NOT update GitHub check status.
- `--disable-multi-gpu-test` (OPTIONAL): Disable the multi-GPU tests. Note: does NOT update GitHub check status.
- `--add-multi-gpu-test` (OPTIONAL): Force-run the multi-GPU tests in addition to running the L0 pre-merge pipeline.
- `--post-merge` (OPTIONAL): Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
- `--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx"` (OPTIONAL): Run the ordinary L0 pre-merge pipeline and the specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
- `--detailed-log` (OPTIONAL): Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
- `--debug` (OPTIONAL): Experimental feature. Enable access to the CI container for debugging purposes. Note: specify exactly one stage in the `stage-list` parameter to access the appropriate container environment. Note: does NOT update GitHub check status.

For guidance on mapping tests to stage names, see `docs/source/reference/ci-overview.md` and the `scripts/test_to_stage_mapping.py` helper.

kill

`kill`

Kill all running builds associated with the pull request.

skip

`skip --comment COMMENT`

Skip testing for the latest commit on the pull request. `--comment "Reason for skipping build/test"` is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

`reuse-pipeline`

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.