[TRTLLM-12631][infra] Split some long stages by EmmaQiaoCh · Pull Request #14035 · NVIDIA/TensorRT-LLM

EmmaQiaoCh · 2026-05-12T06:01:12Z

Summary by CodeRabbit

Chores
- Enhanced test coverage for GB200/B200 and GB300 hardware variants by expanding PyTorch test parallelization and adding additional post-merge validation groups for improved reliability.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

- PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
- PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
- Test cases are provided for new code paths (see test instructions)
- Any new dependencies have been scanned for license and vulnerabilities
- CODEOWNERS updated if ownership changes
- Documentation updated as needed
- Update tava architecture diagram if there is a significant design change in PR.
- The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

coderabbitai · 2026-05-12T06:02:39Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5986f692-a562-4842-a2f7-d2f9f5c30c3e

📥 Commits

Reviewing files that changed from the base of the PR and between 42f1c7c and 2fa0c09.

📒 Files selected for processing (1)

jenkins/L0_Test.groovy

📝 Walkthrough

Walkthrough

Jenkins test-stage configurations in launchTestJobs() are expanded to support additional GPU variant coverage. DGX B200 PyTorch stages increase from 3 to 5 splits, GB300 post-merge stages extend to 2 groups, and GB300 perf-sanity post-merge stages expand from 1 to 2 splits.

Changes

Test Configuration Expansion

| Layer / File(s) | Summary |
| --- | --- |
| GPU test stage splits expansion `jenkins/L0_Test.groovy` | B200 PyTorch stages extended from 3 to 5 splits with new `PyTorch-4` and `PyTorch-5` entries; GB300 post-merge coverage doubled to support `Post-Merge-2`; GB300 perf-sanity post-merge stages added `PerfSanity-Post-Merge-2` with updated split counts. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The PR description contains only the template with unfilled placeholders. No concrete description, test coverage information, or implementation details were provided by the author. | Fill in the Description section explaining why stages are being split, complete the Test Coverage section, and confirm all relevant checklist items have been addressed. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: splitting long test stages in the Jenkins configuration for GB200/B200 and GB300 variants. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

EmmaQiaoCh · 2026-05-12T06:04:21Z

/bot run --stage-list "DGX_B200-PyTorch-1,DGX_B200-PyTorch-2,DGX_B200-PyTorch-3,DGX_B200-PyTorch-4,DGX_B200-PyTorch-5,GB300-4_GPUs-PyTorch-Post-Merge-1,GB300-4_GPUs-PyTorch-Post-Merge-2,GB300-4_GPUs-PyTorch-PerfSanity-Post-Merge-1,GB300-4_GPUs-PyTorch-PerfSanity-Post-Merge-2" --disable-fail-fast
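
For long stage lists like the one above, the comment can be assembled programmatically. The following is a minimal Python sketch; `build_bot_run_command` is a hypothetical helper, not part of the CI bot. The `--stage-list` and `--disable-fail-fast` flags are taken verbatim from the comment above, and nothing else about the bot's interface is assumed.

```python
from typing import Iterable


def build_bot_run_command(stages: Iterable[str], disable_fail_fast: bool = False) -> str:
    """Compose a '/bot run' comment restricted to the given stages (hypothetical helper)."""
    stage_list = ",".join(stages)
    if not stage_list:
        raise ValueError("at least one stage name is required")
    # The stage list is passed as one quoted, comma-separated argument.
    command = f'/bot run --stage-list "{stage_list}"'
    if disable_fail_fast:
        command += " --disable-fail-fast"
    return command


if __name__ == "__main__":
    # Stage names copied from the comment in this PR.
    new_b200_stages = [f"DGX_B200-PyTorch-{i}" for i in range(1, 6)]
    print(build_bot_run_command(new_b200_stages, disable_fail_fast=True))
```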

tensorrt-cicd · 2026-05-12T06:09:42Z

PR_Github #47898 [ run ] triggered by Bot. Commit: 2fa0c09 Link to invocation

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

tensorrt-cicd · 2026-05-12T10:08:29Z

PR_Github #47898 [ run ] completed with state SUCCESS. Commit: 2fa0c09
/LLM/main/L0_MergeRequest_PR pipeline #37747 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

Split some long stages

2fa0c09

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

EmmaQiaoCh requested review from a team as code owners May 12, 2026 06:01

EmmaQiaoCh requested review from dpitman-nvda and niukuo May 12, 2026 06:01

github-actions Bot assigned EmmaQiaoCh May 12, 2026

yiqingy0 approved these changes May 12, 2026

Split another stage

c2ad089

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

mzweilz approved these changes May 12, 2026

EmmaQiaoCh wants to merge 2 commits into NVIDIA:main from EmmaQiaoCh:emma/split_long_stages
