
[TRTLLM-12631][infra] Split some long stages #14035

Open
EmmaQiaoCh wants to merge 2 commits into NVIDIA:main from EmmaQiaoCh:emma/split_long_stages


EmmaQiaoCh (Collaborator) commented May 12, 2026

Summary by CodeRabbit

  • Chores
    • Enhanced test coverage for GB200/B200 and GB300 hardware variants by expanding PyTorch test parallelization and adding additional post-merge validation groups for improved reliability.


Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
EmmaQiaoCh requested review from a team as code owners May 12, 2026 06:01
EmmaQiaoCh requested review from dpitman-nvda and niukuo May 12, 2026 06:01
coderabbitai Bot (Contributor) commented May 12, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5986f692-a562-4842-a2f7-d2f9f5c30c3e

📥 Commits

Reviewing files that changed from the base of the PR and between 42f1c7c and 2fa0c09.

📒 Files selected for processing (1)
  • jenkins/L0_Test.groovy

Walkthrough

Jenkins test-stage configurations in launchTestJobs() are expanded to support additional GPU variant coverage. DGX B200 PyTorch stages increase from 3 to 5 splits, GB300 post-merge stages extend to 2 groups, and GB300 perf-sanity post-merge stages expand from 1 to 2 splits.

Changes

Test Configuration Expansion

| Layer / File(s) | Summary |
| --- | --- |
| GPU test stage splits expansion (jenkins/L0_Test.groovy) | B200 PyTorch stages extended from 3 to 5 splits with new PyTorch-4 and PyTorch-5 entries; GB300 post-merge coverage doubled to support Post-Merge-2; GB300 perf-sanity post-merge stages added PerfSanity-Post-Merge-2 with updated split counts. |
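
To illustrate what the split changes amount to, here is a minimal Python sketch. It is not the Groovy from jenkins/L0_Test.groovy, which this conversation does not show. It assumes stage names are a prefix plus a 1-based split index, as the stage names in this PR suggest. The `shard` helper, the round-robin assignment, and the test names are hypothetical; how the real pipeline distributes tests across splits is not stated here.

```python
def stage_names(prefix: str, split_count: int) -> list[str]:
    """Return one stage name per split, numbered from 1."""
    return [f"{prefix}-{i}" for i in range(1, split_count + 1)]


def shard(tests: list[str], split_id: int, split_count: int) -> list[str]:
    """Hypothetical round-robin sharding: test k goes to split (k % split_count) + 1."""
    return [t for k, t in enumerate(tests) if k % split_count == split_id - 1]


# Made-up test list, used only to show the effect of a higher split count.
tests = [f"test_{n}" for n in range(12)]

# Tests per stage with 3 splits versus 5 splits.
before = [len(shard(tests, i, 3)) for i in range(1, 4)]
after = [len(shard(tests, i, 5)) for i in range(1, 6)]

print(stage_names("DGX_B200-PyTorch", 5))
print("tests per stage, 3 splits:", before)
print("tests per stage, 5 splits:", after)
```

With more splits, each stage runs fewer tests, so the longest stage should finish sooner at the cost of occupying more nodes in parallel.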

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The PR description contains only the template with unfilled placeholders. No concrete description, test coverage information, or implementation details were provided by the author. | Fill in the Description section explaining why stages are being split, complete the Test Coverage section, and confirm all relevant checklist items have been addressed. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: splitting long test stages in the Jenkins configuration for GB200/B200 and GB300 variants. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Comment @coderabbitai help to get the list of available commands and usage tips.

EmmaQiaoCh (Collaborator, Author)

/bot run --stage-list "DGX_B200-PyTorch-1,DGX_B200-PyTorch-2,DGX_B200-PyTorch-3,DGX_B200-PyTorch-4,DGX_B200-PyTorch-5,GB300-4_GPUs-PyTorch-Post-Merge-1,GB300-4_GPUs-PyTorch-Post-Merge-2,GB300-4_GPUs-PyTorch-PerfSanity-Post-Merge-1,GB300-4_GPUs-PyTorch-PerfSanity-Post-Merge-2" --disable-fail-fast
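
The stage list in the command above follows directly from the new split counts (5, 2, and 2). As a small Python sketch, assuming only the prefix-plus-index naming visible in that command, the same string can be assembled rather than typed by hand. The `stage_names` helper is hypothetical and is not part of the CI bot.

```python
def stage_names(prefix: str, split_count: int) -> list[str]:
    """Return one stage name per split, numbered from 1."""
    return [f"{prefix}-{i}" for i in range(1, split_count + 1)]


# Prefixes and split counts are taken from the stages named in this PR.
stages = (
    stage_names("DGX_B200-PyTorch", 5)
    + stage_names("GB300-4_GPUs-PyTorch-Post-Merge", 2)
    + stage_names("GB300-4_GPUs-PyTorch-PerfSanity-Post-Merge", 2)
)

joined = ",".join(stages)
command = f'/bot run --stage-list "{joined}" --disable-fail-fast'
print(command)
```

Generating the list this way keeps the command in step with the split counts if they change again.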

tensorrt-cicd (Collaborator)

PR_Github #47898 [ run ] triggered by Bot. Commit: 2fa0c09

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
tensorrt-cicd (Collaborator)

PR_Github #47898 [ run ] completed with state SUCCESS. Commit: 2fa0c09
/LLM/main/L0_MergeRequest_PR pipeline #37747 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

