Skip to content

[None][test] Split verl tests into 19 fine-grained per-case wrappers#14037

Open
Superjomn wants to merge 1 commit into
NVIDIA:mainfrom
Superjomn:fine-grained-verl-test
Open

[None][test] Split verl tests into 19 fine-grained per-case wrappers#14037
Superjomn wants to merge 1 commit into
NVIDIA:mainfrom
Superjomn:fine-grained-verl-test

Conversation

@Superjomn
Copy link
Copy Markdown
Collaborator

@Superjomn Superjomn commented May 12, 2026

Summary

  • Replace 3 file-level verl wrappers with 19 fine-grained per-case wrappers in tests/integration/defs/verl/test_verl_cases.py (one wrapper per upstream verl pytest case)
  • Bump verl pin: 4cda6af -> d324b01
  • Update l0_verl test list to enumerate the 19 case-level entries
  • Drop the 3 blanket nvbugs/5981833 waives that were skipping the old file-level wrappers

Why

File-level wrappers force us to skip an entire upstream test file when a single case is flaky. Per-case wrappers let us waive precisely.

Test plan

  • Local: pytest --collect-only tests/integration/defs/verl/test_verl_cases.py -> 19 collected
  • Local: 4 pure HTTP-mock adapter cases pass when run directly against a verl clone at d324b01 (test_make_async_request_{get,post,http_error,max_attempts_exceeded}_method)
  • Verified all 19 case names are unique across the 5 verl test files - no -k substring collisions
  • CI: /bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1" to exercise the 4xB200 l0_verl post-merge stage

Notes

The verl_setup fixture builds gdrcopy + nvshmem + DeepEP and needs root for /usr/local/lib. It runs cleanly in CI containers but cannot run on a non-root dev shell, so full validation depends on CI.

Summary by CodeRabbit

  • Tests

    • Refactored integration test suite to use individual test functions for finer-grained control and coverage.
    • Expanded test cases with additional scenarios for placement groups, async operations, multimodal generation, and inter-node rollout.
  • Chores

    • Updated test infrastructure configuration and removed obsolete test entries.

Review Change Stack

Replace the 3 file-level verl wrappers (test_async_server, test_adapter,
test_rollout_utils) with 19 wrappers that map 1-to-1 to upstream verl
pytest cases. Each wrapper shells out to `pytest <file> -k <case>` so
individual cases can be waived without skipping a whole file.

Also:
- Bump verl pin: 4cda6af -> d324b01
- Expand l0_verl test list to the 19 case-level entries
- Remove the 3 blanket waives under nvbugs/5981833

Local smoke (2xH800 NVL dev host):
- pytest --collect-only: 19 / 19 collected
- 4 pure HTTP-mock adapter cases run directly in a verl clone: 4/4 PASS
- All 19 case names verified unique across the 5 verl test files
  (no -k substring collision)

The verl_setup fixture (gdrcopy/nvshmem/DeepEP build) needs root and
will only run cleanly in CI containers; full validation depends on
/bot run on 4xB200.

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 12, 2026

📝 Walkthrough

Walkthrough

The PR refactors the verl integration test infrastructure to enable fine-grained test waiving. Test wrapper functions now map 1-to-1 to individual pytest cases via a new _run_single helper, replacing coarse module-level wrappers. The verl repository pinning is updated, test lists expanded with all new granular cases, and obsolete waivers removed.

Changes

Verl Test Infrastructure Refactoring

Layer / File(s) Summary
Test wrapper refactoring for fine-grained waiving
tests/integration/defs/verl/test_verl_cases.py
Module docstring now documents 1-to-1 mapping between wrapper functions and individual pytest cases. Path constants are added for verl test file locations. New _run_single(verl_file, case_name, timeout) helper executes a single case via -k filtering. Three coarse wrappers (test_async_server, test_adapter, test_rollout_utils) are replaced by nineteen granular test functions covering placement groups, async generation, request methods, multimodal generation, wake/sleep cycles, inter-node rollout, and abort scenarios, with extended timeouts (900s) for slower multimodal and multi-node cases.
Configuration and test synchronization
tests/integration/defs/verl/verl_config.yml, tests/integration/test_lists/test-db/l0_verl.yml, tests/integration/test_lists/waives.txt
Verl repository is pinned to commit d324b01. Integration test list (l0_verl) is expanded from three coarse tests to nineteen granular test cases matching the new wrapper structure. Three obsolete waive entries for the previous wrapper functions are removed.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 9.09% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely describes the main change: splitting verl tests from 3 file-level wrappers into 19 fine-grained per-case wrappers.
Description check ✅ Passed The description includes all required sections (Summary, Why, Test plan) with clear explanations of what changed, why, and how it was verified.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/integration/defs/verl/test_verl_cases.py`:
- Around line 139-142: The wrapper _run_single currently uses "-k" which matches
substrings; change it to invoke pytest with the exact node id instead: build a
node id by joining the test file and case name (e.g. "{verl_file}::{case_name}"
or "{verl_file}::ClassName::test_name" when needed) and pass that as a
positional arg via _run_verl_test's extra_args instead of ["-k", case_name], so
pytest runs exactly that single test; update _run_single to construct and pass
the full node id to _run_verl_test.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 910098c5-8e32-40a3-b5e2-611f7480c33b

📥 Commits

Reviewing files that changed from the base of the PR and between 7bc328f and e0907d5.

📒 Files selected for processing (4)
  • tests/integration/defs/verl/test_verl_cases.py
  • tests/integration/defs/verl/verl_config.yml
  • tests/integration/test_lists/test-db/l0_verl.yml
  • tests/integration/test_lists/waives.txt
💤 Files with no reviewable changes (1)
  • tests/integration/test_lists/waives.txt

Comment on lines +139 to 142
def _run_single(verl_file, case_name, timeout=600):
"""Run exactly one verl pytest case by name."""
_run_verl_test(verl_file, extra_args=["-k", case_name], timeout=timeout)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use exact pytest node IDs instead of -k for single-case wrappers.

Line 141 uses -k, which can select multiple tests via substring/expression matching. That breaks the “exactly one case” guarantee and can silently reduce waive precision as upstream adds similarly named tests.

Suggested patch (exact-case execution)
 def _run_verl_test(test_path, extra_args=None, timeout=600):
     """Run a test from the verl repo via subprocess."""
-    full_path = os.path.join(VERL_ROOT, test_path)
-    assert os.path.exists(full_path), f"Verl test not found: {full_path}"
-    cmd = [sys.executable, "-m", "pytest", full_path, "-v", "--tb=short"]
+    test_file = test_path.split("::", 1)[0]
+    full_test_file = os.path.join(VERL_ROOT, test_file)
+    assert os.path.exists(full_test_file), f"Verl test not found: {full_test_file}"
+    node_id = os.path.join(VERL_ROOT, test_path)
+    cmd = [sys.executable, "-m", "pytest", node_id, "-v", "--tb=short"]
     if extra_args:
         cmd.extend(extra_args)
@@
 def _run_single(verl_file, case_name, timeout=600):
     """Run exactly one verl pytest case by name."""
-    _run_verl_test(verl_file, extra_args=["-k", case_name], timeout=timeout)
+    _run_verl_test(f"{verl_file}::{case_name}", timeout=timeout)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/integration/defs/verl/test_verl_cases.py` around lines 139 - 142, The
wrapper _run_single currently uses "-k" which matches substrings; change it to
invoke pytest with the exact node id instead: build a node id by joining the
test file and case name (e.g. "{verl_file}::{case_name}" or
"{verl_file}::ClassName::test_name" when needed) and pass that as a positional
arg via _run_verl_test's extra_args instead of ["-k", case_name], so pytest runs
exactly that single test; update _run_single to construct and pass the full node
id to _run_verl_test.

@Superjomn
Copy link
Copy Markdown
Collaborator Author

/bot --help

@github-actions
Copy link
Copy Markdown

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental) --high-priority]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Supports wildcard * for pattern matching (e.g., "*PerfSanity*" matches all stages containing PerfSanity). Examples: "A10-PyTorch-1, xxx", "PerfSanity". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Supports wildcard * for pattern matching. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx", --extra-stage "Post-Merge".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

--high-priority (OPTIONAL) : Run the pipeline with high priority. This option is restricted to authorized users only and will route the job to a high-priority queue.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

@Superjomn
Copy link
Copy Markdown
Collaborator Author

/bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1"

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #47921 [ run ] triggered by Bot. Commit: e0907d5 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #47921 [ run ] completed with state SUCCESS. Commit: e0907d5
/LLM/main/L0_MergeRequest_PR pipeline #37768 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants