Skip to content

Conversation

@RolaoDenthu
Copy link

@RolaoDenthu RolaoDenthu commented Dec 21, 2025

What does this PR do ?

Add comprehensive test coverage for SGLang generation backend, including functional tests, unit tests, and nightly tests.

  • Functional Test (tests/functional/grpo_sglang.sh): Quick validation of SGLang-based GRPO training
  • Unit Tests (tests/unit/models/generation/test_sglang_generation.py): unit tests covering:
    • Basic configuration validation
    • Policy generation and tensor parallelism
    • Worker seed behavior for RLHF diversity
    • HTTP server direct API access
    • Weight updates with DTensor policy (colocated mode)
    • Prefix cache reset after weight updates

Convergence curves to demonstrate correctness
https://api.wandb.ai/links/xinyis10-university-of-illinois-urbana-champaign/vyrw4zl1

Usage

  • You can potentially add a usage example below
# Run functional test
uv add coverage
bash tests/functional/grpo_sglang.sh

# Run unit tests
uv sync --extra sglang --group test
uv run python -m pytest tests/unit/models/generation/test_sglang_generation.py -v --sglang-only

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Summary by CodeRabbit

Release Notes

  • New Features

    • Distributed generation engine using SGLang backend with HTTP weight streaming and multi-GPU support.
  • Configuration

    • New YAML configuration templates for SGLang-based experiments with customizable generation parameters.
  • Tests

    • Comprehensive test coverage for SGLang generation, including tensor parallelism, batching, and dynamic weight updates.

✏️ Tip: You can customize this high-level summary in your review settings.

PrinsYin and others added 30 commits December 6, 2025 21:12
Signed-off-by: Ryan <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
…a server

Signed-off-by: Ryan <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
…p servers

Signed-off-by: Ryan <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Ryan <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Ryan <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Ryan <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Ryan <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Ryan <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Ryan <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Ryan <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Ryan <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Ryan <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Ryan <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Ryan <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
sglang: add 1B example
Signed-off-by: Ryan <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Ryan <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Ryan <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Ryan <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Ryan <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
- Convert SGLangConfig from regular class to TypedDict inheriting GenerationConfig
- Align structure with VllmConfig pattern for consistency
- Mark all fields as NotRequired for backward compatibility
- Add sglang_kwargs field for additional ServerArgs parameters
- Add type casting in grpo.py for type safety

This maintains backward compatibility while aligning with the existing
generation config structure pattern.

Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
Signed-off-by: Night <32424487+PrinsYin@users.noreply.github.com>
@guyueh1
Copy link
Contributor

guyueh1 commented Jan 20, 2026

⚠️ File Consistency Check

Check based on commit: d037f71 (PR #1674 from add-tests)

⚠️ DTensor Policy Worker Synchronization Warning

The file nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py was modified in this PR, but nemo_rl/models/policy/workers/dtensor_policy_worker.py was not updated.
Why this matters: These files contain related DTensor policy worker implementations that should be kept synchronized to ensure consistency across different versions.
Action required:

  • Please review if the changes in nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py should also be applied to nemo_rl/models/policy/workers/dtensor_policy_worker.py
  • Update nemo_rl/models/policy/workers/dtensor_policy_worker.py if necessary to maintain consistency
  • If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

  • Modified: nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
  • Not modified: nemo_rl/models/policy/workers/dtensor_policy_worker.py

This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

SGLang supports the weight update function only for DTensor v2, while the original DTensor worker does not. Therefore, this change is intentionally applied only to dtensor_policy_worker_v2.py.

I think this is fine to ignore, the API is defined in the base worker as "not implemented", so there is no risk that calling this method with dtensor (v1) object will cause a crash, but it will be caught by a not implemented error.

guyueh1
guyueh1 previously approved these changes Jan 20, 2026
@chtruong814 chtruong814 removed the needs-follow-up Issue needs follow-up label Jan 20, 2026
Copy link
Contributor

@terrykong terrykong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final set of review. generally lgtm!

last remaining things before merging:

  1. resolve the parallel configuration thread that @guyueh1 had brought up
  2. add vllm/sglang convergence curves for model to demonstrate correctness (+lp error, perf, rewards metrics)

RolaoDenthu and others added 2 commits January 21, 2026 00:15
@guyueh1 guyueh1 added CI:L2 Run doctests, unit tests, functional tests, and convergence tests and removed CI:L2 Run doctests, unit tests, functional tests, and convergence tests labels Jan 21, 2026
RolaoDenthu and others added 2 commits January 21, 2026 13:28
Signed-off-by: RolaoDenthu <xinyis10@illinois.edu>
@RolaoDenthu
Copy link
Author

final set of review. generally lgtm!

last remaining things before merging:

  1. resolve the parallel configuration thread that @guyueh1 had brought up
  2. add vllm/sglang convergence curves for model to demonstrate correctness (+lp error, perf, rewards metrics)

Hi @terrykong I have made the requested updates and attached the link for the convergence curve for your consideration. https://api.wandb.ai/links/xinyis10-university-of-illinois-urbana-champaign/vyrw4zl1

@guyueh1
Copy link
Contributor

guyueh1 commented Jan 21, 2026

This is the result of the same grpo experiment (Qwen2.5 1.5B), vllm vs sglang as inference backend
Screenshot 2026-01-21 at 1 10 32 PM

@guyueh1 guyueh1 added CI:L2 Run doctests, unit tests, functional tests, and convergence tests and removed CI:L2 Run doctests, unit tests, functional tests, and convergence tests labels Jan 21, 2026
@terrykong
Copy link
Contributor

docs CI should resolve after #1806 merged

@guyueh1 guyueh1 added CI:L2 Run doctests, unit tests, functional tests, and convergence tests and removed CI:L2 Run doctests, unit tests, functional tests, and convergence tests labels Jan 22, 2026
@guyueh1 guyueh1 added CI:L2 Run doctests, unit tests, functional tests, and convergence tests and removed CI:L2 Run doctests, unit tests, functional tests, and convergence tests labels Jan 22, 2026
@github-actions
Copy link

⚠️ File Consistency Check

Check based on commit: 0ff10fc (PR #1674 from add-tests)

⚠️ DTensor Policy Worker Synchronization Warning

The file nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py was modified in this PR, but nemo_rl/models/policy/workers/dtensor_policy_worker.py was not updated.

Why this matters:
These files contain related DTensor policy worker implementations that should be kept synchronized to ensure consistency across different versions.

Action required:

  • Please review if the changes in nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py should also be applied to nemo_rl/models/policy/workers/dtensor_policy_worker.py
  • Update nemo_rl/models/policy/workers/dtensor_policy_worker.py if necessary to maintain consistency
  • If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

  • Modified: nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
  • Not modified: nemo_rl/models/policy/workers/dtensor_policy_worker.py

This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CI:L2 Run doctests, unit tests, functional tests, and convergence tests community-request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants