feat: Add SGLang rollout backend and tests #1674

RolaoDenthu · 2025-12-21T02:58:50Z

What does this PR do?

Add comprehensive test coverage for SGLang generation backend, including functional tests, unit tests, and nightly tests.

Functional Test (tests/functional/grpo_sglang.sh): Quick validation of SGLang-based GRPO training
Unit Tests (tests/unit/models/generation/test_sglang_generation.py): unit tests covering:
- Basic configuration validation
- Policy generation and tensor parallelism
- Worker seed behavior for RLHF diversity
- HTTP server direct API access
- Weight updates with DTensor policy (colocated mode)
- Prefix cache reset after weight updates
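
As an illustration of the "worker seed behavior" item above, here is a minimal sketch of the property such a test might check: each worker derives its own seed, so identical prompts sent to different workers yield different samples, while a fixed seed stays reproducible. The helper names (`worker_seed`, `sample_tokens`) and the `base_seed + rank` rule are assumptions made for this example, not code from the PR.

```python
import random


def worker_seed(base_seed: int, worker_rank: int) -> int:
    """Hypothetical rule: offset the base seed by the worker's rank."""
    return base_seed + worker_rank


def sample_tokens(seed: int, num_tokens: int = 8, vocab_size: int = 1000) -> list[int]:
    """Stand-in for a generation call: draw token ids from a seeded RNG."""
    rng = random.Random(seed)
    return [rng.randrange(vocab_size) for _ in range(num_tokens)]


# Four workers sharing one base seed each get a distinct derived seed.
seeds = [worker_seed(42, rank) for rank in range(4)]
rollouts = [sample_tokens(seed) for seed in seeds]

# Re-running one worker with the same seed reproduces its rollout.
repeat = sample_tokens(seeds[0])
```

A real test would compare generations from actual SGLang workers rather than a toy RNG; the sketch only shows the two assertions of interest (distinct across workers, stable per worker).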

Convergence curves to demonstrate correctness
https://api.wandb.ai/links/xinyis10-university-of-illinois-urbana-champaign/vyrw4zl1

Usage

# Run functional test
uv add coverage
bash tests/functional/grpo_sglang.sh

# Run unit tests
uv sync --extra sglang --group test
uv run python -m pytest tests/unit/models/generation/test_sglang_generation.py -v --sglang-only

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Summary by CodeRabbit

Release Notes

New Features
- Distributed generation engine using SGLang backend with HTTP weight streaming and multi-GPU support.
Configuration
- New YAML configuration templates for SGLang-based experiments with customizable generation parameters.
Tests
- Comprehensive test coverage for SGLang generation, including tensor parallelism, batching, and dynamic weight updates.

- Convert SGLangConfig from regular class to TypedDict inheriting GenerationConfig
- Align structure with VllmConfig pattern for consistency
- Mark all fields as NotRequired for backward compatibility
- Add sglang_kwargs field for additional ServerArgs parameters
- Add type casting in grpo.py for type safety

This maintains backward compatibility while aligning with the existing generation config structure pattern.

Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
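
The pattern described in this commit message can be sketched roughly as follows. This is only an illustration under stated assumptions: `GenerationConfigSketch`, `SGLangConfigSketch`, the `backend`, `max_new_tokens`, and `mem_fraction` fields, the 0.8 default, and the `server_kwargs` helper are all hypothetical stand-ins; only the `sglang_kwargs` field name, the use of `TypedDict`/`NotRequired`, and the type cast come from the message itself.

```python
from typing import Any, TypedDict, cast

try:
    from typing import NotRequired  # Python 3.11+
except ImportError:  # older interpreters
    from typing_extensions import NotRequired


class GenerationConfigSketch(TypedDict):
    """Hypothetical stand-in for the shared generation config."""
    backend: str
    max_new_tokens: int


class SGLangConfigSketch(GenerationConfigSketch):
    """Backend-specific keys are optional, so older configs still type-check."""
    mem_fraction: NotRequired[float]          # hypothetical field
    sglang_kwargs: NotRequired[dict[str, Any]]  # extra server arguments


def server_kwargs(config: SGLangConfigSketch) -> dict[str, Any]:
    """Merge optional fields into one kwargs dict, tolerating missing keys."""
    merged: dict[str, Any] = {"mem_fraction": config.get("mem_fraction", 0.8)}
    merged.update(config.get("sglang_kwargs", {}))
    return merged


# A plain dict loaded from YAML is cast for the type checker; cast() does
# no runtime validation, it only informs static analysis.
raw = {"backend": "sglang", "max_new_tokens": 64, "sglang_kwargs": {"log_level": "info"}}
config = cast(SGLangConfigSketch, raw)
kwargs = server_kwargs(config)
```

Because every backend-specific key is `NotRequired`, a config that omits them remains valid, which is the backward-compatibility point the message makes.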

guyueh1 · 2026-01-20T03:37:55Z

⚠️ File Consistency Check

Check based on commit: d037f71 (PR #1674 from add-tests)

⚠️ DTensor Policy Worker Synchronization Warning

The file nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py was modified in this PR, but nemo_rl/models/policy/workers/dtensor_policy_worker.py was not updated.
Why this matters: These files contain related DTensor policy worker implementations that should be kept synchronized to ensure consistency across different versions.
Action required:

Please review if the changes in nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py should also be applied to nemo_rl/models/policy/workers/dtensor_policy_worker.py

Update nemo_rl/models/policy/workers/dtensor_policy_worker.py if necessary to maintain consistency

If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

Modified: nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py

Not modified: nemo_rl/models/policy/workers/dtensor_policy_worker.py

This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

The SGLang weight update function is supported only by the DTensor v2 worker; the original DTensor worker does not support it. Therefore, this change is intentionally applied only to dtensor_policy_worker_v2.py.

I think this is fine to ignore. The API is defined in the base worker as "not implemented", so calling this method on a DTensor (v1) object will not cause an unexpected crash; it will instead raise a not-implemented error that can be caught.
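
A minimal sketch of the behavior described in this reply, assuming a base class that declares the method and raises `NotImplementedError` by default. All class and method names here (`BasePolicyWorkerSketch`, `stream_weights_to_sglang`, and so on) are hypothetical and do not reflect the actual nemo_rl API.

```python
class BasePolicyWorkerSketch:
    """Hypothetical base worker: declares the API but does not implement it."""

    def stream_weights_to_sglang(self) -> None:
        raise NotImplementedError(
            f"{type(self).__name__} does not support SGLang weight updates"
        )


class DTensorWorkerV1Sketch(BasePolicyWorkerSketch):
    """Stands in for the original worker: inherits the default behavior."""


class DTensorWorkerV2Sketch(BasePolicyWorkerSketch):
    """Stands in for the v2 worker: overrides the method."""

    def __init__(self) -> None:
        self.updates_sent = 0

    def stream_weights_to_sglang(self) -> None:
        # A real implementation would send tensors to the server here.
        self.updates_sent += 1


def try_update(worker: BasePolicyWorkerSketch) -> str:
    """Call the update and report an unsupported worker explicitly."""
    try:
        worker.stream_weights_to_sglang()
    except NotImplementedError as err:
        return f"unsupported: {err}"
    return "updated"


v1_result = try_update(DTensorWorkerV1Sketch())
v2_result = try_update(DTensorWorkerV2Sketch())
```

With this layout the v1 path fails with a clear, catchable error rather than an attribute lookup failure, which matches the reasoning given for leaving the v1 file unchanged.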

terrykong

final set of review. generally lgtm!

last remaining things before merging:

- resolve the parallel configuration thread that @guyueh1 had brought up
- add vllm/sglang convergence curves for model to demonstrate correctness (+lp error, perf, rewards metrics)

nemo_rl/algorithms/utils.py

nemo_rl/models/generation/sglang/__init__.py

pyproject.toml

tests/unit/L0_Unit_Tests_Policy.sh

RolaoDenthu · 2026-01-21T20:34:19Z

final set of review. generally lgtm!

last remaining things before merging:

resolve the parallel configuration thread that @guyueh1 had brought up

add vllm/sglang convergence curves for model to demonstrate correctness (+lp error, perf, rewards metrics)

Hi @terrykong, I have made the requested updates and attached a link to the convergence curves for your consideration: https://api.wandb.ai/links/xinyis10-university-of-illinois-urbana-champaign/vyrw4zl1

guyueh1 · 2026-01-21T21:11:38Z

This is the result of the same GRPO experiment (Qwen2.5 1.5B), with vLLM vs. SGLang as the inference backend.

terrykong · 2026-01-22T00:19:55Z

docs CI should resolve after #1806 is merged

github-actions · 2026-01-22T04:09:22Z

⚠️ File Consistency Check

Check based on commit: 0ff10fc (PR #1674 from add-tests)

⚠️ DTensor Policy Worker Synchronization Warning

The file nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py was modified in this PR, but nemo_rl/models/policy/workers/dtensor_policy_worker.py was not updated.

Why this matters:
These files contain related DTensor policy worker implementations that should be kept synchronized to ensure consistency across different versions.

Action required:

Please review if the changes in nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py should also be applied to nemo_rl/models/policy/workers/dtensor_policy_worker.py
Update nemo_rl/models/policy/workers/dtensor_policy_worker.py if necessary to maintain consistency
If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

Modified: nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
Not modified: nemo_rl/models/policy/workers/dtensor_policy_worker.py

This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

PrinsYin and others added 30 commits December 6, 2025 21:12

sglang support:initial commit

d9cf489

Signed-off-by: Ryan <yzr1914001753@gmail.com> Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>

sglang:manually set cuda visible to let localran=0 to manage gpus of …

3eace5f

…a server Signed-off-by: Ryan <yzr1914001753@gmail.com> Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>

sglang: add sglang setup in grpo.py, add find available port to set u…

6fbbbb7

…p servers Signed-off-by: Ryan <yzr1914001753@gmail.com> Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>

sglang: add shutdown

242612c

Signed-off-by: Ryan <yzr1914001753@gmail.com> Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>

sglang server: fix gpu allocation when tp =1

a3d8ad6

Signed-off-by: Ryan <yzr1914001753@gmail.com> Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>

generate only first request

88971e3

Signed-off-by: Ryan <yzr1914001753@gmail.com> Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>

fix : choose the correct gpu using base gpu id

db8b07b

Signed-off-by: Ryan <yzr1914001753@gmail.com> Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>

asyncio to roolout all saples

dd0e54f

Signed-off-by: Ryan <yzr1914001753@gmail.com> Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>

fix new event loop for rollout

21c54e3

Signed-off-by: Ryan <yzr1914001753@gmail.com> Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>

added mem_fraction

5e24fab

Signed-off-by: Ryan <yzr1914001753@gmail.com> Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>

modified build_sampling_paras and stop token handling

50189a9

Signed-off-by: Ryan <yzr1914001753@gmail.com> Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>

temp: prevent server overlaod with semaphore

ec35b6b

Signed-off-by: Ryan <yzr1914001753@gmail.com> Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>

sglang: refactor, move async loop position

f099caa

Signed-off-by: Ryan <yzr1914001753@gmail.com> Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>

sglang: fix total length in generate

a03eba8

Signed-off-by: Ryan <yzr1914001753@gmail.com> Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>

sglang: env setup

e08cfd6

sglang: add 1B example Signed-off-by: Ryan <yzr1914001753@gmail.com> Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>

from tensor:

ccc66f6

Signed-off-by: Ryan <yzr1914001753@gmail.com> Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>

sglang refit: fix sglang import

2ce928b

Signed-off-by: Ryan <yzr1914001753@gmail.com> Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>

fix: match fsdp ranks correctly with sglang

4aa1e74

Signed-off-by: Ryan <yzr1914001753@gmail.com> Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>

flush cache before update begins

9098077

Signed-off-by: Ryan <yzr1914001753@gmail.com> Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>

Fix SGLang compatibility: add hasattr checks for vLLM-specific methods

9900a33

Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>

sglang: modified config (increase mem_fration, enable wandb)

5cb78e3

Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>

refactor(grpo): extract init logic for generation backends

03d9d0c

Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>

refactor: generalize logger metrics for all generation backends

f1c26dd

Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>

refactor sglang config loading to make it consistent with other backendw

255dcc6

Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>

resolved ai comments

ee01f91

Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>

changed print to using loging

e25e573

Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>

Merge branch 'main' into sglang_server

e93699f

Update nemo_rl/models/generation/sglang/sglang_worker.py

85d6a92

Co-authored-by: Terry Kong <terrycurtiskong@gmail.com> Signed-off-by: Night <32424487+PrinsYin@users.noreply.github.com>

Merge branch 'main' into sglang_server

be1ae27

guyueh1 previously approved these changes Jan 20, 2026

chtruong814 removed the needs-follow-up Issue needs follow-up label Jan 20, 2026

terrykong reviewed Jan 21, 2026

RolaoDenthu and others added 2 commits January 21, 2026 00:15

Merge branch 'main' into add-tests

fb524ac

fix recipes, remove unnecessary changes

e735c3b

Signed-off-by: RolaoDenthu <xinyis10@illinois.edu>

RolaoDenthu dismissed guyueh1’s stale review via e735c3b January 21, 2026 09:03

guyueh1 added CI:L2 Run doctests, unit tests, functional tests, and convergence tests and removed CI:L2 Run doctests, unit tests, functional tests, and convergence tests labels Jan 21, 2026

RolaoDenthu and others added 2 commits January 21, 2026 13:28

Merge branch 'main' into add-tests

959c25c

fix recipe

2a40eb9

Signed-off-by: RolaoDenthu <xinyis10@illinois.edu>

guyueh1 added CI:L2 Run doctests, unit tests, functional tests, and convergence tests and removed CI:L2 Run doctests, unit tests, functional tests, and convergence tests labels Jan 21, 2026

guyueh1 temporarily deployed to nemo-ci January 21, 2026 21:23 — with GitHub Actions Inactive

guyueh1 temporarily deployed to nemo-ci January 22, 2026 00:12 — with GitHub Actions Inactive

Merge branch 'main' into add-tests

81838ac

guyueh1 added CI:L2 Run doctests, unit tests, functional tests, and convergence tests and removed CI:L2 Run doctests, unit tests, functional tests, and convergence tests labels Jan 22, 2026

Merge branch 'main' into add-tests

0ff10fc

guyueh1 added CI:L2 Run doctests, unit tests, functional tests, and convergence tests and removed CI:L2 Run doctests, unit tests, functional tests, and convergence tests labels Jan 22, 2026

guyueh1 temporarily deployed to nemo-ci January 22, 2026 04:10 — with GitHub Actions Inactive

guyueh1 temporarily deployed to nemo-ci January 22, 2026 05:56 — with GitHub Actions Inactive

guyueh1 requested a deployment to nemo-ci January 22, 2026 09:32 — with GitHub Actions Queued
