
Conversation

@RayenTian (Contributor) commented Jan 20, 2026

Summary

  • Merge LoRA adapter weights into base linear weights when exporting dtensor state and skip standalone LoRA adapter tensors.
  • Add LoRA configuration defaults to grpo_math_1B.yaml and introduce a Qwen3-8B LoRA recipe.
  • Expand LoRA coverage in functional and unit tests (vLLM generation + GRPO LoRA suites).

Changes

  • nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
    • Merge LoRA weights into base weights during state export.
  • Skip lora_A/lora_B tensors and release temporary tensors to reduce memory (see the sketch after this list).
  • examples/configs/grpo_math_1B.yaml
    • Add LoRA config section with defaults/documentation.
  • examples/configs/recipes/llm/grpo-qwen3-8B-base-1n8g-fsdp2-lora.yaml
    • New LoRA recipe for Qwen3-8B.
  • tests/functional/*
    • Add GRPO LoRA functional tests (sync/async/non-colocated) and include in nightly.
  • tests/unit/models/generation/test_vllm_generation.py
    • Add LoRA config coverage and parameters in vLLM tests.
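The export-time merge can be pictured roughly as below. This is a minimal, hedged sketch rather than the PR's actual implementation: the helper name `merge_lora_into_state_dict`, the PEFT-style key layout (`base_layer.weight`, `lora_A.weight`, `lora_B.weight`), and the precomputed `scale` are assumptions made for illustration.

```python
import torch


def merge_lora_into_state_dict(
    state_dict: dict[str, torch.Tensor], scale: float
) -> dict[str, torch.Tensor]:
    """Fold LoRA adapters into their base linear weights and drop the
    standalone adapter tensors (illustrative only; key names are assumed)."""
    merged: dict[str, torch.Tensor] = {}
    for key, weight in state_dict.items():
        # Standalone adapter tensors are skipped entirely; their contribution
        # is folded into the corresponding base weight below.
        if ".lora_A." in key or ".lora_B." in key:
            continue
        if key.endswith("base_layer.weight"):
            prefix = key[: -len("base_layer.weight")]
            lora_a = state_dict.get(prefix + "lora_A.weight")  # (r, in_features)
            lora_b = state_dict.get(prefix + "lora_B.weight")  # (out_features, r)
            if lora_a is not None and lora_b is not None:
                # W' = W + scale * (B @ A), aligned to the base weight's
                # device and dtype before the addition.
                delta = lora_b.to(weight.device, torch.float32) @ lora_a.to(
                    weight.device, torch.float32
                )
                weight = weight + (scale * delta).to(weight.dtype)
                del delta  # release the temporary to keep peak memory down
        merged[key] = weight
    return merged
```

Merging at export time means the generation engine only ever sees plain dense weights, so no adapter-aware plumbing is required on the vLLM side.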

Testing

  • Not run (manual PR only).

Notes

  • LoRA weight merge uses W + scale * (B @ A) with dtype/device alignment.
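Written out with the usual LoRA convention (assuming `scale` is the standard α/r factor, with A of shape r×k and B of shape d×r):

```math
W_{\text{merged}} = W + \frac{\alpha}{r}\, B A
```

After the merge, the exported weights are numerically equivalent (up to dtype rounding) to running the base layer plus the adapter at inference time.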

Signed-off-by: ruit <ruit@nvidia.com>
@github-actions

⚠️ File Consistency Check

Check based on commit: 1ccb5be (PR #1797 from ruit/lora_merge_weight)

⚠️ DTensor Policy Worker Synchronization Warning

The file nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py was modified in this PR, but nemo_rl/models/policy/workers/dtensor_policy_worker.py was not updated.

Why this matters:
These files contain related DTensor policy worker implementations that should be kept synchronized to ensure consistency across different versions.

Action required:

  • Please review if the changes in nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py should also be applied to nemo_rl/models/policy/workers/dtensor_policy_worker.py
  • Update nemo_rl/models/policy/workers/dtensor_policy_worker.py if necessary to maintain consistency
  • If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

  • Modified: nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
  • Not modified: nemo_rl/models/policy/workers/dtensor_policy_worker.py

This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

@RayenTian RayenTian added the CI:L1 Run doctests, unit tests, and functional tests label Jan 20, 2026
@RayenTian RayenTian removed the CI:L1 Run doctests, unit tests, and functional tests label Jan 20, 2026
Signed-off-by: ruit <ruit@nvidia.com>
@github-actions repeated the File Consistency Check warning above for commit acad57c (PR #1797 from ruit/lora_merge_weight).

@RayenTian RayenTian added the CI:L1 Run doctests, unit tests, and functional tests label Jan 21, 2026
Signed-off-by: ruit <ruit@nvidia.com>
@github-actions repeated the File Consistency Check warning above for commit 68263ea (PR #1797 from ruit/lora_merge_weight).

print(f"Error applying torch.ops.aten.alias.default patch: {e}")


def patched_lora_linear_forward(self, x):
@RayenTian (Contributor, Author) commented:


This patch cannot guarantee that the computational logic is exactly identical to the original.
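For context, a PEFT-style LoRA linear forward (an illustrative sketch, not this repo's exact module) computes the base projection plus a scaled low-rank correction; a monkey-patch must reproduce all of this, including dropout and dtype behavior, which is the hard-to-guarantee part that the export-time weight merge sidesteps:

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Illustrative LoRA-wrapped linear layer (hypothetical, PEFT-style)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16, dropout: float = 0.0):
        super().__init__()
        self.base = base
        self.lora_A = nn.Linear(base.in_features, r, bias=False)
        self.lora_B = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)  # adapter starts as a no-op
        self.dropout = nn.Dropout(dropout)
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # base(x) + scaling * B(A(dropout(x)))
        return self.base(x) + self.scaling * self.lora_B(self.lora_A(self.dropout(x)))
```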
