feat(grpo): support async multiple dataloaders #2454

Open
taivu1998 wants to merge 1 commit into NVIDIA-NeMo:main from taivu1998:tdv/issue-2006-async-multiple-dataloaders


@taivu1998

Summary

Fixes #2006.

This PR enables standard async GRPO to run with data.use_multiple_dataloader=true while keeping async NeMo-Gym GRPO explicitly unsupported for that path.

Root Cause

The standard async entrypoint blocked multiple dataloaders outright, and the async collector consumed the next prompt batch before reserving the target weight version. That ordering prevented custom multiple-dataloader functions from seeing the async generation target they were sampling for. The wrapper also eagerly held live iterators and Hydra-loaded callables, which made it a poor object to ship into a Ray actor or checkpoint from async collection.
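The ordering fix described above can be sketched as follows. This is a minimal illustration, not the PR's implementation: `AsyncRecord`, `CollectorSketch`, `alternate_tasks`, and every field name are hypothetical, and the generation and collector weight versions are set to the same value for simplicity. The point is only that the target version is reserved first, so the sampling function can see it.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class AsyncRecord:
    # Hypothetical field names for the four values the PR says are passed
    # to custom multiple-dataloader functions.
    generation_weight_version: int
    target_weight_version: int
    collector_weight_version: int
    expected_prompt_count: int


class CollectorSketch:
    """Illustrative collector: reserve the target version, then sample."""

    def __init__(
        self,
        sample_fn: Callable[[AsyncRecord], List[str]],
        prompts_per_step: int,
    ) -> None:
        self.sample_fn = sample_fn
        self.prompts_per_step = prompts_per_step
        self.current_weight_version = 0
        self._next_target = 0

    def _reserve_target(self) -> int:
        # Claim the next target weight version before touching any data.
        target = self._next_target
        self._next_target += 1
        return target

    def collect_once(self) -> Tuple[AsyncRecord, List[str]]:
        target = self._reserve_target()
        record = AsyncRecord(
            generation_weight_version=self.current_weight_version,
            target_weight_version=target,
            collector_weight_version=self.current_weight_version,
            expected_prompt_count=self.prompts_per_step,
        )
        # Only now is the prompt batch consumed, with the record in hand.
        batch = self.sample_fn(record)
        return record, batch


def alternate_tasks(record: AsyncRecord) -> List[str]:
    # Hypothetical target-aware policy: even targets draw "math" prompts,
    # odd targets draw "code" prompts.
    task = "math" if record.target_weight_version % 2 == 0 else "code"
    return [f"{task}-prompt-{i}" for i in range(record.expected_prompt_count)]


collector = CollectorSketch(alternate_tasks, prompts_per_step=2)
first_record, first_batch = collector.collect_once()
second_record, second_batch = collector.collect_once()
```

With the previous ordering (batch first, reservation second), `alternate_tasks` would have had no target version to branch on.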

Changes

  • Make MultipleDataloaderWrapper lazy, picklable, and checkpointable with per-task dataloader state.
  • Reserve async target weight versions before consuming dataloader batches.
  • Pass async records into custom multiple-dataloader functions: generation weight version, target weight version, current collector weight version, and expected prompt count.
  • Save async multiple-dataloader checkpoint state as train_dataloader_<task>.pt, matching synchronous GRPO resume behavior.
  • Remove the standard async GRPO multiple-dataloader guard from examples/run_grpo.py.
  • Keep the async NeMo-Gym guard, but make its error message specific to async NeMo-Gym GRPO.
  • Add a target-aware custom dataloader example plus unit coverage for lazy/picklable wrapper behavior, state restore, and async record-driven sampling.
  • Document async multiple-dataloader configuration, records, checkpointing, and the NeMo-Gym limitation.
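For reviewers, the "lazy, picklable, and checkpointable" property in the first bullet can be illustrated with a small standalone sketch. It is not the actual `MultipleDataloaderWrapper`: the class `LazyMultiLoader`, its methods, and the position-based state are assumptions chosen to show the pattern (build iterators on first use, drop them when pickling, keep per-task state).

```python
import pickle
from typing import Any, Dict, Iterator, List


class LazyMultiLoader:
    """Hypothetical stand-in for a lazy, picklable multi-dataloader wrapper."""

    def __init__(self, datasets: Dict[str, List[Any]]) -> None:
        self._datasets = datasets
        self._positions: Dict[str, int] = {task: 0 for task in datasets}
        # Live iterators are created on first use and never pickled.
        self._iterators: Dict[str, Iterator[Any]] = {}

    def _iterator(self, task: str) -> Iterator[Any]:
        if task not in self._iterators:
            start = self._positions[task]
            # A generator cannot be pickled, which is why it is built lazily.
            self._iterators[task] = (x for x in self._datasets[task][start:])
        return self._iterators[task]

    def next_item(self, task: str) -> Any:
        item = next(self._iterator(task))
        self._positions[task] += 1
        return item

    def state_dict(self) -> Dict[str, Dict[str, int]]:
        # One entry per task, so each could be saved to its own file.
        return {task: {"position": pos} for task, pos in self._positions.items()}

    def load_state_dict(self, state: Dict[str, Dict[str, int]]) -> None:
        for task, task_state in state.items():
            self._positions[task] = task_state["position"]
        self._iterators.clear()  # rebuild lazily from restored positions

    def __getstate__(self) -> Dict[str, Any]:
        state = self.__dict__.copy()
        state["_iterators"] = {}  # drop live iterators before pickling
        return state


loader = LazyMultiLoader({"math": [1, 2, 3], "code": ["a", "b"]})
first_item = loader.next_item("math")

# Round-trip through pickle, as would happen when shipping to a remote actor.
clone = pickle.loads(pickle.dumps(loader))
resumed_item = clone.next_item("math")

# Restore a fresh wrapper from per-task state, as on checkpoint resume.
restored = LazyMultiLoader({"math": [1, 2, 3], "code": ["a", "b"]})
restored.load_state_dict(loader.state_dict())
restored_item = restored.next_item("math")
```

In this sketch the per-task entries of `state_dict()` are what would map onto files such as `train_dataloader_<task>.pt`; the real wrapper's state format may differ.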

Validation

  • uvx --python /Users/vuductai/.local/share/uv/python/cpython-3.12-macos-aarch64-none/bin/python3.12 ruff check nemo_rl/data/dataloader.py nemo_rl/algorithms/async_utils.py nemo_rl/algorithms/grpo.py examples/run_grpo.py examples/nemo_gym/run_grpo_nemo_gym.py examples/custom_dataloader/custom_dataloader.py tests/unit/data/test_multiple_dataloader.py
  • /Users/vuductai/.local/share/uv/python/cpython-3.12-macos-aarch64-none/bin/python3.12 -m py_compile nemo_rl/data/dataloader.py nemo_rl/algorithms/async_utils.py nemo_rl/algorithms/grpo.py examples/run_grpo.py examples/nemo_gym/run_grpo_nemo_gym.py examples/custom_dataloader/custom_dataloader.py tests/unit/data/test_multiple_dataloader.py tests/unit/algorithms/test_async_utils.py
  • git diff --check

uv run pytest tests/unit/data/test_multiple_dataloader.py tests/unit/algorithms/test_async_utils.py is currently blocked on this macOS host before test collection because /usr/local/bin/python3.13 is broken: platform.mac_ver() returns an empty value. The repository requires Python >=3.13.13, and uv python install 3.13.13 reports no matching macOS aarch64 download.

Signed-off-by: taivu1998 <46636857+taivu1998@users.noreply.github.com>
@copy-pr-bot

copy-pr-bot Bot commented May 10, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions Bot added the Documentation (Improvements or additions to documentation) and community-request labels May 10, 2026
@taivu1998 taivu1998 marked this pull request as ready for review May 11, 2026 03:06
@taivu1998 taivu1998 requested review from a team as code owners May 11, 2026 03:06
@svcnvidia-nemo-ci svcnvidia-nemo-ci added the waiting-on-maintainers Waiting on maintainers to respond label May 12, 2026

Labels

community-request, Documentation (Improvements or additions to documentation), waiting-on-maintainers (Waiting on maintainers to respond)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support multiple dataloader for async grpo

2 participants