feat(eval): support NeMo-Gym multi-turn rollouts by taivu1998 · Pull Request #2453 · NVIDIA-NeMo/RL

taivu1998 · 2026-05-10T10:53:23Z

Problem

Standalone eval currently supports only the single-turn environment step path. NeMo-Gym already owns the multi-turn rollout loop used by training, but examples/run_eval.py cannot route eval datasets through that Gym-backed rollout path.

Closes #1089.

Root Cause

The eval driver always loads eval datasets, creates a single scoring environment, generates one assistant response, and calls env.step(...). NeMo-Gym data and environments require a different path:

NemoGymDataset and nemo_gym_data_processor preserve Gym row metadata in extra_env_info
rl_collate_fn preserves the metadata and training-style fields expected by the Gym rollout helper
vLLM must expose OpenAI-compatible HTTP server URLs for Gym to call the policy
run_async_nemo_gym_rollout owns the multi-turn rollout, reward extraction, and result postprocessing

Changes

Add eval.rollout_mode with single_turn and nemo_gym modes.
Add eval config validation for Gym requirements:
- max_rollout_turns: null
- num_tests_per_prompt: 1
- async vLLM engine with HTTP server exposure
- top_k: null
- no stop strings or stop token IDs
Route rollout_mode=nemo_gym through NemoGymDataset, rl_collate_fn, NemoGym, and run_async_nemo_gym_rollout.
Add mean_reward scoring for Gym eval while keeping pass@k available when Gym rewards are binary.
Preserve existing single-turn eval behavior and explicitly reject ignored multi-turn limits in single-turn mode.
Save structured JSON eval outputs instead of stringifying message logs and env metadata.
Add examples/configs/evals/nemo_gym_eval.yaml and update existing eval exemplars with the new required config keys.
Extend eval unit tests for rollout-mode validation, collator selection, mean-reward scoring, and Gym result saving.

Validation

tests/unit/evals/test_eval.py: 16 passed
Focused eval YAML schema validation over all examples/configs/evals/*.yaml: passed
uvx ruff check examples/run_eval.py nemo_rl/evals/eval.py nemo_rl/environments/nemo_gym.py nemo_rl/experience/rollouts.py nemo_rl/data/__init__.py tests/unit/evals/test_eval.py: passed
uvx ruff format --check examples/run_eval.py nemo_rl/evals/eval.py nemo_rl/environments/nemo_gym.py nemo_rl/experience/rollouts.py nemo_rl/data/__init__.py tests/unit/evals/test_eval.py: passed
python -m py_compile on changed Python files: passed
git diff --check: passed

Note: the local repo-native uv run path is blocked on this macOS host because /usr/local/bin/python3.13 reports an empty platform.mac_ver() to uv. The focused pytest run was executed in a temporary Python 3.13 uv environment with the repo import dependencies and a temporary decord import stub outside the repository, because decord has no usable macOS arm64 CPython 3.13 wheel here and this test file does not exercise video decoding.

Signed-off-by: taivu1998 <46636857+taivu1998@users.noreply.github.com>

copy-pr-bot · 2026-05-10T10:53:28Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

feat(eval): support NeMo-Gym rollouts

adfd2bd

Signed-off-by: taivu1998 <46636857+taivu1998@users.noreply.github.com>

github-actions Bot added the community-request label May 10, 2026

taivu1998 marked this pull request as ready for review May 11, 2026 03:07

taivu1998 requested review from a team as code owners May 11, 2026 03:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(eval): support NeMo-Gym multi-turn rollouts#2453

feat(eval): support NeMo-Gym multi-turn rollouts#2453
taivu1998 wants to merge 1 commit into
NVIDIA-NeMo:mainfrom
taivu1998:tdv/issue-1089-nemo-gym-eval

taivu1998 commented May 10, 2026

Uh oh!

copy-pr-bot Bot commented May 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

taivu1998 commented May 10, 2026

Problem

Root Cause

Changes

Validation

Uh oh!

copy-pr-bot Bot commented May 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants