feat(grpo): log per-optimizer step metrics#2452

Open
taivu1998 wants to merge 1 commit into
NVIDIA-NeMo:mainfrom
taivu1998:tdv/issue-1435-step-logging
Conversation

@taivu1998
Summary

  • Add explicit optimizer-step counters to GRPO logging so metrics for each PPO optimization step are emitted under train/optim/* together with train/optim_step.
  • Preserve RL-step visibility by adding train/rl_step, train/optim_step, and train/num_optim_steps_per_rl_step to aggregate train logs.
  • Carry per-optimizer-step metrics through DTensor v1, DTensor v2, and Megatron workers, then aggregate them across policy workers by optimizer-step index.
  • Persist total_optim_steps in GRPO checkpoints with fallback inference for older checkpoints.
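The per-optimizer-step logging in the first two bullets can be sketched roughly as follows. This is an illustrative helper, not the PR's actual code; the function name `build_optim_step_logs` and the assumption that every RL step runs the same number of optimizer steps are mine.

```python
from typing import Any


def build_optim_step_logs(
    per_step_metrics: list[dict[str, float]],
    rl_step: int,
) -> list[dict[str, Any]]:
    """Expand one RL step's per-optimizer-step metrics into log records.

    Each record carries its metrics under a ``train/optim/`` prefix plus a
    global ``train/optim_step`` counter, so loggers can plot per optimizer
    step instead of once per RL step. Assumes a constant number of
    optimizer steps per RL step (hypothetical simplification).
    """
    num_steps = len(per_step_metrics)
    logs: list[dict[str, Any]] = []
    for i, metrics in enumerate(per_step_metrics):
        record = {f"train/optim/{k}": v for k, v in metrics.items()}
        record["train/optim_step"] = rl_step * num_steps + i
        record["train/rl_step"] = rl_step
        record["train/num_optim_steps_per_rl_step"] = num_steps
        logs.append(record)
    return logs
```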

Root Cause

GRPO policy training can run multiple optimizer steps inside one RL step, but metrics were only aggregated and logged once per RL step. That hid per-step behavior and made off-policy PPO diagnostics hard to read.
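Aggregating per-step metrics across policy workers by optimizer-step index (as the summary describes) might look like the sketch below. This is a minimal illustration with hypothetical names, assuming all workers run the same number of optimizer steps and report the same metric keys, and that plain averaging is the desired reduction.

```python
def aggregate_by_optim_step(
    worker_metrics: list[list[dict[str, float]]],
) -> list[dict[str, float]]:
    """Average metrics across workers at each optimizer-step index.

    ``worker_metrics[w][s]`` is worker ``w``'s metric dict at optimizer
    step ``s`` within the current RL step. Returns one averaged dict per
    optimizer step, preserving step ordering.
    """
    num_steps = len(worker_metrics[0])
    aggregated: list[dict[str, float]] = []
    for s in range(num_steps):
        step_dicts = [per_worker[s] for per_worker in worker_metrics]
        keys = step_dicts[0].keys()
        aggregated.append(
            {k: sum(d[k] for d in step_dicts) / len(step_dicts) for k in keys}
        )
    return aggregated
```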

Validation

  • uvx ruff check nemo_rl/algorithms/grpo.py nemo_rl/models/policy/lm_policy.py nemo_rl/models/policy/utils.py nemo_rl/models/policy/workers/dtensor_policy_worker.py nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py nemo_rl/models/policy/workers/megatron_policy_worker.py nemo_rl/utils/logger.py tests/unit/algorithms/test_grpo.py tests/unit/models/policy/test_utils.py tests/unit/utils/test_logger.py
  • uvx ruff format --check nemo_rl/algorithms/grpo.py nemo_rl/models/policy/lm_policy.py nemo_rl/models/policy/utils.py nemo_rl/models/policy/workers/dtensor_policy_worker.py nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py nemo_rl/models/policy/workers/megatron_policy_worker.py nemo_rl/utils/logger.py tests/unit/algorithms/test_grpo.py tests/unit/models/policy/test_utils.py tests/unit/utils/test_logger.py
  • Compiled the source of all touched files with /usr/local/bin/python3.10 to verify syntax
  • Direct checks of GRPO helper functions extracted from source
  • /Users/vuductai/Documents/Projects/RL/.venv-dev/bin/python -m pytest tests/unit/models/policy/test_utils.py tests/unit/utils/test_logger.py -q -k "optim_step_metrics or aggregate_metric_dicts or unscale_loss_metrics or step_metric or gpu_monitoring" (13 passed, 69 deselected)
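The checkpoint fallback mentioned in the summary (persisting total_optim_steps with inference for older checkpoints) could be sketched as below. The function and key names are hypothetical, and the fallback assumes a constant number of optimizer steps per RL step, which may be inexact if the schedule ever changed.

```python
def load_total_optim_steps(
    checkpoint: dict,
    num_optim_steps_per_rl_step: int,
) -> int:
    """Read total_optim_steps, inferring it for older checkpoints.

    Older checkpoints predate the counter, so fall back to deriving it
    from the saved RL-step count (assumes a constant optimizer-step
    schedule; hypothetical key names).
    """
    if "total_optim_steps" in checkpoint:
        return checkpoint["total_optim_steps"]
    return checkpoint.get("rl_step", 0) * num_optim_steps_per_rl_step
```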

Local Environment Notes

  • uv run pytest ... is blocked locally by a broken /usr/local/bin/python3.13 install.
  • Focused tests/unit/algorithms/test_grpo.py collection is blocked in the local venv by missing optional soundfile after adding Megatron submodule paths.

Closes #1435.

Signed-off-by: taivu1998 <46636857+taivu1998@users.noreply.github.com>
copy-pr-bot Bot commented May 10, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@taivu1998 taivu1998 marked this pull request as ready for review May 11, 2026 03:07
@taivu1998 taivu1998 requested review from a team as code owners May 11, 2026 03:07

Successfully merging this pull request may close these issues.

report rl and optimization step differently

2 participants