feat(grpo): add SAPO actor loss by taivu1998 · Pull Request #2455 · NVIDIA-NeMo/RL

taivu1998 · 2026-05-10T10:53:38Z

Summary

Add Soft Adaptive Policy Optimization (SAPO) as a selectable GRPO actor loss via loss_fn.actor_loss_type: "sapo".
Keep the existing PPO/GRPO/DAPO/GSPO/REINFORCE behavior under the default actor_loss_type: "ppo_clip".
Add SAPO defaults to GRPO exemplar, NeMo-Gym, ModelOpt, and template configs.
Document the SAPO config surface in the GRPO guide.
Add focused unit coverage for SAPO forward values, metrics, gradients, importance-sampling correction, extreme log-ratio stability, and incompatible config validation.

Closes #1677.

Motivation

Issue #1677 requests support for the SAPO algorithm from https://arxiv.org/pdf/2511.20347. The current GRPO loss path supports PPO-style clipped objectives and related variants, but does not expose SAPO's smooth, adaptive actor surrogate.

Implementation

Extend ClippedPGLossConfig with actor_loss_type, sapo_tau_pos, sapo_tau_neg, and sapo_log_ratio_clamp_value.
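Implement the SAPO token-level surrogate 4 / tau * sigmoid(tau * (r - 1)), where r is the token-level importance ratio and tau is selected by advantage sign (sapo_tau_pos for positive advantages, sapo_tau_neg otherwise).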
Add an optional SAPO-only log-ratio clamp before exponentiation, plus finite-ratio handling as a numerical guardrail.
Preserve the existing KL penalty, rollout/logprob data flow, train/inference importance-sampling correction, metrics, and non-SAPO actor-loss behavior.
Reject unsupported SAPO combinations that would silently change the objective semantics, including sequence-level importance ratios, sequence-level loss, disable_ppo_ratio, force_on_policy_ratio, and dual clipping.
Log SAPO activation during GRPO setup.
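
For reviewers, the surrogate described above can be sketched for a single token in plain Python. This is a minimal illustration rather than the code in this PR: the function names, the example tau values, and the hard cap of 700 on the log-ratio (standing in for the finite-ratio handling) are all assumptions. It also returns the derivative with respect to r, which shows why the 4 / tau scaling is used: at r = 1 the derivative is exactly 1, matching the unclipped policy-gradient objective.

```python
import math


def stable_sigmoid(z: float) -> float:
    """Sigmoid written so that exp() never receives a large positive argument."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sapo_surrogate(log_ratio, advantage, tau_pos=1.0, tau_neg=1.05,
                   log_ratio_clamp=None):
    """Return (surrogate, d surrogate / d ratio) for one token.

    tau_pos / tau_neg defaults here are illustrative, not the PR's defaults.
    """
    # Optional symmetric clamp on the log-ratio before exponentiation.
    if log_ratio_clamp is not None:
        log_ratio = max(-log_ratio_clamp, min(log_ratio_clamp, log_ratio))
    # Assumed guardrail: cap the exponent so math.exp() stays finite.
    ratio = math.exp(min(log_ratio, 700.0))
    # Temperature is chosen by advantage sign; a zero advantage contributes
    # nothing to the loss, so its choice of tau does not matter.
    tau = tau_pos if advantage > 0 else tau_neg
    gate = stable_sigmoid(tau * (ratio - 1.0))
    surrogate = 4.0 / tau * gate
    # Chain rule: d/dr [4/tau * sigmoid(tau * (r - 1))] = 4 * s * (1 - s).
    grad_wrt_ratio = 4.0 * gate * (1.0 - gate)
    return surrogate, grad_wrt_ratio


def sapo_token_loss(log_ratio, advantage, **kwargs):
    """Per-token loss to minimize: negative advantage-weighted surrogate."""
    surrogate, _ = sapo_surrogate(log_ratio, advantage, **kwargs)
    return -advantage * surrogate
```

Far from r = 1 the derivative decays smoothly toward zero instead of being cut off at a clip boundary, and the surrogate saturates at 4 / tau for very large ratios.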

Validation

Focused SAPO unit suite: tests/unit/algorithms/test_loss_functions.py -k sapo reports 12 passed, 43 deselected. The run used a no-project harness with managed Python, because this local machine has a broken /usr/local/bin/python3.13 and the project requires >=3.13.13.
ruff check on all touched Python files.
ruff format --check on all touched Python files.
python -m py_compile on all touched Python files.
git diff --check.
YAML inheritance scan over all GRPO configs in examples/ and research/template_project that mention ratio_clip_c; all resolve the new SAPO config keys.

Notes

The default remains actor_loss_type: "ppo_clip", so existing configs preserve their current behavior unless SAPO is explicitly enabled.
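
The opt-in behavior and the rejection of unsupported combinations can be sketched as a small validation helper. This is a hypothetical sketch, not the PR's code: the helper name is invented, and the keys sequence_level_importance_ratios and token_level_loss, as well as the reading of ratio_clip_c as the dual-clipping parameter, are assumptions about the config surface. Only actor_loss_type, disable_ppo_ratio, force_on_policy_ratio, and ratio_clip_c are named in this description.

```python
from typing import Any

# Boolean flags assumed to be incompatible with SAPO when set to True.
_SAPO_INCOMPATIBLE_FLAGS = (
    "sequence_level_importance_ratios",  # assumed key name
    "disable_ppo_ratio",
    "force_on_policy_ratio",
)


def resolve_actor_loss_type(loss_cfg: dict[str, Any]) -> str:
    """Return the actor loss type, defaulting to "ppo_clip" when unset."""
    actor_loss_type = loss_cfg.get("actor_loss_type", "ppo_clip")
    if actor_loss_type not in ("ppo_clip", "sapo"):
        raise ValueError(f"Unknown actor_loss_type: {actor_loss_type!r}")

    if actor_loss_type == "sapo":
        conflicts = [k for k in _SAPO_INCOMPATIBLE_FLAGS if loss_cfg.get(k)]
        # Assumption: a non-null ratio_clip_c enables dual clipping.
        if loss_cfg.get("ratio_clip_c") is not None:
            conflicts.append("ratio_clip_c")
        # Assumption: token_level_loss=False selects sequence-level loss.
        if not loss_cfg.get("token_level_loss", True):
            conflicts.append("token_level_loss=False")
        if conflicts:
            raise ValueError(
                "SAPO does not support: " + ", ".join(conflicts)
            )
    return actor_loss_type
```

Failing fast here reflects the stated goal of not silently changing the objective: a config that omits actor_loss_type resolves to "ppo_clip" and is never checked against the SAPO-only rules.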

Signed-off-by: taivu1998 <46636857+taivu1998@users.noreply.github.com>

copy-pr-bot · 2026-05-10T10:53:42Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

feat(grpo): add SAPO actor loss

93ae67b

Signed-off-by: taivu1998 <46636857+taivu1998@users.noreply.github.com>

github-actions bot added the Documentation (improvements or additions to documentation) and community-request labels May 10, 2026

taivu1998 marked this pull request as ready for review May 11, 2026 03:06

taivu1998 requested review from a team and terrykong as code owners May 11, 2026 03:06

svcnvidia-nemo-ci added the waiting-on-maintainers (waiting on maintainers to respond) label May 12, 2026

taivu1998 wants to merge 1 commit into NVIDIA-NeMo:main from taivu1998:tdv/issue-1677-sapo

2 participants