
[WIP][PipelineRL] Normalization of new_logprobs and addition of other RL metrics #476

Draft
bigximik wants to merge 2 commits into main from rl_metrics

Conversation

@bigximik
Collaborator

Add generic denominator_batch_field to LossDef so any metric can be normalized by a pre-computed per-micro-batch scalar from the batch context, bypassing TP/SP/PP splitting entirely.
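A minimal sketch of the mechanism, under simplified assumptions: the real `LossDef` in the repo has more fields, and `normalize_metric` here is a hypothetical stand-in for the runner's reduction logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LossDef:
    # Illustrative fields only; the actual class carries more metadata.
    name: str
    count: int = 1
    # New field: name of a pre-computed per-micro-batch scalar in the batch
    # kwargs to use as the normalization denominator, instead of the usual
    # count derived after TP/SP/PP splitting.
    denominator_batch_field: Optional[str] = None


def normalize_metric(metric_sum: float, batch_kwargs: dict,
                     loss_def: LossDef, default_denominator: float) -> float:
    """Divide an accumulated metric by its batch-provided denominator, if any."""
    if loss_def.denominator_batch_field is not None:
        # `or 0` maps a missing or None entry to 0.
        denominator = batch_kwargs.get(loss_def.denominator_batch_field) or 0
        return metric_sum / denominator if denominator else 0.0
    return metric_sum / default_denominator
```

With `denominator_batch_field` unset, behavior falls back to the existing default denominator, so metrics that don't opt in are unaffected.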

For grpo_new_logprobs: compute num_docs = (labels_per_document > 0).sum() in language_model.py before any parallel splitting, giving a true per-document average regardless of variable document lengths.
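The counting step reduces to the following sketch (plain Python in place of the tensor op, with `labels_per_document` assumed to be the per-document unmasked-label counts):

```python
def count_documents(labels_per_document):
    # A document counts only if it has at least one unmasked label;
    # padding segments and fully masked documents contribute zero labels
    # and are excluded. Equivalent to (labels_per_document > 0).sum().
    return sum(1 for n in labels_per_document if n > 0)
```

Because the count is taken before any parallel splitting, a long document and a short one each contribute exactly 1 to the denominator.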

Only sequence_data_rank==0 contributes to num_docs. The runner all_reduces the denominator across the data group (which includes SDP ranks); if every SDP rank reported its own num_docs, a single document sharded across SDP=2 ranks would be counted twice, doubling the denominator and halving the metric.
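The guard can be emulated without a process group: treat each entry as one rank's contribution to the SUM all_reduce (`all_reduce_num_docs` is a hypothetical single-process stand-in for the distributed call).

```python
def all_reduce_num_docs(local_counts, sequence_data_ranks):
    # Emulates all_reduce(SUM) over the data group. Ranks with
    # sequence_data_rank != 0 report 0, so a document whose sequence is
    # sharded across SDP ranks is counted exactly once.
    return sum(c if r == 0 else 0
               for c, r in zip(local_counts, sequence_data_ranks))
```

With SDP=2 and 3 documents visible on both shards, the unguarded sum would give 6; the guarded reduction gives the correct 3.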

Also clamp num_labels_in_seq to avoid 0/0=nan for padding segments or fully-masked documents (loss_mask=0 there so the numerator is 0 too).
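The clamp amounts to the following (hypothetical helper name; `logprob_sums` is the masked per-document logprob sum, `labels_per_document` the per-document label count):

```python
def per_document_mean_logprobs(logprob_sums, labels_per_document):
    # max(n, 1) is the clamp: padding segments and fully masked documents
    # have loss_mask == 0 everywhere, so their numerator is already 0 and
    # the clamp turns a would-be 0/0 = nan into a harmless 0.
    return [s / max(n, 1) for s, n in zip(logprob_sums, labels_per_document)]
```

Since such documents are also excluded from num_docs, their 0 contribution never skews the average.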

Tests verify:

  • num_docs counts only unmasked documents
  • padding segments (pad_to_size) are excluded
  • with SDP=2, only rank 0 contributes num_docs so the all_reduce SUM across SDP ranks gives the correct denominator

✨ Description

Please provide a brief summary of the changes, relevant motivation, and context.
Include any related issue numbers or links to discussions, and explain why this change is necessary.

Closes #

🔍 Type of change

Select all that apply:

  • 🐛 Bug fix (non-breaking change that addresses a specific issue)
  • 🚀 New feature (non-breaking change that adds functionality)
  • ⚠️ Breaking change (a change that could affect existing functionality)
  • 📈 Performance improvement/optimization (improves speed, memory usage, or efficiency)
  • 🛠️ Code refactor (non-functional changes that improve code readability, structure, etc.)
  • 📦 Dependency bump (updates dependencies, including Dockerfile or package changes)
  • 📝 Documentation change (updates documentation, including new content or typo fixes)
  • 🔧 Infrastructure/Build change (affects build process, CI/CD, or dependencies)

📝 Changes

List the key changes introduced in this PR:

  1. Change A
  2. Change B

✅ Checklist

Make sure the following tasks are completed before submitting the PR:

General

  • 📜 I have read and followed the contributing guidelines.
  • 🏷️ I am using a clear and descriptive PR title that summarizes the key change or feature introduced.
  • 🎉 The functionality is complete, and I have tested the changes.
  • 📝 I have updated the documentation if needed.
  • ⚠️ The change does not introduce any new issues (e.g., runtime warnings, type checker errors, linting problems, unhandled edge cases).
  • 🧩 I have commented my code, especially in hard-to-understand areas.

Dependencies and Configuration

  • 🐋 I have updated the Docker configuration or dependencies, if applicable.
  • 🔄 I have ensured compatibility with the existing setup after dependency changes.

Testing

  • 🧪 I have added or updated tests to cover my changes.
  • ✔️ New and existing tests pass locally with my changes.
  • 🚦 I have tested these changes on GPUs and verified training stability.
  • 🏋️ I have tested the changes on realistic training workloads, if applicable.

Performance Impact

  • 📊 I have run benchmarks where applicable to evaluate the performance impact.
  • ✅ The benchmarks show no performance regression.
  • 🚀 The benchmarks indicate a potential performance improvement.
  • ⚠️ The benchmarks indicate a potential performance degradation.
  • 📈 I have provided benchmark results and detailed any performance impact below, if applicable.

📊 Performance Impact Details

If there is any impact on performance, describe it and provide benchmark results, if applicable:


🗒️ Additional Notes

Include any additional context, information, or considerations here, such as known issues, follow-up tasks, or backward compatibility concerns.

When micro_batch_splits > 1, _get_model_input is called once per split on
the same rank.  Documents that span a split boundary appear in both splits'
cropped_lengths, so both would count them without a guard.  The runner sums
num_docs across all splits in context.batch, so boundary documents would be
counted multiple times.

Fix: after the loop in get_model_inputs, set num_docs=None on all splits
except the first.  The first split already holds the correct count (guarded
by sequence_data_rank==0 for SDP); subsequent splits get None which the
runner treats as 0 via `batch_kwargs[field] or 0`.

With micro_batch_splits=1 (the default) model_inputs[1:] is empty so there
is no behaviour change.
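A sketch of the fix, with hypothetical names: `model_inputs` stands in for get_model_inputs' list of per-split (inputs, kwargs) pairs, and `runner_total` for the runner's summation over context.batch.

```python
def clear_duplicate_split_counts(model_inputs, field="num_docs"):
    # One (inputs, kwargs) pair per micro-batch split. Clear the counter on
    # every split but the first so a document spanning a split boundary is
    # not counted once per split when the runner sums the field.
    for _inputs, kwargs in model_inputs[1:]:
        kwargs[field] = None
    return model_inputs


def runner_total(model_inputs, field="num_docs"):
    # Mirrors the runner's `batch_kwargs[field] or 0` handling of None.
    return sum(kwargs.get(field) or 0 for _inputs, kwargs in model_inputs)
```

With a single split, `model_inputs[1:]` is empty and the function is a no-op, matching the stated default behaviour.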