
[WIP][PipelineRL] Normalization of new_logprobs and addition of other RL metrics #476

Draft
bigximik wants to merge 2 commits into main from rl_metrics

Conversation

@bigximik
Collaborator

Add generic denominator_batch_field to LossDef so any metric can be normalized by a pre-computed per-micro-batch scalar from the batch context, bypassing TP/SP/PP splitting entirely.
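A minimal sketch of the mechanism, under simplified assumptions: the real `LossDef` in the repo has more fields, and `normalize_metric` here is a hypothetical stand-in for the runner's reduction logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LossDef:
    # Illustrative fields only; the actual class carries more metadata.
    name: str
    count: int = 1
    # New field: name of a pre-computed per-micro-batch scalar in the batch
    # kwargs to use as the normalization denominator, instead of the usual
    # count derived after TP/SP/PP splitting.
    denominator_batch_field: Optional[str] = None


def normalize_metric(metric_sum: float, batch_kwargs: dict,
                     loss_def: LossDef, default_denominator: float) -> float:
    """Divide an accumulated metric by its batch-provided denominator, if any."""
    if loss_def.denominator_batch_field is not None:
        # `or 0` maps a missing or None entry to 0.
        denominator = batch_kwargs.get(loss_def.denominator_batch_field) or 0
        return metric_sum / denominator if denominator else 0.0
    return metric_sum / default_denominator
```

With `denominator_batch_field` unset, behavior falls back to the existing default denominator, so metrics that don't opt in are unaffected.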

For grpo_new_logprobs: compute num_docs = (labels_per_document > 0).sum() in language_model.py before any parallel splitting, giving a true per-document average regardless of variable document lengths.
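The counting step reduces to the following sketch (plain Python in place of the tensor op, with `labels_per_document` assumed to be the per-document unmasked-label counts):

```python
def count_documents(labels_per_document):
    # A document counts only if it has at least one unmasked label;
    # padding segments and fully masked documents contribute zero labels
    # and are excluded. Equivalent to (labels_per_document > 0).sum().
    return sum(1 for n in labels_per_document if n > 0)
```

Because the count is taken before any parallel splitting, a long document and a short one each contribute exactly 1 to the denominator.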

Only sequence_data_rank==0 contributes to num_docs. The runner all_reduces the denominator across the data group (which includes SDP ranks); if every SDP rank reported its own num_docs, a single document sharded across SDP=2 ranks would be counted twice, doubling the denominator and halving the metric.
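The guard can be emulated without a process group: treat each entry as one rank's contribution to the SUM all_reduce (`all_reduce_num_docs` is a hypothetical single-process stand-in for the distributed call).

```python
def all_reduce_num_docs(local_counts, sequence_data_ranks):
    # Emulates all_reduce(SUM) over the data group. Ranks with
    # sequence_data_rank != 0 report 0, so a document whose sequence is
    # sharded across SDP ranks is counted exactly once.
    return sum(c if r == 0 else 0
               for c, r in zip(local_counts, sequence_data_ranks))
```

With SDP=2 and 3 documents visible on both shards, the unguarded sum would give 6; the guarded reduction gives the correct 3.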

Also clamp num_labels_in_seq to avoid 0/0=nan for padding segments or fully-masked documents (loss_mask=0 there so the numerator is 0 too).
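The clamp amounts to the following (hypothetical helper name; `logprob_sums` is the masked per-document logprob sum, `labels_per_document` the per-document label count):

```python
def per_document_mean_logprobs(logprob_sums, labels_per_document):
    # max(n, 1) is the clamp: padding segments and fully masked documents
    # have loss_mask == 0 everywhere, so their numerator is already 0 and
    # the clamp turns a would-be 0/0 = nan into a harmless 0.
    return [s / max(n, 1) for s, n in zip(logprob_sums, labels_per_document)]
```

Since such documents are also excluded from num_docs, their 0 contribution never skews the average.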

Tests verify:

  • num_docs counts only unmasked documents
  • padding segments (pad_to_size) are excluded
  • with SDP=2, only rank 0 contributes num_docs so the all_reduce SUM across SDP ranks gives the correct denominator

✨ Description

Please provide a brief summary of the changes, relevant motivation, and context.
Include any related issue numbers or links to discussions, and explain why this change is necessary.

Closes #

🔍 Type of change

Select all that apply:

  • 🐛 Bug fix (non-breaking change that addresses a specific issue)
  • 🚀 New feature (non-breaking change that adds functionality)
  • ⚠️ Breaking change (a change that could affect existing functionality)
  • 📈 Performance improvement/optimization (improves speed, memory usage, or efficiency)
  • 🛠️ Code refactor (non-functional changes that improve code readability, structure, etc.)
  • 📦 Dependency bump (updates dependencies, including Dockerfile or package changes)
  • 📝 Documentation change (updates documentation, including new content or typo fixes)
  • 🔧 Infrastructure/Build change (affects build process, CI/CD, or dependencies)

📝 Changes

List the key changes introduced in this PR:

  1. Change A
  2. Change B

✅ Checklist

Make sure the following tasks are completed before submitting the PR:

General

  • 📜 I have read and followed the contributing guidelines.
  • 🏷️ I am using a clear and descriptive PR title that summarizes the key change or feature introduced.
  • 🎉 The functionality is complete, and I have tested the changes.
  • 📝 I have updated the documentation if needed.
  • ⚠️ The change does not introduce any new issues (e.g., runtime warnings, type checker errors, linting problems, unhandled edge cases).
  • 🧩 I have commented my code, especially in hard-to-understand areas.

Dependencies and Configuration

  • 🐋 I have updated the Docker configuration or dependencies, if applicable.
  • 🔄 I have ensured compatibility with the existing setup after dependency changes.

Testing

  • 🧪 I have added or updated tests to cover my changes.
  • ✔️ New and existing tests pass locally with my changes.
  • 🚦 I have tested these changes on GPUs and verified training stability.
  • 🏋️ I have tested the changes on realistic training workloads, if applicable.

Performance Impact

  • 📊 I have run benchmarks where applicable to evaluate the performance impact.
  • ✅ The benchmarks show no performance regression.
  • 🚀 The benchmarks indicate a potential performance improvement.
  • ⚠️ The benchmarks indicate a potential performance degradation.
  • 📈 I have provided benchmark results and detailed any performance impact below, if applicable.

📊 Performance Impact Details

If there is any impact on performance, describe it and provide benchmark results, if applicable:


🗒️ Additional Notes

Include any additional context, information, or considerations here, such as known issues, follow-up tasks, or backward compatibility concerns.

When micro_batch_splits > 1, _get_model_input is called once per split on
the same rank.  Documents that span a split boundary appear in both splits'
cropped_lengths, so both would count them without a guard.  The runner sums
num_docs across all splits in context.batch, so boundary documents would be
counted multiple times.

Fix: after the loop in get_model_inputs, set num_docs=None on all splits
except the first.  The first split already holds the correct count (guarded
by sequence_data_rank==0 for SDP); subsequent splits get None which the
runner treats as 0 via `batch_kwargs[field] or 0`.

With micro_batch_splits=1 (the default) model_inputs[1:] is empty so there
is no behaviour change.
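A sketch of the fix, with hypothetical names: `model_inputs` stands in for get_model_inputs' list of per-split (inputs, kwargs) pairs, and `runner_total` for the runner's summation over context.batch.

```python
def clear_duplicate_split_counts(model_inputs, field="num_docs"):
    # One (inputs, kwargs) pair per micro-batch split. Clear the counter on
    # every split but the first so a document spanning a split boundary is
    # not counted once per split when the runner sums the field.
    for _inputs, kwargs in model_inputs[1:]:
        kwargs[field] = None
    return model_inputs


def runner_total(model_inputs, field="num_docs"):
    # Mirrors the runner's `batch_kwargs[field] or 0` handling of None.
    return sum(kwargs.get(field) or 0 for _inputs, kwargs in model_inputs)
```

With a single split, `model_inputs[1:]` is empty and the function is a no-op, matching the stated default behaviour.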