[Pipeline RL] Add support for PipelineRL (#428)
Merged
jlamypoirier merged 134 commits into main, Mar 20, 2026
Conversation
Move the num_labels_in_seq computation from _compute_num_labels_in_seq (called inside forward_backward on the already-packed sequence) to _get_model_input, where document boundaries are available via cropped_lengths. Per-document response token counts are trivially computed and broadcast to token positions, eliminating the need for span-finding on the packed sequence.

Also fixes new_logprobs metric scaling with cross_entropy_splits > 1, and updates test_lm_head to properly handle list-indexed advantages/old_log_probs and verify the new_logprobs extra metric.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
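The broadcast described above can be sketched as follows. This is an illustration with made-up values, not the actual Fast-LLM code (which operates on torch tensors): given per-document token lengths and per-document response-token counts, each document's count is repeated across its token positions, so no span-finding on the packed sequence is needed.

```python
import numpy as np

# Assumed example values: a packed sequence of three documents.
cropped_lengths = np.array([3, 2, 4])      # tokens per document
labels_per_document = np.array([2, 0, 3])  # response tokens per document

# Broadcast each document's count to every one of its token positions.
num_labels_in_seq = np.repeat(labels_per_document, cropped_lengths)
```

Each token position now carries its own document's label count, ready for per-token loss normalization.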
- Use local_world_size=1 (not world_size), since each process sees exactly one GPU via CUDA_VISIBLE_DEVICES in PipelineRL's setup.
- Switch from torch.distributed.broadcast_object_list/broadcast to fast_llm.core.distributed.broadcast_object/broadcast, which work directly on ProcessGroupNCCL backend objects (ProcessGroupPool returns unregistered backends that torch.distributed ops cannot accept).
- Use process_group.shutdown() instead of torch.distributed.destroy_process_group, for the same reason.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two bugs when micro-batches are padded to pad_to_size in LanguageModelBatch.from_documents:

1. advantages and old_log_probabilities (TokenDataBatch) were not padded to match the token batch size. get_cropped_data(label_begin, label_end) then returned fewer elements than logits, causing a shape mismatch in fused_grpo_loss_forward_backward.
2. num_labels_in_seq used cropped_lengths from (begin, label_end), which spans end - begin + prediction_distance tokens, one more than the model input length. Now uses (label_begin, label_end) so segment lengths sum to end - begin, matching the new_log_probs shape.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
rafapi previously requested changes on Mar 20, 2026
start_time was set once at the start of iterate() and never reset, causing TimeoutError after 600s of total training time regardless of whether documents were actively flowing. Reset start_time on each successful XREADGROUP response so the timeout only fires when no new documents have arrived for the configured duration. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
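The fixed timeout logic can be sketched like this. The function and parameter names are illustrative, not the actual Fast-LLM API, and consume_batch stands in for the XREADGROUP call: the key point is that start_time is reset on every successful read, so the timeout measures idle time rather than total training time.

```python
import time


def iterate(consume_batch, read_timeout_s=600.0, poll_interval_s=1.0):
    """Yield batches until no new documents arrive for read_timeout_s.

    Sketch only: consume_batch() stands in for the Redis XREADGROUP call
    and returns a (possibly empty) list of documents.
    """
    start_time = time.monotonic()
    while True:
        response = consume_batch()
        if response:
            # Reset on each successful read, so the timeout only fires
            # when no new documents have arrived for the configured duration.
            start_time = time.monotonic()
            yield response
        elif time.monotonic() - start_time > read_timeout_s:
            raise TimeoutError(f"no new documents for {read_timeout_s}s")
        else:
            time.sleep(poll_interval_s)
```

With the old behavior (start_time set once, never reset), the same loop would raise after read_timeout_s of total runtime even while documents were flowing.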
searchsorted requires a sorted haystack, but labels_per_document is an unsorted array of per-document label counts. Using it directly caused incorrect doc-index lookups, resulting in wrong (often zero) label counts and nan in the grpo_new_logprobs metric. Fix: use length_cumsum[1:] (sorted) to map each token to its document index, then index labels_per_document with that result.
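A minimal sketch of the fix, using numpy with assumed example values (the real code operates on torch tensors with the array names from the description above): the sorted cumulative lengths map each token position to its document index, and that index then gathers from the unsorted labels_per_document.

```python
import numpy as np

# Assumed example: three documents packed into one sequence.
lengths = np.array([3, 2, 4])              # tokens per document
labels_per_document = np.array([5, 1, 2])  # unsorted per-document label counts

# length_cumsum[1:] is sorted, so searchsorted is valid on it.
length_cumsum = np.concatenate([[0], np.cumsum(lengths)])  # [0, 3, 5, 9]

token_positions = np.arange(length_cumsum[-1])
# Map each token position to its document index via the sorted boundaries;
# side="right" assigns boundary positions to the following document.
doc_index = np.searchsorted(length_cumsum[1:], token_positions, side="right")

# Gather the (unsorted) per-document label count for each token.
per_token_labels = labels_per_document[doc_index]
```

Searching the unsorted labels_per_document directly, as the buggy version did, violates searchsorted's precondition and yields arbitrary document indices.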
Padded tokens and fully masked documents have num_labels_in_seq=0 and loss_mask=0. Without clamping, 0/0=nan poisons the sum even though those positions contribute nothing to the loss. Clamp to min=1 so masked positions produce 0/1=0 instead.
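The 0/0 problem and the clamp fix can be illustrated with made-up values (the real code divides per-token quantities by num_labels_in_seq on torch tensors, e.g. via torch.clamp):

```python
import numpy as np

# Assumed example: two real label positions and one padded position.
num_labels_in_seq = np.array([2.0, 2.0, 0.0])  # 0 at the padded position
loss_mask = np.array([1.0, 1.0, 0.0])
per_token_loss = np.array([0.4, 0.6, 0.0])

# Without clamping, the padded position produces 0/0 = nan,
# which poisons the sum even though it contributes no loss.
with np.errstate(invalid="ignore", divide="ignore"):
    naive = per_token_loss * loss_mask / num_labels_in_seq

# Clamp the denominator to min=1: masked positions become 0/1 = 0.
safe = per_token_loss * loss_mask / np.clip(num_labels_in_seq, 1.0, None)
```

The clamp is safe because any position with a zero label count is also masked out, so dividing it by 1 instead of 0 cannot change the loss.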
bigximik approved these changes on Mar 20, 2026
bigximik (Collaborator) left a comment:
I’ve made several additional changes and addressed @rafapi’s feedback. @jlamypoirier, could you review and confirm? Otherwise, we can merge.
This PR provides the initial integration with PipelineRL, using the GRPO loss.
It introduces:
training_started, step_finished, and training_finished. This enables seamless coordination between Fast-LLM training and PipelineRL-based inference or orchestration components.