[Feature] support async rl #1360
base: main
Conversation
Force-pushed from efb3109 to 1601d51
Force-pushed from 5e3f135 to aaa4860
Force-pushed from 31b3535 to 953a613
Force-pushed from f6fa0fd to 4bd4c4f
tail_batch_trigger_size: Annotated[
    Optional[int],
    Parameter(
        help="Number of candidate samples needed in the queue to trigger a tail batch operation. Set to 0 to disable."
The description "Set to 0 to disable." is not accurate.
There isn't really an "enable" notion here; it only takes effect in combination with tail_batch_candidate_steps.
This does support tail_batch_candidate_steps > 0 with tail_batch_trigger_size = 0: in that case, expired data is put into the expired queue but never triggers a tail batch, which effectively means this data is not used for training.
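For readers following along, here is a minimal sketch of the behavior settled in this thread (expired queue plus an optional tail-batch flush). The class and method names (ExpiredSampleBuffer, maybe_expire, pop_tail_batch) are hypothetical and are not the PR's actual implementation.

from collections import deque


class ExpiredSampleBuffer:
    """Holds samples whose abort count exceeded tail_batch_candidate_steps (illustrative only)."""

    def __init__(self, tail_batch_candidate_steps: int, tail_batch_trigger_size: int):
        self.tail_batch_candidate_steps = tail_batch_candidate_steps
        self.tail_batch_trigger_size = tail_batch_trigger_size
        self.candidates: deque = deque()

    def maybe_expire(self, sample, abort_count: int) -> None:
        # A sample becomes a tail-batch candidate only after it has been aborted
        # more than tail_batch_candidate_steps times (i.e. reaches candidate_steps + 1).
        if self.tail_batch_candidate_steps > 0 and abort_count > self.tail_batch_candidate_steps:
            self.candidates.append(sample)

    def pop_tail_batch(self) -> list:
        # With tail_batch_trigger_size == 0 the expired samples stay in the queue
        # but are never flushed as a tail batch, i.e. they are simply not trained on.
        if self.tail_batch_trigger_size <= 0:
            return []
        if len(self.candidates) >= self.tail_batch_trigger_size:
            return [self.candidates.popleft() for _ in range(self.tail_batch_trigger_size)]
        return []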
xtuner/v1/data_proto/rl_data.py
Outdated
response_ids: Optional[List[int]] = None
logprobs: Optional[List[float]] = None
num_return_tokens: Optional[int] = None
versioned_response: List[str] = Field(default_factory=list)
Think about whether this part will need to change for the multi-turn case in the future.
Multi-turn output should be a list[RolloutResonseItem]; the structure inside RolloutResonseItem shouldn't need to change.
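For concreteness, a small sketch of that point, assuming a pydantic model with the fields shown in the quoted diff (the class name below is only illustrative):

from typing import List, Optional

from pydantic import BaseModel, Field


class RolloutResponseItem(BaseModel):
    # Field names follow the quoted diff above.
    response_ids: Optional[List[int]] = None
    logprobs: Optional[List[float]] = None
    num_return_tokens: Optional[int] = None
    versioned_response: List[str] = Field(default_factory=list)


# A multi-turn rollout can then be represented without touching the per-turn schema:
multi_turn_output: List[RolloutResponseItem] = [
    RolloutResponseItem(response_ids=[1, 2, 3], num_return_tokens=3),
    RolloutResponseItem(response_ids=[4, 5], num_return_tokens=2),
]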
if not self.enable_partial_rollout:
    # Clear the previous response_ids and other env data
    if "routed_experts" in sample.env.rollout.extra_info:
        del sample.env.rollout.extra_info["routed_experts"]
Same comment applies here.
Have you considered, when a rollout is interrupted, sending the next request to the same server so that the cache can be reused?
The cache can't be reused right now anyway: every weight update clears the cache. This feature probably needs to be added together with KV-cache off-policy support.
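Since the cache-reuse idea is deferred, here is only a rough sketch of what sticky routing of resumed samples to the same rollout server could look like. None of these names exist in the PR, and it would only pay off once the KV cache survives weight updates.

from typing import Dict, List


class StickyRouter:
    """Hypothetical router that sends a resumed sample back to its previous server."""

    def __init__(self, server_urls: List[str]):
        self.server_urls = server_urls
        self._assignment: Dict[str, str] = {}  # sample_id -> server url
        self._next = 0

    def pick_server(self, sample_id: str) -> str:
        # Reuse the previous assignment for resumed (aborted) samples; otherwise round-robin.
        if sample_id in self._assignment:
            return self._assignment[sample_id]
        url = self.server_urls[self._next % len(self.server_urls)]
        self._next += 1
        self._assignment[sample_id] = url
        return url

    def reset(self) -> None:
        # After a weight update the KV cache is cleared anyway, so stickiness
        # no longer helps and the mapping can be dropped.
        self._assignment.clear()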
Force-pushed from 2397345 to 003ca72
Force-pushed from 003ca72 to ab6bccc
Force-pushed from 50afb64 to 718f817
The async functionality is covered by test_lmdeploy_dataflow_save_resume_with_partial_rollout / test_lmdeploy_dataflow_save_resume_with_partial_rollout_r3; let's add the accuracy test together with the RL e2e tests.
Force-pushed from 718f817 to 283b1e1
Force-pushed from fc8681a to e2d18c0
This PR introduces asynchronous RL support to Xtuner, enabling partial rollouts and version-based sample management for more efficient training data generation.
1. Key Concepts:
2. Async logic:
Scenario 1: staleness_threshold=0.0, enable_partial_rollout=0, tail_batch_candidate_steps=0

Scenario 2: staleness_threshold=0.2, enable_partial_rollout=0, tail_batch_candidate_steps=0
2. Responses are not retained when the rollout is paused
3. Prioritize sampling data from the abort queue

Scenario 3: staleness_threshold=0.2, enable_partial_rollout=0, tail_batch_candidate_steps=1, tail_batch_trigger_size=0
2. Responses are not retained when the rollout is paused
3. Prioritize sampling data from the abort queue
4. Put a sample into the candidate pool when its abort count reaches tail_batch_candidate_steps+1

Scenario 4: staleness_threshold=0.2, enable_partial_rollout=1, tail_batch_candidate_steps=0, tail_batch_trigger_size=0
2. Responses are retained and concatenated when the rollout is paused
3. Prioritize sampling data from the abort queue

Scenario 5: staleness_threshold=0.2, enable_partial_rollout=1, tail_batch_candidate_steps=1, tail_batch_trigger_size=0
2. Responses are retained and concatenated when the rollout is paused
3. Prioritize sampling data from the abort queue
4. Put a sample into the candidate pool when its abort count reaches tail_batch_candidate_steps+1 (tail_batch_candidate_steps is the off-policy step)
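As a rough illustration of how these knobs fit together, here is a hypothetical config for the last scenario above. The DataFlowConfig dataclass is illustrative only; the actual config class and parameter plumbing in the PR may differ.

from dataclasses import dataclass
from typing import Optional


@dataclass
class DataFlowConfig:
    staleness_threshold: float = 0.0        # fraction of extra ("stale") samples allowed in flight
    enable_partial_rollout: int = 0         # 1: keep and concatenate partial responses after a pause
    tail_batch_candidate_steps: int = 0     # abort count after which a sample becomes a tail-batch candidate
    tail_batch_trigger_size: Optional[int] = 0  # candidates needed to actually flush a tail batch


# Scenario 5: partial rollout plus tail-batch candidates, but with
# tail_batch_trigger_size=0 the expired samples are never trained on.
cfg = DataFlowConfig(
    staleness_threshold=0.2,
    enable_partial_rollout=1,
    tail_batch_candidate_steps=1,
    tail_batch_trigger_size=0,
)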
3. Benchmark

4. Related PR
sample_from_expired_storage in dataflow. When sample_from_expired_storage is set to True, the dataflow will not oversend data and will return data only after all tasks of the current batch are completed.
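A minimal sketch of the described switch, assuming a helper that decides how many samples to request per step; the function name and signature are hypothetical.

def compute_num_to_send(batch_size: int,
                        staleness_threshold: float,
                        sample_from_expired_storage: bool) -> int:
    # When sampling from expired storage, do not oversend: only the current batch
    # is requested, and results are returned once all of its tasks finish.
    if sample_from_expired_storage:
        return batch_size
    # Otherwise oversend according to the staleness threshold.
    return int(batch_size * (1.0 + staleness_threshold))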