YanhuiDua (Collaborator) commented Dec 15, 2025

This PR introduces asynchronous RL support to XTuner, enabling partial rollouts and version-based sample management for more efficient generation of training data.

1. Key Concepts:

  • staleness_threshold: The maximum allowed fraction of stale (expired) samples in a training batch.
  • enable_partial_rollout: Whether to enable partial rollout for asynchronous data generation.
  • tail_batch_candidate_steps: Number of rollout steps after which a sample becomes a candidate for the tail batch. Set to 0 to disable the tail batch.
  • tail_batch_trigger_size: Number of candidate samples that must accumulate in the queue to trigger a tail batch operation. It defaults to global_batch_size when not provided by the user or when set to 0.
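As a rough illustration of how these four options relate, the sketch below groups them into one config object and applies the defaulting rule stated for tail_batch_trigger_size. The class name `AsyncRLConfig`, the `global_batch_size` field and the validation checks are hypothetical; this is not XTuner's actual config API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AsyncRLConfig:
    """Hypothetical container for the async RL options (not XTuner's real class)."""

    global_batch_size: int
    staleness_threshold: float = 0.0        # max fraction of stale samples in a batch
    enable_partial_rollout: bool = False    # keep partial responses when rollout pauses
    tail_batch_candidate_steps: int = 0     # 0 disables the tail batch
    tail_batch_trigger_size: Optional[int] = None

    def __post_init__(self) -> None:
        if self.staleness_threshold < 0:
            raise ValueError("staleness_threshold must be >= 0")
        if self.tail_batch_candidate_steps < 0:
            raise ValueError("tail_batch_candidate_steps must be >= 0")
        # Stated rule: fall back to global_batch_size when unset (None) or 0.
        if not self.tail_batch_trigger_size:
            self.tail_batch_trigger_size = self.global_batch_size

    @property
    def tail_batch_enabled(self) -> bool:
        # The trigger size only matters once candidate steps are enabled.
        return self.tail_batch_candidate_steps > 0
```

For example, `AsyncRLConfig(global_batch_size=64, tail_batch_candidate_steps=1, tail_batch_trigger_size=0)` ends up with a trigger size of 64.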

2. Async logic:

| Strategy Type | Settings | Core Features |
| --- | --- | --- |
| Synchronous | staleness_threshold=0.0, enable_partial_rollout=0, tail_batch_candidate_steps=0 | No data oversending |
| Asynchronous 1 | staleness_threshold=0.2, enable_partial_rollout=0, tail_batch_candidate_steps=0 | 20% data oversending; responses are not retained when rollout is paused; samples from the abort queue are prioritized |
| Asynchronous 2 | staleness_threshold=0.2, enable_partial_rollout=0, tail_batch_candidate_steps=1, tail_batch_trigger_size=0 | 20% data oversending; responses are not retained when rollout is paused; samples from the abort queue are prioritized; a sample is moved to the candidate pool once it has been aborted tail_batch_candidate_steps+1 times |
| Asynchronous 3 | staleness_threshold=0.2, enable_partial_rollout=1, tail_batch_candidate_steps=0, tail_batch_trigger_size=0 | 20% data oversending; responses are retained and concatenated when rollout is paused; samples from the abort queue are prioritized |
| Asynchronous 4 | staleness_threshold=0.2, enable_partial_rollout=1, tail_batch_candidate_steps=1, tail_batch_trigger_size=0 | 20% data oversending; responses are retained and concatenated when rollout is paused; samples from the abort queue are prioritized; a sample is moved to the candidate pool once it has been aborted tail_batch_candidate_steps+1 times (tail_batch_candidate_steps is the number of off-policy steps) |
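The per-sample bookkeeping implied by the strategy table above can be sketched as a small router. This is a hypothetical illustration, not XTuner's implementation: the `RolloutSample` and `AbortRouter` names, and the assumption that each sample carries an abort counter, are ours. An aborted sample either keeps its partial response (partial rollout) or has it cleared; it then returns to the abort queue, which is drained before fresh dataset prompts, unless it has been aborted tail_batch_candidate_steps+1 times, in which case it moves to the tail-batch candidate pool.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Iterator, List


@dataclass
class RolloutSample:
    prompt: str
    response_ids: List[int] = field(default_factory=list)
    abort_count: int = 0  # how many times this sample's rollout was paused/aborted


class AbortRouter:
    """Hypothetical routing of aborted samples, following the strategy table."""

    def __init__(self, dataset: Iterator[str], enable_partial_rollout: bool,
                 tail_batch_candidate_steps: int, tail_batch_trigger_size: int) -> None:
        self.dataset = dataset
        self.enable_partial_rollout = enable_partial_rollout
        self.candidate_steps = tail_batch_candidate_steps
        self.trigger_size = tail_batch_trigger_size
        self.abort_queue: Deque[RolloutSample] = deque()
        self.candidate_pool: List[RolloutSample] = []

    def next_samples(self, n: int) -> List[RolloutSample]:
        # Drain the abort queue first, then top up with fresh dataset prompts.
        batch: List[RolloutSample] = []
        while len(batch) < n and self.abort_queue:
            batch.append(self.abort_queue.popleft())
        while len(batch) < n:
            batch.append(RolloutSample(prompt=next(self.dataset)))
        return batch

    def on_abort(self, sample: RolloutSample) -> None:
        sample.abort_count += 1
        if not self.enable_partial_rollout:
            # Without partial rollout, the unfinished response is discarded.
            sample.response_ids.clear()
        if self.candidate_steps > 0 and sample.abort_count >= self.candidate_steps + 1:
            # Aborted tail_batch_candidate_steps + 1 times: becomes a tail-batch candidate.
            self.candidate_pool.append(sample)
        else:
            self.abort_queue.append(sample)

    def tail_batch_ready(self) -> bool:
        # Only meaningful when tail_batch_candidate_steps > 0.
        return self.candidate_steps > 0 and len(self.candidate_pool) >= self.trigger_size
```

With tail_batch_candidate_steps=0 (Asynchronous 1 and 3), every aborted sample in this sketch simply cycles back through the abort queue.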

3. Benchmark

4. Related changes

  • Added async-related configuration parameters: enable_partial_rollout, tail_batch_candidate_steps, tail_batch_trigger_size and staleness_threshold.
  • Refactored the replay buffer storage to support versioned samples, with bucketed tracking of completed, aborted and expired states.
  • Renamed Sampler to DatasetSampler and separated dataset sampling logic from replay buffer sampling.
  • Applied sample_from_expired_storage in the dataflow: when it is set to True, the dataflow does not oversend data and returns data only after all tasks of the current batch have completed.
  • Added task timing logs.
  • Added partial rollout with versioned response tracking, so that tokens accumulate across multiple generation steps.
  • Implemented an automatic worker restart mechanism for when all rollout workers become inactive.
  • Fixed state handling for aborted rollouts and improved error logging.
  • Added TensorBoard logging for training and rollout metrics.
  • Refactored the training loop in fit() to conditionally execute rollout, training and weight synchronization depending on debug mode.
  • Fixed bugs in async execution.
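To make the replay-buffer and dataflow bullets more concrete, here is a minimal sketch of storage bucketed by sample state and by the weight version that produced each sample, plus the dispatch-size rule described for sample_from_expired_storage. All names are hypothetical. Two behaviors are assumptions rather than facts from this PR: popping the oldest version first, and reading staleness_threshold as "dispatch ceil(global_batch_size × staleness_threshold) extra prompts", which is inferred from the "20% data oversending" entries in the strategy table.

```python
import math
from collections import defaultdict, deque
from typing import Any, Deque, Dict, List

STATES = ("completed", "aborted", "expired")


class VersionedStorage:
    """Hypothetical replay-buffer storage: state -> weight version -> FIFO of samples."""

    def __init__(self) -> None:
        self._buckets: Dict[str, Dict[int, Deque[Any]]] = {
            state: defaultdict(deque) for state in STATES
        }

    def put(self, sample: Any, state: str, version: int) -> None:
        self._buckets[state][version].append(sample)

    def count(self, state: str) -> int:
        return sum(len(q) for q in self._buckets[state].values())

    def pop(self, state: str, n: int) -> List[Any]:
        # Assumed policy: drain the oldest versions first, FIFO within a version.
        out: List[Any] = []
        bucket = self._buckets[state]
        for version in sorted(bucket):  # sorted() copies the keys, so deleting is safe
            queue = bucket[version]
            while queue and len(out) < n:
                out.append(queue.popleft())
            if not queue:
                del bucket[version]
            if len(out) == n:
                break
        return out


def dispatch_size(global_batch_size: int, staleness_threshold: float,
                  sample_from_expired_storage: bool) -> int:
    # When sampling from expired storage, the dataflow does not oversend.
    if sample_from_expired_storage:
        return global_batch_size
    # round() guards against float noise such as 12.000000000000002 before ceil().
    extra = math.ceil(round(global_batch_size * staleness_threshold, 6))
    return global_batch_size + extra
```

Under these assumptions, a global batch of 64 with staleness_threshold=0.2 would dispatch 77 prompts, or exactly 64 when sample_from_expired_storage is True.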

YanhuiDua force-pushed the support_async_rl_4 branch 2 times, most recently from 5e3f135 to aaa4860 (December 19, 2025 04:20)
tail_batch_trigger_size: Annotated[
Optional[int],
Parameter(
help="Number of candidate samples needed in the queue to trigger a tail batch operation. Set to 0 to disable."
Collaborator:

"Set to 0 to disable." This description is not correct.

Collaborator:

There isn't really an "enable" notion here; it only takes effect in combination with tail_batch_candidate_steps.

Collaborator (author):

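This supports tail_batch_candidate_steps>0 with tail_batch_trigger_size=0. In that case, expired data is put into the expired queue but never triggers a tail batch, which effectively means that this data is not used for training.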

response_ids: Optional[List[int]] = None
logprobs: Optional[List[float]] = None
num_return_tokens: Optional[int] = None
versioned_response: List[str] = Field(default_factory=list)
Collaborator:

Think about whether this part will need to change once multi-turn is supported in the future.

Collaborator (author):

For multi-turn, the output should be list[RolloutResponseItem]; the structure inside RolloutResponseItem itself shouldn't need to change.
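For context on this thread: with partial rollout enabled, a single response can be produced over several generation steps, and versioned_response keeps the pieces. The sketch below is a simplified, hypothetical stand-in for the real class. It reuses the four field names from the snippet above, but swaps pydantic's `Field` for a stdlib dataclass, and the `append_chunk` helper and per-chunk `versions` list are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RolloutResponseItem:
    """Simplified stand-in; field names follow the reviewed snippet."""

    response_ids: Optional[List[int]] = None
    logprobs: Optional[List[float]] = None
    num_return_tokens: Optional[int] = None
    versioned_response: List[str] = field(default_factory=list)
    versions: List[int] = field(default_factory=list)  # hypothetical: weight version per chunk

    def append_chunk(self, text: str, ids: List[int], logprobs: List[float], version: int) -> None:
        """Concatenate the output of one partial generation step onto the response."""
        self.response_ids = (self.response_ids or []) + ids
        self.logprobs = (self.logprobs or []) + logprobs
        self.num_return_tokens = len(self.response_ids)
        self.versioned_response.append(text)
        self.versions.append(version)

    @property
    def response(self) -> str:
        return "".join(self.versioned_response)
```

Following the author's reply, a multi-turn trajectory would then be a list of such items, one per turn, without changing the item itself.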

if not self.enable_partial_rollout:
    # Clear env data left over from the previous rollout, e.g. response_ids
    if "routed_experts" in sample.env.rollout.extra_info:
        del sample.env.rollout.extra_info["routed_experts"]
Collaborator:

Same as above.

Collaborator:

Have you considered sending the next request for an interrupted sample to the same server, so that its cache can be reused?

Collaborator (author):

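The cache can't be reused at the moment anyway, because every weight update clears it. This feature will probably need to be added together with off-policy KV cache support.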

YanhuiDua force-pushed the support_async_rl_4 branch 2 times, most recently from 50afb64 to 718f817 (January 28, 2026 13:27)
YanhuiDua requested a review from hhaAndroid (January 28, 2026 13:32)
YanhuiDua (Collaborator, author) commented Jan 28, 2026

The async functionality is tested by test_lmdeploy_dataflow_save_resume_with_partial_rollout and test_lmdeploy_dataflow_save_resume_with_partial_rollout_r3. Accuracy tests will be added together with the RL end-to-end tests.
