Refactor RL disagg producer flow with deeper manager-side abstraction by jayhenry · Pull Request #1769 · InternLM/xtuner

jayhenry · 2026-05-08T15:04:24Z

Motivation

This PR simplifies the RL disaggregated producer/consumer flow by moving repeated operational details into the local abstractions that own them. The goal is to let AgentLoopManager focus on orchestration, while progress accounting, replay buffer operations, strategy context, and async pending-task concurrency are handled by dedicated components.

Changes

ProduceProgress
- Centralizes producer/consumer progress accounting.
- Tracks absolute target and consumed sample counts, along with future-step state.
- Avoids scattering target accumulation and consumed-sample updates across the manager code.
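A minimal sketch of centralized progress accounting, assuming the manager only needs absolute target and consumed counters plus a step index. The field and method names below (target_samples, consumed_samples, next_step, add_target, record_consumed, outstanding) are hypothetical and are not taken from xtuner's actual ProduceProgress.

```python
from dataclasses import dataclass


@dataclass
class ProduceProgress:
    """Illustrative sketch only; field and method names are hypothetical, not xtuner's API."""

    target_samples: int = 0    # absolute number of samples the producer has been asked for
    consumed_samples: int = 0  # absolute number of samples the trainer has taken
    next_step: int = 0         # hypothetical index of the future step being produced for

    def add_target(self, batch_size: int) -> None:
        # The only place where the absolute target grows.
        if batch_size < 0:
            raise ValueError("batch_size must be non-negative")
        self.target_samples += batch_size
        self.next_step += 1

    def record_consumed(self, num_samples: int) -> None:
        # The only place where consumed samples are updated.
        if num_samples < 0:
            raise ValueError("num_samples must be non-negative")
        if self.consumed_samples + num_samples > self.target_samples:
            raise ValueError("cannot consume more samples than were targeted")
        self.consumed_samples += num_samples

    @property
    def outstanding(self) -> int:
        # Samples that were targeted but not yet consumed.
        return self.target_samples - self.consumed_samples


progress = ProduceProgress()
progress.add_target(64)  # step 0
progress.add_target(64)  # step 1
progress.record_consumed(64)
print(progress.target_samples, progress.consumed_samples, progress.outstanding)  # 128 64 64
```

Because the counters are absolute rather than per-step deltas, a consumer can compare totals directly instead of reconstructing them from step history; this is one plausible reading of "absolute" in the description above, not a confirmed detail of the PR.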
ReplayBuffer
- Absorbs common buffer operations such as batch-readiness checks, batch takes, status counting, and staleness refresh.
- Reduces repeated replay-buffer query/update logic in AgentLoopManager.
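The buffer helpers listed above can be sketched as follows, assuming each group records the model step that generated it and that "stale" means more than max_staleness steps behind the current policy. Group, GroupStatus, and every method name here are hypothetical; this is not xtuner's ReplayBuffer API.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class GroupStatus(Enum):
    COMPLETED = "completed"
    EXPIRED = "expired"


@dataclass
class Group:
    group_id: int
    model_step: int  # policy version that generated this group (assumption)
    status: GroupStatus = GroupStatus.COMPLETED


class ReplayBuffer:
    """Illustrative sketch; method names are hypothetical, not xtuner's ReplayBuffer API."""

    def __init__(self) -> None:
        self._groups: list = []

    def put(self, group: Group) -> None:
        self._groups.append(group)

    def count(self, status: GroupStatus) -> int:
        return sum(1 for g in self._groups if g.status is status)

    def status_counts(self) -> dict:
        # One query instead of ad-hoc counting loops in the manager.
        return dict(Counter(g.status for g in self._groups))

    def is_batch_ready(self, batch_size: int) -> bool:
        return self.count(GroupStatus.COMPLETED) >= batch_size

    def take_batch(self, batch_size: int) -> list:
        # Remove and return the oldest `batch_size` completed groups.
        if not self.is_batch_ready(batch_size):
            raise RuntimeError("not enough completed groups for a batch")
        taken, kept = [], []
        for g in self._groups:
            if g.status is GroupStatus.COMPLETED and len(taken) < batch_size:
                taken.append(g)
            else:
                kept.append(g)
        self._groups = kept
        return taken

    def refresh_staleness(self, current_step: int, max_staleness: int) -> int:
        # Mark completed groups generated more than `max_staleness` steps ago as expired.
        expired = 0
        for g in self._groups:
            if g.status is GroupStatus.COMPLETED and current_step - g.model_step > max_staleness:
                g.status = GroupStatus.EXPIRED
                expired += 1
        return expired


buffer = ReplayBuffer()
for step in range(4):
    buffer.put(Group(group_id=step, model_step=step))
num_expired = buffer.refresh_staleness(current_step=3, max_staleness=1)  # groups 0 and 1 expire
batch = buffer.take_batch(2) if buffer.is_batch_ready(2) else []
print(num_expired, [g.group_id for g in batch])  # 2 [2, 3]
```

With helpers like these, the manager asks one question ("is a batch ready?") and performs one action ("take it") instead of filtering the buffer itself.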
ProduceContext
- Becomes the single runtime context passed into ProduceStrategy.
- Encapsulates task-level access to the sampler, replay buffer, progress, abort signal, and model step, as well as the logic for putting generated groups into the buffer.
- Removes the legacy scattered strategy arguments and compatibility wrappers.
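A minimal sketch of a single context object handed to a strategy, assuming a synchronous strategy and a plain list standing in for the replay buffer. ProduceContext's fields, SimpleProduceStrategy, put_generated_group, and produced_groups are all hypothetical names for illustration; the real class in the PR may differ.

```python
import threading
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class ProduceContext:
    """Illustrative sketch; fields are hypothetical stand-ins, not xtuner's ProduceContext."""

    sampler: Iterator[str]    # yields prompts
    replay_buffer: list       # stand-in for the real ReplayBuffer
    model_step: int = 0       # policy version used for generation
    produced_groups: int = 0  # stand-in for ProduceProgress accounting
    abort: threading.Event = field(default_factory=threading.Event)

    def should_abort(self) -> bool:
        return self.abort.is_set()

    def put_generated_group(self, group: list) -> None:
        # Normalize once here: tag every sample with the model step that produced it.
        self.replay_buffer.append([dict(sample, model_step=self.model_step) for sample in group])
        self.produced_groups += 1


class SimpleProduceStrategy:
    """Hypothetical strategy: it sees only the context, never the manager's internals."""

    def produce(self, ctx: ProduceContext, batch_size: int, group_size: int = 2) -> int:
        produced = 0
        while produced < batch_size and not ctx.should_abort():
            prompt = next(ctx.sampler)
            # Fake rollout: a real strategy would call the inference engine here.
            group = [{"prompt": prompt, "response": f"{prompt}-r{i}"} for i in range(group_size)]
            ctx.put_generated_group(group)
            produced += 1
        return produced


ctx = ProduceContext(sampler=iter(["p0", "p1", "p2"]), replay_buffer=[], model_step=7)
strategy = SimpleProduceStrategy()
first = strategy.produce(ctx, batch_size=2)
ctx.abort.set()
second = strategy.produce(ctx, batch_size=2)  # aborted before producing anything
print(first, second, ctx.produced_groups)  # 2 0 2
```

The strategy never touches the manager: everything it needs, including where and how to put generated groups, comes through ctx, which is what allows long per-call argument lists to go away.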
_PendingTasks
- Encapsulates the claim/wait/cancel/schedule behavior for async pending rollout tasks.
- Keeps pending-task concurrency details inside AsyncProduceStrategy.
- Allows AgentLoopManager to call ProduceStrategy.pending_task_count() instead of reading the strategy's private state.
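One way to encapsulate pending rollout tasks with asyncio, assuming rollouts are coroutines and all callers run on a single event loop. The _PendingTasks methods (schedule, wait, claim, cancel) and AsyncProduceStrategy's schedule/collect/shutdown are hypothetical; only the class names and pending_task_count() come from the description above.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional


class _PendingTasks:
    """Illustrative sketch; not the PR's implementation. Method names are hypothetical."""

    def __init__(self) -> None:
        self._tasks: set = set()

    def __len__(self) -> int:
        return len(self._tasks)

    def schedule(self, make_rollout: Callable[[], Awaitable], n: int = 1) -> None:
        # Must be called from inside a running event loop.
        for _ in range(n):
            self._tasks.add(asyncio.ensure_future(make_rollout()))

    async def wait(self, timeout: Optional[float] = None) -> None:
        # Block until at least one pending task finishes (or the timeout expires).
        if self._tasks:
            await asyncio.wait(self._tasks, timeout=timeout, return_when=asyncio.FIRST_COMPLETED)

    def claim(self) -> List:
        # Check-and-remove happens with no `await` in between, so two coroutines on the
        # same loop can never claim the same finished task.
        done = {t for t in self._tasks if t.done()}
        self._tasks -= done
        # result() re-raises if a rollout failed; a real version would handle that.
        return [t.result() for t in done if not t.cancelled()]

    async def cancel(self) -> None:
        tasks = list(self._tasks)
        self._tasks.clear()
        for t in tasks:
            t.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)


class AsyncProduceStrategy:
    """Hypothetical strategy owning its pending tasks; callers only see a count."""

    def __init__(self) -> None:
        self._pending = _PendingTasks()

    def pending_task_count(self) -> int:
        return len(self._pending)

    def schedule(self, make_rollout: Callable[[], Awaitable], n: int = 1) -> None:
        self._pending.schedule(make_rollout, n)

    async def collect(self, timeout: Optional[float] = None) -> List:
        await self._pending.wait(timeout)
        return self._pending.claim()

    async def shutdown(self) -> None:
        await self._pending.cancel()


async def demo():
    strategy = AsyncProduceStrategy()
    strategy.schedule(lambda: asyncio.sleep(0.01, result="fast"))
    strategy.schedule(lambda: asyncio.sleep(10, result="slow"))
    before = strategy.pending_task_count()
    results = await strategy.collect()  # returns once the fast rollout finishes
    after = strategy.pending_task_count()
    await strategy.shutdown()  # cancels the slow rollout
    return before, results, after, strategy.pending_task_count()


print(asyncio.run(demo()))  # (2, ['fast'], 1, 0)
```

Claiming finished tasks in one synchronous step is the key design choice in this sketch: with no await between checking and removing, concurrent coroutines on the same loop cannot claim the same result twice.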
AgentLoopManager
- Simplifies _produce_batch_to_buffer by making callers provide task batch sizes explicitly.
- Unifies the single-task and multi-task production flows.
- Splits batch-retrieval logging and result assembly into smaller helpers.
- Flattens the producer loop's status waiting for the paused and expired states.
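The manager-side flow can be sketched as follows, assuming each task maps to an async produce function that returns how many groups it produced, and that the paused and expired states both simply mean "wait". ProducerStatus, _wait_while, producer_loop, fake_produce, and the Dict[str, int] signature of _produce_batch_to_buffer are hypothetical illustrations, not the PR's code.

```python
import asyncio
from enum import Enum
from typing import Awaitable, Callable, Dict


class ProducerStatus(Enum):
    RUNNING = "running"
    PAUSED = "paused"
    EXPIRED = "expired"
    STOPPED = "stopped"


ProduceFn = Callable[[int], Awaitable[int]]  # hypothetical: produce N groups, return count


class AgentLoopManager:
    """Illustrative sketch; not xtuner's AgentLoopManager."""

    def __init__(self, strategies: Dict[str, ProduceFn]) -> None:
        self.strategies = strategies
        self.status = ProducerStatus.RUNNING
        self.produced = {name: 0 for name in strategies}

    async def _produce_batch_to_buffer(self, task_batch_sizes: Dict[str, int]) -> Dict[str, int]:
        # Callers pass explicit per-task batch sizes; a single task is just a
        # one-entry dict, so single- and multi-task runs share one code path.
        names = list(task_batch_sizes)
        counts = await asyncio.gather(
            *(self.strategies[name](task_batch_sizes[name]) for name in names)
        )
        for name, count in zip(names, counts):
            self.produced[name] += count
        return dict(zip(names, counts))

    async def _wait_while(self, *statuses: ProducerStatus, poll: float = 0.01) -> None:
        # One flat wait instead of nested per-status branches.
        while self.status in statuses:
            await asyncio.sleep(poll)

    async def producer_loop(self, task_batch_sizes: Dict[str, int], num_batches: int) -> None:
        for _ in range(num_batches):
            await self._wait_while(ProducerStatus.PAUSED, ProducerStatus.EXPIRED)
            if self.status is ProducerStatus.STOPPED:
                break
            await self._produce_batch_to_buffer(task_batch_sizes)


async def fake_produce(batch_size: int) -> int:
    await asyncio.sleep(0)  # stand-in for real rollouts
    return batch_size


async def demo():
    manager = AgentLoopManager({"math": fake_produce, "code": fake_produce})
    single = await manager._produce_batch_to_buffer({"math": 8})  # single-task call
    manager.status = ProducerStatus.PAUSED
    loop_task = asyncio.create_task(manager.producer_loop({"math": 4, "code": 2}, num_batches=3))
    await asyncio.sleep(0.05)
    while_paused = dict(manager.produced)  # nothing new is produced while paused
    manager.status = ProducerStatus.RUNNING
    await loop_task
    return single, while_paused, manager.produced


print(asyncio.run(demo()))  # ({'math': 8}, {'math': 8, 'code': 0}, {'math': 20, 'code': 6})
```

Expressing a single-task run as a one-entry dict is what removes the separate single-task branch in this sketch, and the one _wait_while guard replaces nested per-status waits.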

Why

These changes reduce hidden invariants and duplicated control flow. AgentLoopManager no longer needs to know how progress is accumulated, how replay buffer state is queried, how generated groups are normalized before being put into the buffer, or how async pending tasks are safely claimed. This makes the disaggregated rollout path easier to reason about and safer to extend.

jayhenry added 11 commits May 7, 2026 12:34

- c5bf695 Fix deterministic disagg trainer test sample
- 1a0af40 [Fix] Reuse Ray connection in memory monitor
- 9e85cf3 add redesign disagg doc
- 72a20fb Refactor produce progress state handling
- 497a95d Add replay buffer batch helpers
- b0b1ba0 Introduce produce context
- 07c3ad0 Add agent loop manager shutdown
- 5c72631 Clean up agent loop manager helpers
- b73e7b7 Simplify produce strategy context entrypoints
- 9dbd6a6 Simplify pending task management
- 8c9ab52 Simplify agent loop manager flow

jayhenry wants to merge 11 commits into InternLM:rl_design from jayhenry:rl_redesign

jayhenry commented May 8, 2026