Refactor RL disagg producer flow with deeper manager-side abstraction #1769

Open
jayhenry wants to merge 11 commits into InternLM:rl_design from jayhenry:rl_redesign

@jayhenry jayhenry commented May 8, 2026

Motivation

This PR simplifies the RL disaggregated producer/consumer flow by moving repeated operational details into the right local abstractions. The goal is to make AgentLoopManager focus on orchestration, while progress accounting, replay buffer operations, strategy context, and async pending-task concurrency are handled by dedicated components.

Changes

  • ProduceProgress

    • Centralizes producer/consumer progress accounting.
    • Tracks absolute target/consumed samples and future step state.
    • Avoids scattering target accumulation and consumed-sample updates across manager code.
  • ReplayBuffer

    • Absorbs common buffer operations such as batch readiness, batch take, status counting, and staleness refresh.
    • Reduces repeated replay-buffer query/update logic in AgentLoopManager.
  • ProduceContext

    • Becomes the single runtime context passed into ProduceStrategy.
    • Encapsulates task-level access to sampler, replay buffer, progress, abort signal, model step, and generated-group put logic.
    • Removes legacy scattered strategy arguments and compatibility wrappers.
  • _PendingTasks

    • Encapsulates async pending rollout task claim/wait/cancel/schedule behavior.
    • Keeps pending-task concurrency details inside AsyncProduceStrategy.
    • Allows AgentLoopManager to use ProduceStrategy.pending_task_count() instead of reading strategy private state.
  • AgentLoopManager

    • Simplifies _produce_batch_to_buffer by making callers provide task batch sizes explicitly.
    • Unifies single-task and multi-task production flow.
    • Splits batch retrieval logging/result assembly into smaller helpers.
    • Flattens producer-loop status waiting for paused and expired states.
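The division of responsibilities listed above can be illustrated with a minimal sketch. Only the class names ProduceProgress, ReplayBuffer, and ProduceContext come from this PR; every field and method name below (`add_target`, `mark_consumed`, `is_batch_ready`, `take_batch`, `put_generated_group`) is a hypothetical stand-in, and the in-memory list is an assumption made purely for illustration, not the actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ProduceProgress:
    """Hypothetical progress tracker; field and method names are assumptions."""
    target_samples: int = 0    # absolute number of samples requested so far
    consumed_samples: int = 0  # absolute number of samples already consumed

    def add_target(self, batch_size: int) -> None:
        # Target accumulation lives here instead of in manager code.
        self.target_samples += batch_size

    def mark_consumed(self, count: int) -> None:
        self.consumed_samples += count

    @property
    def remaining(self) -> int:
        return max(self.target_samples - self.consumed_samples, 0)


@dataclass
class ReplayBuffer:
    """Toy in-memory buffer (assumption); only readiness and take are modeled."""
    groups: list = field(default_factory=list)

    def put(self, group) -> None:
        self.groups.append(group)

    def is_batch_ready(self, batch_size: int) -> bool:
        return len(self.groups) >= batch_size

    def take_batch(self, batch_size: int) -> list:
        # Remove and return the oldest `batch_size` groups.
        batch = self.groups[:batch_size]
        self.groups = self.groups[batch_size:]
        return batch


@dataclass
class ProduceContext:
    """Single object handed to a strategy in place of scattered arguments."""
    replay_buffer: ReplayBuffer
    progress: ProduceProgress
    model_step: int = 0

    def put_generated_group(self, group) -> None:
        # A strategy never touches the buffer directly in this sketch.
        self.replay_buffer.put(group)


# Usage: the manager only orchestrates; bookkeeping stays in the components.
ctx = ProduceContext(ReplayBuffer(), ProduceProgress())
ctx.progress.add_target(2)
ctx.put_generated_group(["rollout-a"])
ctx.put_generated_group(["rollout-b"])
if ctx.replay_buffer.is_batch_ready(2):
    batch = ctx.replay_buffer.take_batch(2)
    ctx.progress.mark_consumed(len(batch))
```

In this shape, a manager only needs to call a handful of intent-level methods, which is the kind of narrowing the PR describes.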

Why

These changes reduce hidden invariants and duplicated control flow. AgentLoopManager no longer needs to know how progress is accumulated, how replay buffer state is queried, how generated groups are normalized before put, or how async pending tasks are safely claimed. This makes the disaggregated rollout path easier to reason about and safer to extend.
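As a rough illustration of what "safely claimed" can mean for async pending tasks, here is a minimal sketch built on `asyncio`. It is not the PR's `_PendingTasks`; the class name `PendingTasks` and the methods `schedule`, `count`, `claim_done`, and `cancel_all` are hypothetical. The idea shown is that finished tasks are removed from the pending set at the moment their results are collected, so the same result cannot be collected twice.

```python
import asyncio


class PendingTasks:
    """Hypothetical stand-in for _PendingTasks; all method names are assumptions."""

    def __init__(self) -> None:
        self._tasks = set()

    def schedule(self, coro) -> asyncio.Task:
        # Must be called while an event loop is running.
        task = asyncio.ensure_future(coro)
        self._tasks.add(task)
        return task

    def count(self) -> int:
        return len(self._tasks)

    async def claim_done(self, timeout=None) -> list:
        """Wait until at least one task finishes, then claim every finished task."""
        if not self._tasks:
            return []
        done, _ = await asyncio.wait(
            self._tasks, timeout=timeout, return_when=asyncio.FIRST_COMPLETED
        )
        # Claiming: drop finished tasks before reading their results.
        self._tasks -= done
        # result() re-raises if a task failed; real code would handle that.
        return [task.result() for task in done]

    async def cancel_all(self) -> None:
        tasks, self._tasks = self._tasks, set()
        for task in tasks:
            task.cancel()
        # Swallow CancelledError so shutdown does not raise.
        await asyncio.gather(*tasks, return_exceptions=True)


async def demo():
    pending = PendingTasks()
    pending.schedule(asyncio.sleep(0.01, result="fast"))
    pending.schedule(asyncio.sleep(10, result="slow"))
    results = await pending.claim_done()
    still_pending = pending.count()
    await pending.cancel_all()
    return results, still_pending, pending.count()
```

A caller that only needs to know how much work is outstanding would use `count()`, analogous to the `ProduceStrategy.pending_task_count()` accessor mentioned above, rather than inspecting the task set itself.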