Skip to content

refactor: refactor async utils#2448

Open
yuki-97 wants to merge 1 commit into
mainfrom
yukih/refactor-async-utils
Open

refactor: refactor async utils#2448
yuki-97 wants to merge 1 commit into
mainfrom
yukih/refactor-async-utils

Conversation

@yuki-97
Copy link
Copy Markdown
Contributor

@yuki-97 yuki-97 commented May 9, 2026

Part of RL-727.

  • Split ReplayBuffer into ReplayBufferImpl (non-@ray.remote, inheritable) and ReplayBuffer (@ray.remote, unchanged behavior) to allow subclassing without the @ray.remote inheritance restriction.
  • Adds a stub ReplayBufferNew as the landing zone for the staleness-window work in the follow-up PR feat: support staleness-window in ReplayBufferNew #2458.

No functional changes to ReplayBuffer.

Signed-off-by: Yuki Huang <yukih@nvidia.com>
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 9, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@yuki-97 yuki-97 added the CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version) label May 9, 2026
@yuki-97
Copy link
Copy Markdown
Contributor Author

yuki-97 commented May 9, 2026

/ok to test 17cd7e2

@yuki-97 yuki-97 marked this pull request as ready for review May 11, 2026 05:06
@yuki-97 yuki-97 requested review from a team as code owners May 11, 2026 05:06
@yuki-97 yuki-97 requested review from mehraakash and terrykong May 11, 2026 05:06
Copy link
Copy Markdown

@mehraakash mehraakash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looks good, added a few nits.

from typing import Any, Optional


class ReplayBufferProtocol(ABC):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯
But could we make it typing.Protocol instead? here
Something like:

class ReplayBufferProtocol(Protocol):
      def add(...) -> str: ...
      def sample(...) -> Optional[dict[str, Any]]: ...
      def evict(self) -> None: ...
      def size(self) -> int: ...
      def clear(self) -> None: ...



@ray.remote # pragma: no cover
class ReplayBuffer(ReplayBufferImpl):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we update the async GRPO docs to point to the new module split? It still describes AsyncTrajectoryCollector and ReplayBuffer as living in async_utils.py, but after this PR they live in trajectory_collector.py and replay_buffer.py.

Current State

Comment thread pyrefly.toml
"nemo_rl/algorithms/advantage_estimator.py",
"nemo_rl/algorithms/async_utils/__init__.py",
"nemo_rl/algorithms/async_utils/interfaces.py",
"nemo_rl/algorithms/async_utils/replay_buffer.py",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also include:
"nemo_rl/algorithms/async_utils/trajectory_collector.py",

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants