Skip to content

support replay buffer save and resume, save_hf in trainer#1592

Merged
YanhuiDua merged 4 commits intoInternLM:rl_designfrom
YanhuiDua:resume_replaybuffer
Mar 20, 2026
Merged

support replay buffer save and resume, save_hf in trainer#1592
YanhuiDua merged 4 commits intoInternLM:rl_designfrom
YanhuiDua:resume_replaybuffer

Conversation

@YanhuiDua
Copy link
Collaborator

No description provided.

@YanhuiDua
Copy link
Collaborator Author

@claude review

@claude
Copy link

claude bot commented Mar 18, 2026

Claude: ## Summary

Adds save/resume checkpointing to ReplayBuffer via pickle, with state_dict/load_state_dict on both NaiveStorage and PandasStorage. Tests are well-structured and cover FIFO + staleness round-trips.

Issues

Warning

  • [replay_buffer.py:387] pickle.load is an arbitrary code execution vector — document trust boundary or consider safer serialization.
  • [replay_buffer.py:381] pickle.dump/pickle.load are synchronous blocking calls that will stall the asyncio event loop for large buffers. Wrap in asyncio.to_thread.
  • [replay_buffer.py:370] New public methods (save, resume, state_dict, load_state_dict) are missing Google-style docstrings per project standards.

Nit

  • [replay_buffer.py:395] resume silently discards existing buffer data — consider a guard or documentation.
  • [test_replay_buffer.py:131] Missing trailing newline at end of file.

Verdict

COMMENT — no critical bugs found. The core logic is correct and tests are solid. Please address the blocking I/O and docstring issues before merge.

@YanhuiDua YanhuiDua force-pushed the resume_replaybuffer branch 2 times, most recently from 5c933c6 to c918dba Compare March 18, 2026 12:51
@YanhuiDua
Copy link
Collaborator Author

@claude review

@@ -331,6 +376,38 @@ async def count(self, task_name: str, group_status: Status) -> int:
def __len__(self) -> int:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: Nit — Missing docstrings on public methods

Per project standards (CLAUDE.md), public methods must have Google-style docstrings. save, resume, state_dict, and load_state_dict are all public and undocumented.

@claude
Copy link

claude bot commented Mar 18, 2026

Claude: ## Summary

Adds save/resume checkpointing for ReplayBuffer and HF model saving to the RL trainer. Tests cover FIFO and staleness round-trips well.

Issues

Critical

  • [rl_colocate_trainer.py:257] Tuple unpacking precedence bug — is_hf_path receives the full (bool, Exception) tuple (always truthy), so HF path validation is completely bypassed and error_info is always "".

Warning

  • [replay_buffer.py:395] torch.load without weights_only=False will fail on PyTorch >= 2.6 since the state contains non-tensor objects (StorageItem, pd.DataFrame).
  • [rl_colocate_trainer.py:384] latest_checkpoint resolution changed from all-exps to last-exp-only — behavioral change not mentioned in PR description.
  • [rl_colocate_trainer.py:437] assert used for runtime validation — gets stripped with python -O.

Nit

  • [replay_buffer.py:376] New public methods (save, resume, state_dict, load_state_dict) missing Google-style docstrings per project standards.
  • [test_replay_buffer.py] Missing trailing newline at end of file.

Verdict

REQUEST_CHANGES — the tuple unpacking bug silently disables HF path validation and must be fixed before merge.

@YanhuiDua YanhuiDua force-pushed the resume_replaybuffer branch from c918dba to c18b4f2 Compare March 18, 2026 13:04
@YanhuiDua YanhuiDua changed the title support replay buffer save and resume support replay buffer save and resume, save_hf in trainer Mar 19, 2026
@YanhuiDua
Copy link
Collaborator Author

@claude review

@YanhuiDua YanhuiDua force-pushed the resume_replaybuffer branch from 6776063 to ba549d7 Compare March 19, 2026 06:47
@YanhuiDua YanhuiDua force-pushed the resume_replaybuffer branch from ba549d7 to 24c8789 Compare March 19, 2026 06:58
@YanhuiDua YanhuiDua merged commit 65f1d77 into InternLM:rl_design Mar 20, 2026
2 of 6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants