
[Improve] Free routed experts ray obj ref to avoid memory leak#1595

Merged
YanhuiDua merged 2 commits into InternLM:main from RangiLyu:lcq/free-expert-ref
Mar 20, 2026

Conversation

@RangiLyu
Contributor

No description provided.

assert sample.data.input_ids and sample.data.num_tokens, "input_ids or num_tokens is empty!"
if "routed_experts" in sample.env.rollout.extra_info:
-    ray._private.internal_api.free(sample.env.rollout.extra_info["routed_experts"])
+    ray.internal.free(sample.env.rollout.extra_info["routed_experts"], local_only=False)
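The guard-then-free pattern in the diff above can be sketched without a running Ray cluster. In this sketch, `free` is a stand-in for `ray.internal.free` and the `extra_info` dict is illustrative; in the real code the stored value is a Ray `ObjectRef`, and `local_only=False` asks Ray to release copies of the object cluster-wide, not just on the local node:

```python
# Sketch of the guard-then-free pattern from the diff above.
# `free` is a stand-in for ray.internal.free; in real Ray code the
# stored value would be an ObjectRef, and local_only=False frees
# copies of the object across the whole cluster.
freed = []

def free(obj_ref, local_only=False):
    """Stand-in for ray.internal.free: record what was released."""
    freed.append((obj_ref, local_only))

extra_info = {"routed_experts": "objref-123", "other": 1}

# Only free when the key is actually present, mirroring the `if` guard.
if "routed_experts" in extra_info:
    free(extra_info["routed_experts"], local_only=False)

print(freed)  # [('objref-123', False)]
```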
Collaborator

Is this change because Ray's API was updated?

Contributor Author

Calling an external library's private methods is unsafe, so I changed all of these call sites consistently.

Collaborator

The Ray docs say that when releasing an explicitly serialized ObjectRef, you need to explicitly call ray._private.internal_api.free(obj_ref) to free it: https://docs.ray.io/en/latest/ray-core/objects/serialization.html#serializing-objectrefs


@YanhuiDua
Collaborator

@claude review

Comment on lines +465 to +468
routed_experts = ray.get(data_item.env.rollout.extra_info["routed_experts"])
ray.internal.free(data_item.env.rollout.extra_info["routed_experts"], local_only=False)
del data_item.env.rollout.extra_info["routed_experts"]
data_item.env.rollout.extra_info["routed_experts"] = routed_experts

Claude: Warning: The del then re-assign pattern here is functionally a no-op — del dict[key] followed by dict[key] = value is equivalent to just dict[key] = value. This works, but the del adds no value since you're immediately re-assigning to the same key.

More importantly, ray.internal.free is called on the ObjectRef before the key is deleted or reassigned. At this point routed_experts (the local variable) already holds the resolved tensor from ray.get(), so the ObjectRef being freed is the right one. The logic is correct, but the del/reassign could be simplified to just:

Suggested change
-routed_experts = ray.get(data_item.env.rollout.extra_info["routed_experts"])
-ray.internal.free(data_item.env.rollout.extra_info["routed_experts"], local_only=False)
-del data_item.env.rollout.extra_info["routed_experts"]
-data_item.env.rollout.extra_info["routed_experts"] = routed_experts
+routed_experts = ray.get(data_item.env.rollout.extra_info["routed_experts"])
+ray.internal.free(data_item.env.rollout.extra_info["routed_experts"], local_only=False)
+data_item.env.rollout.extra_info["routed_experts"] = routed_experts

Comment on lines +371 to +373
# update sample params and extra params (use copy to avoid modifying global state)
current_sample_params = {**self.sample_params, **(sample_params.dict() if sample_params else {})}
current_extra_params = {**self.extra_params, **(extra_params if extra_params else {})}

Claude: Critical: Good fix — the old code mutated self.sample_params and self.extra_params in place on every rollout() call, which meant evaluator's disable_routed_experts=True would permanently pollute the shared state for subsequent training rollouts. Using shallow copies here is the correct approach.

One note: since sample_params.dict() returns a new dict, the {**self.sample_params, ...} merge is correct. But if self.sample_params or self.extra_params contain mutable values (e.g., nested dicts/lists), downstream code could still mutate those shared objects through the shallow copy. If that's a concern, copy.deepcopy would be safer, though the shallow copy is likely fine for the current usage.
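Both the copy-on-merge fix and the shallow-copy caveat in the note above can be illustrated with ordinary dicts (the parameter names here are illustrative, not the controller's actual fields):

```python
import copy

# Shared state, analogous to self.sample_params in the controller.
sample_params = {"temperature": 1.0, "stop": ["</s>"]}

# Merging into a new dict leaves the shared top-level dict untouched...
current = {**sample_params, "temperature": 0.0}
print(sample_params["temperature"])  # 1.0 — shared state not polluted

# ...but nested mutables are still shared through the shallow copy:
current["stop"].append("<eos>")
print(sample_params["stop"])  # ['</s>', '<eos>'] — mutated through the copy!

# copy.deepcopy severs the nested sharing, as the review note suggests.
deep = copy.deepcopy(sample_params)
deep["stop"].append("EXTRA")
print(sample_params["stop"])  # unaffected by the deep copy's mutation
```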

extra_params: Extra parameters for generation. If contains "disable_routed_experts=True",
will force disable return_routed_experts regardless of config.

Returns:

Claude: Nit: The docstring update for extra_params mentions "disable_routed_experts=True" as a dict key, which leaks an implementation detail into the environment interface. Consider describing it more generically, e.g., "Extra parameters for generation, passed through to the rollout controller."

"response_len": rollout_response_len_list[_count],
"versioned_response_len": data.env.rollout.versioned_num_return_tokens,
"label": data.data.reward_model["ground_truth"],
"label": data.data.reward_model.get("ground_truth", ""),

Claude: Warning: This changes the behavior of _save_trajectories — previously it would raise a KeyError if "ground_truth" was missing from reward_model. Now it silently saves an empty string. If the missing key indicates a data integrity issue, this could mask bugs. Consider logging a warning when the key is missing, or documenting why this fallback is expected.
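The suggested "log a warning when the key is missing" fallback could look like the sketch below; `get_label` and the bare `reward_model` dict are hypothetical names, not the project's actual code:

```python
import logging

logger = logging.getLogger(__name__)

def get_label(reward_model: dict) -> str:
    """Fetch ground_truth, warning instead of silently defaulting."""
    if "ground_truth" not in reward_model:
        logger.warning(
            "reward_model is missing 'ground_truth'; saving an empty label"
        )
        return ""
    return reward_model["ground_truth"]

print(get_label({"ground_truth": "42"}))  # 42
print(get_label({}))                      # empty string, plus a warning
```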

assert "routed_experts" in response["meta_info"], (
"enable_return_routed_experts is True, but routed_experts is not in meta_info"
)
exist_history_routed_experts = (

Claude: Nit: Good use of .pop() here — it removes the key from meta_info in one step, which is cleaner than the previous access-then-leave pattern. Consistent with the memory leak fix theme.
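The difference between the previous access-then-leave pattern and `.pop()` can be shown on a plain dict (the `meta_info` contents here are illustrative):

```python
meta_info = {"routed_experts": "objref-xyz", "finish_reason": "stop"}

# .pop() both returns the value and removes the key in one step,
# so no stale reference is left behind in meta_info afterwards.
routed_experts = meta_info.pop("routed_experts")

print(routed_experts)                  # objref-xyz
print("routed_experts" in meta_info)   # False
```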

@claude

claude bot commented Mar 18, 2026

Claude: ## Summary

This PR makes two logical changes: (1) fixes memory leaks by properly freeing Ray ObjectRefs for routed experts, and (2) disables routed expert tracking during evaluation to reduce overhead.

Issues

Critical

  • [controller.py:371-373] The fix to use local copies instead of mutating self.sample_params/self.extra_params is the most important change in this PR. Without it, the evaluator's disable_routed_experts=True would permanently pollute shared state for training rollouts. Good catch.

Warning

  • [rl_trainer.py:896] The reward_model.get("ground_truth", "") change silently handles missing keys that previously would raise KeyError. If missing ground_truth indicates a data integrity bug, this could mask issues. Consider adding a warning log.
  • [replay_buffer.py:465-468] The del + re-assign pattern (del dict[key] then dict[key] = value) is functionally equivalent to just dict[key] = value. The del is unnecessary.

Nit

  • [single_turn_env.py:87] Docstring leaks implementation detail (disable_routed_experts=True). Consider a more generic description.
  • PR bundles two distinct changes (memory leak fix + evaluation optimization). These could have been separate PRs per project guidelines, but the changes are cohesive enough that this is minor.

Verdict

APPROVE — the core changes are correct and address real issues (memory leak via unfree'd ObjectRefs, state pollution via in-place dict mutation). The migration from ray._private.internal_api.free to ray.internal.free uses the proper public API. Minor suggestions above are non-blocking.

@YanhuiDua YanhuiDua merged commit 625c001 into InternLM:main Mar 20, 2026
32 of 33 checks passed