[Improve] Free routed experts ray obj ref to avoid memory leak#1595
YanhuiDua merged 2 commits into InternLM:main
Conversation
```diff
  assert sample.data.input_ids and sample.data.num_tokens, "input_ids or num_tokens is empty!"
  if "routed_experts" in sample.env.rollout.extra_info:
-     ray._private.internal_api.free(sample.env.rollout.extra_info["routed_experts"])
+     ray.internal.free(sample.env.rollout.extra_info["routed_experts"], local_only=False)
```
Calling another library's private method is unsafe, so these call sites were changed consistently.
The Ray documentation says that when releasing an explicitly serialized ObjectRef, you must explicitly call ray._private.internal_api.free(obj_ref) to free it: https://docs.ray.io/en/latest/ray-core/objects/serialization.html#serializing-objectrefs
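To make the leak concrete without requiring a Ray cluster, here is a toy object store (a hypothetical `ToyStore` class, not Ray's API) that mirrors the pattern under discussion: a serialized ref stored in metadata keeps its payload pinned until an explicit free, which is exactly what the PR's resolve-free-reassign sequence addresses.

```python
# Toy stand-in for the Ray object store (not Ray's actual API), illustrating
# why a stored ObjectRef must be freed explicitly to avoid a leak.
class ToyStore:
    def __init__(self):
        self._objects = {}
        self._next_id = 0

    def put(self, value):
        """Store a value and return an opaque ref, like ray.put()."""
        ref = f"ref-{self._next_id}"
        self._next_id += 1
        self._objects[ref] = value
        return ref

    def get(self, ref):
        """Resolve a ref to its value, like ray.get()."""
        return self._objects[ref]

    def free(self, ref):
        """Drop the store copy; without this the payload stays pinned forever."""
        self._objects.pop(ref, None)


store = ToyStore()
extra_info = {"routed_experts": store.put([0, 1, 2])}

# The PR's fix, in miniature: resolve the value first, free the pinned
# store copy, then keep only the resolved value in the metadata dict.
routed_experts = store.get(extra_info["routed_experts"])
store.free(extra_info["routed_experts"])
extra_info["routed_experts"] = routed_experts

assert store._objects == {}                     # nothing left pinned
assert extra_info["routed_experts"] == [0, 1, 2]
```

The ordering matters: `get` must happen before `free`, otherwise the value is gone before it can be resolved.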
@claude review
```python
routed_experts = ray.get(data_item.env.rollout.extra_info["routed_experts"])
ray.internal.free(data_item.env.rollout.extra_info["routed_experts"], local_only=False)
del data_item.env.rollout.extra_info["routed_experts"]
data_item.env.rollout.extra_info["routed_experts"] = routed_experts
```
Claude: Warning: The del then re-assign pattern here is functionally a no-op — del dict[key] followed by dict[key] = value is equivalent to just dict[key] = value. This works, but the del adds no value since you're immediately re-assigning to the same key.
More importantly, ray.internal.free is called on the ObjectRef before the key is deleted or reassigned. At this point routed_experts (the local variable) already holds the resolved tensor from ray.get(), so the ObjectRef being freed is the right one. The logic is correct, but the del/reassign could be simplified to just:
```diff
  routed_experts = ray.get(data_item.env.rollout.extra_info["routed_experts"])
  ray.internal.free(data_item.env.rollout.extra_info["routed_experts"], local_only=False)
- del data_item.env.rollout.extra_info["routed_experts"]
  data_item.env.rollout.extra_info["routed_experts"] = routed_experts
```
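A minimal plain-dict check (no Ray involved) confirms the reviewer's point that del-then-reassign of the same key is equivalent to a direct reassignment:

```python
# Two ways of replacing a dict value under the same key; the helper names
# are illustrative only, not from the PR.
def resolve_with_del(info, value):
    del info["routed_experts"]          # removed...
    info["routed_experts"] = value      # ...then immediately re-added
    return info

def resolve_direct(info, value):
    info["routed_experts"] = value      # one-step reassignment
    return info

a = resolve_with_del({"routed_experts": "obj_ref"}, [1, 2, 3])
b = resolve_direct({"routed_experts": "obj_ref"}, [1, 2, 3])
assert a == b == {"routed_experts": [1, 2, 3]}  # identical result
```

Dropping the `del` changes nothing observable, which is why the suggestion removes only that line.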
```python
# update sample params and extra params (use copy to avoid modifying global state)
current_sample_params = {**self.sample_params, **(sample_params.dict() if sample_params else {})}
current_extra_params = {**self.extra_params, **(extra_params if extra_params else {})}
```
Claude: Critical: Good fix — the old code mutated self.sample_params and self.extra_params in place on every rollout() call, which meant evaluator's disable_routed_experts=True would permanently pollute the shared state for subsequent training rollouts. Using shallow copies here is the correct approach.
One note: since sample_params.dict() returns a new dict, the {**self.sample_params, ...} merge is correct. But if self.sample_params or self.extra_params contain mutable values (e.g., nested dicts/lists), downstream code could still mutate those shared objects through the shallow copy. If that's a concern, copy.deepcopy would be safer, though the shallow copy is likely fine for the current usage.
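Both halves of this comment can be demonstrated in a few lines. The sketch below uses a hypothetical stripped-down `Controller` (not the PR's actual class) with the same merge pattern: the shallow-copy merge protects the top-level dicts, but nested mutables remain shared, which is the deepcopy caveat.

```python
import copy

class Controller:
    """Toy stand-in for the rollout controller; names are illustrative."""
    def __init__(self):
        # Shared defaults that must not be polluted by per-call overrides.
        self.sample_params = {"temperature": 1.0, "stop": ["</s>"]}
        self.extra_params = {}

    def rollout(self, sample_params=None, extra_params=None):
        # Shallow-copy merge, as in the fixed code: overrides apply to the
        # returned dicts only, never to self.sample_params / self.extra_params.
        current_sample_params = {**self.sample_params, **(sample_params or {})}
        current_extra_params = {**self.extra_params, **(extra_params or {})}
        return current_sample_params, current_extra_params


ctrl = Controller()
params, extra = ctrl.rollout(extra_params={"disable_routed_experts": True})
assert extra == {"disable_routed_experts": True}
assert ctrl.extra_params == {}  # shared state untouched by the override

# The caveat: a shallow copy shares nested mutables, so mutating one leaks.
params["stop"].append("EOF")
assert ctrl.sample_params["stop"] == ["</s>", "EOF"]  # leaked mutation

# copy.deepcopy would break this sharing, at some extra cost.
safe = copy.deepcopy(ctrl.sample_params)
safe["stop"].append("X")
assert ctrl.sample_params["stop"] == ["</s>", "EOF"]  # unaffected
```

Whether the deepcopy is worth it depends on whether downstream code ever mutates the merged dicts' nested values.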
```python
extra_params: Extra parameters for generation. If contains "disable_routed_experts=True",
    will force disable return_routed_experts regardless of config.

Returns:
```
Claude: Nit: The docstring update for extra_params mentions "disable_routed_experts=True" as a dict key, which leaks an implementation detail into the environment interface. Consider describing it more generically, e.g., "Extra parameters for generation, passed through to the rollout controller."
```diff
  "response_len": rollout_response_len_list[_count],
  "versioned_response_len": data.env.rollout.versioned_num_return_tokens,
- "label": data.data.reward_model["ground_truth"],
+ "label": data.data.reward_model.get("ground_truth", ""),
```
Claude: Warning: This changes the behavior of _save_trajectories — previously it would raise a KeyError if "ground_truth" was missing from reward_model. Now it silently saves an empty string. If the missing key indicates a data integrity issue, this could mask bugs. Consider logging a warning when the key is missing, or documenting why this fallback is expected.
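One way to get the review's suggested middle ground, sketched with a hypothetical `extract_label` helper (not code from the PR): keep the non-raising fallback but log when the key is absent, so a data integrity problem still leaves a trace.

```python
import logging

logger = logging.getLogger("trajectories")

def extract_label(reward_model: dict) -> str:
    """Return the ground-truth label, warning instead of raising if absent."""
    if "ground_truth" not in reward_model:
        # Surfaces the missing key instead of silently saving "".
        logger.warning("reward_model has no 'ground_truth'; saving empty label")
    return reward_model.get("ground_truth", "")


assert extract_label({"ground_truth": "42"}) == "42"  # normal path
assert extract_label({}) == ""                        # fallback path, with a warning logged
```

This preserves the PR's non-crashing behavior while keeping the old KeyError's diagnostic value in the logs.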
```python
assert "routed_experts" in response["meta_info"], (
    "enable_return_routed_experts is True, but routed_experts is not in meta_info"
)
exist_history_routed_experts = (
```
Claude: Nit: Good use of .pop() here — it removes the key from meta_info in one step, which is cleaner than the previous access-then-leave pattern. Consistent with the memory leak fix theme.
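The difference between the two access patterns mentioned here is easy to show with a plain dict (the `meta_info` contents below are made up for illustration): `pop()` retrieves the value and drops the key in one step, so the dict no longer holds a second reference to it.

```python
meta_info = {"routed_experts": [[0, 3], [1, 2]], "finish_reason": "stop"}

# Access-then-leave: the value is read, but the dict still references it.
routed = meta_info["routed_experts"]
assert "routed_experts" in meta_info  # key (and its reference) still present

# pop(): retrieve and remove in one step, leaving no dangling reference.
meta_info = {"routed_experts": [[0, 3], [1, 2]], "finish_reason": "stop"}
routed = meta_info.pop("routed_experts")
assert "routed_experts" not in meta_info
assert routed == [[0, 3], [1, 2]]
```

For CPU-side dicts this only affects when the garbage collector can reclaim the value, but it keeps the code consistent with the PR's goal of not retaining routed-experts data longer than needed.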
Claude:

## Summary

This PR makes two logical changes: (1) fixes memory leaks by properly freeing Ray ObjectRefs for routed experts, and (2) disables routed expert tracking during evaluation to reduce overhead.

## Issues

Critical, Warning, and Nit findings are noted in the inline comments above.

## Verdict

APPROVE — the core changes are correct and address real issues (a memory leak via unfreed ObjectRefs, and state pollution via in-place dict mutation). The migration from `ray._private.internal_api.free` to `ray.internal.free` is a sensible cleanup.