
[Improve] Free routed experts ray obj ref to avoid memory leak#1595

Merged
YanhuiDua merged 2 commits into InternLM:main from RangiLyu:lcq/free-expert-ref
Mar 20, 2026

Conversation

@RangiLyu
Contributor

No description provided.

assert sample.data.input_ids and sample.data.num_tokens, "input_ids or num_tokens is empty!"
if "routed_experts" in sample.env.rollout.extra_info:
-    ray._private.internal_api.free(sample.env.rollout.extra_info["routed_experts"])
+    ray.internal.free(sample.env.rollout.extra_info["routed_experts"], local_only=False)
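The guard-then-free pattern in the diff above can be sketched without a running Ray cluster. In this sketch, `free` is a stand-in for `ray.internal.free` and the `extra_info` dict is illustrative; in the real code the stored value is a Ray `ObjectRef`, and `local_only=False` asks Ray to release copies of the object cluster-wide, not just on the local node:

```python
# Sketch of the guard-then-free pattern from the diff above.
# `free` is a stand-in for ray.internal.free; in real Ray code the
# stored value would be an ObjectRef, and local_only=False frees
# copies of the object across the whole cluster.
freed = []

def free(obj_ref, local_only=False):
    """Stand-in for ray.internal.free: record what was released."""
    freed.append((obj_ref, local_only))

extra_info = {"routed_experts": "objref-123", "other": 1}

# Only free when the key is actually present, mirroring the `if` guard.
if "routed_experts" in extra_info:
    free(extra_info["routed_experts"], local_only=False)

print(freed)  # [('objref-123', False)]
```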
Collaborator

Is this change because Ray's API was updated?

Contributor Author

Calling an external library's private methods is unsafe, so I changed all of these call sites consistently.

Collaborator

The Ray docs say that when releasing an explicitly serialized ObjectRef, you need to explicitly call ray._private.internal_api.free(obj_ref) to free it: https://docs.ray.io/en/latest/ray-core/objects/serialization.html#serializing-objectrefs


@YanhuiDua
Collaborator

@claude review

Comment on lines +465 to +468
routed_experts = ray.get(data_item.env.rollout.extra_info["routed_experts"])
ray.internal.free(data_item.env.rollout.extra_info["routed_experts"], local_only=False)
del data_item.env.rollout.extra_info["routed_experts"]
data_item.env.rollout.extra_info["routed_experts"] = routed_experts

Claude: Warning: The del then re-assign pattern here is functionally a no-op — del dict[key] followed by dict[key] = value is equivalent to just dict[key] = value. This works, but the del adds no value since you're immediately re-assigning to the same key.

More importantly, ray.internal.free is called on the ObjectRef before the key is deleted or reassigned. At this point routed_experts (the local variable) already holds the resolved tensor from ray.get(), so the ObjectRef being freed is the right one. The logic is correct, but the del/reassign could be simplified to just:

Suggested change
-routed_experts = ray.get(data_item.env.rollout.extra_info["routed_experts"])
-ray.internal.free(data_item.env.rollout.extra_info["routed_experts"], local_only=False)
-del data_item.env.rollout.extra_info["routed_experts"]
-data_item.env.rollout.extra_info["routed_experts"] = routed_experts
+routed_experts = ray.get(data_item.env.rollout.extra_info["routed_experts"])
+ray.internal.free(data_item.env.rollout.extra_info["routed_experts"], local_only=False)
+data_item.env.rollout.extra_info["routed_experts"] = routed_experts

Comment on lines +371 to +373
# update sample params and extra params (use copy to avoid modifying global state)
current_sample_params = {**self.sample_params, **(sample_params.dict() if sample_params else {})}
current_extra_params = {**self.extra_params, **(extra_params if extra_params else {})}

Claude: Critical: Good fix — the old code mutated self.sample_params and self.extra_params in place on every rollout() call, which meant evaluator's disable_routed_experts=True would permanently pollute the shared state for subsequent training rollouts. Using shallow copies here is the correct approach.

One note: since sample_params.dict() returns a new dict, the {**self.sample_params, ...} merge is correct. But if self.sample_params or self.extra_params contain mutable values (e.g., nested dicts/lists), downstream code could still mutate those shared objects through the shallow copy. If that's a concern, copy.deepcopy would be safer, though the shallow copy is likely fine for the current usage.
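Both the copy-on-merge fix and the shallow-copy caveat in the note above can be illustrated with ordinary dicts (the parameter names here are illustrative, not the controller's actual fields):

```python
import copy

# Shared state, analogous to self.sample_params in the controller.
sample_params = {"temperature": 1.0, "stop": ["</s>"]}

# Merging into a new dict leaves the shared top-level dict untouched...
current = {**sample_params, "temperature": 0.0}
print(sample_params["temperature"])  # 1.0 — shared state not polluted

# ...but nested mutables are still shared through the shallow copy:
current["stop"].append("<eos>")
print(sample_params["stop"])  # ['</s>', '<eos>'] — mutated through the copy!

# copy.deepcopy severs the nested sharing, as the review note suggests.
deep = copy.deepcopy(sample_params)
deep["stop"].append("EXTRA")
print(sample_params["stop"])  # unaffected by the deep copy's mutation
```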

extra_params: Extra parameters for generation. If contains "disable_routed_experts=True",
will force disable return_routed_experts regardless of config.

Returns:

Claude: Nit: The docstring update for extra_params mentions "disable_routed_experts=True" as a dict key, which leaks an implementation detail into the environment interface. Consider describing it more generically, e.g., "Extra parameters for generation, passed through to the rollout controller."

"response_len": rollout_response_len_list[_count],
"versioned_response_len": data.env.rollout.versioned_num_return_tokens,
"label": data.data.reward_model["ground_truth"],
"label": data.data.reward_model.get("ground_truth", ""),

Claude: Warning: This changes the behavior of _save_trajectories — previously it would raise a KeyError if "ground_truth" was missing from reward_model. Now it silently saves an empty string. If the missing key indicates a data integrity issue, this could mask bugs. Consider logging a warning when the key is missing, or documenting why this fallback is expected.
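The suggested "log a warning when the key is missing" fallback could look like the sketch below; `get_label` and the bare `reward_model` dict are hypothetical names, not the project's actual code:

```python
import logging

logger = logging.getLogger(__name__)

def get_label(reward_model: dict) -> str:
    """Fetch ground_truth, warning instead of silently defaulting."""
    if "ground_truth" not in reward_model:
        logger.warning(
            "reward_model is missing 'ground_truth'; saving an empty label"
        )
        return ""
    return reward_model["ground_truth"]

print(get_label({"ground_truth": "42"}))  # 42
print(get_label({}))                      # empty string, plus a warning
```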

assert "routed_experts" in response["meta_info"], (
"enable_return_routed_experts is True, but routed_experts is not in meta_info"
)
exist_history_routed_experts = (

Claude: Nit: Good use of .pop() here — it removes the key from meta_info in one step, which is cleaner than the previous access-then-leave pattern. Consistent with the memory leak fix theme.
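The difference between the previous access-then-leave pattern and `.pop()` can be shown on a plain dict (the `meta_info` contents here are illustrative):

```python
meta_info = {"routed_experts": "objref-xyz", "finish_reason": "stop"}

# .pop() both returns the value and removes the key in one step,
# so no stale reference is left behind in meta_info afterwards.
routed_experts = meta_info.pop("routed_experts")

print(routed_experts)                  # objref-xyz
print("routed_experts" in meta_info)   # False
```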

@claude

claude bot commented Mar 18, 2026

Claude: ## Summary

This PR makes two logical changes: (1) fixes memory leaks by properly freeing Ray ObjectRefs for routed experts, and (2) disables routed expert tracking during evaluation to reduce overhead.

Issues

Critical

  • [controller.py:371-373] The fix to use local copies instead of mutating self.sample_params/self.extra_params is the most important change in this PR. Without it, the evaluator's disable_routed_experts=True would permanently pollute shared state for training rollouts. Good catch.

Warning

  • [rl_trainer.py:896] The reward_model.get("ground_truth", "") change silently handles missing keys that previously would raise KeyError. If missing ground_truth indicates a data integrity bug, this could mask issues. Consider adding a warning log.
  • [replay_buffer.py:465-468] The del + re-assign pattern (del dict[key] then dict[key] = value) is functionally equivalent to just dict[key] = value. The del is unnecessary.

Nit

  • [single_turn_env.py:87] Docstring leaks implementation detail (disable_routed_experts=True). Consider a more generic description.
  • PR bundles two distinct changes (memory leak fix + evaluation optimization). These could have been separate PRs per project guidelines, but the changes are cohesive enough that this is minor.

Verdict

APPROVE — the core changes are correct and address real issues (memory leak via unfree'd ObjectRefs, state pollution via in-place dict mutation). The migration from ray._private.internal_api.free to ray.internal.free uses the proper public API. Minor suggestions above are non-blocking.

@YanhuiDua YanhuiDua merged commit 625c001 into InternLM:main Mar 20, 2026
32 of 33 checks passed