Support RL VLM of rl_design branch #1598

hhaAndroid wants to merge 3 commits into `InternLM:rl_design` from …
Conversation
@claude review

1 similar comment

@claude review
Claude:

## Review Summary

This PR adds vision-language (VL) model support to the RL training pipeline, specifically for Qwen3-VL. It introduces a new tokenize function, moves …

Issues: Critical, Warning, Nit

Verdict: REQUEST_CHANGES
@claude review
```diff
 def get_train_seq_ctx(
-    input_ids: torch.LongTensor, multimodal_train_info: dict | None = None, len_response_ids: int = 0
+    input_ids: torch.LongTensor,
+    position_ids: torch.Tensor,
```
Claude: Warning: Type hint says torch.Tensor but position_ids can be None — the body immediately checks if position_ids is not None. Should be torch.Tensor | None to match the actual contract.
```diff
-    position_ids: torch.Tensor,
+    position_ids: torch.Tensor | None,
```
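A minimal stand-in sketch of the contract the corrected annotation expresses, using plain lists instead of torch tensors (the function name and parameter names come from the diff; the body and fallback behavior here are hypothetical):

```python
from typing import Optional

def get_train_seq_ctx(
    input_ids: list[int],
    position_ids: Optional[list[int]] = None,
) -> list[int]:
    # The annotation admits None because the body branches on it,
    # matching how the real torch-based function checks
    # `if position_ids is not None`.
    if position_ids is not None:
        return position_ids
    # Hypothetical fallback: default sequential positions.
    return list(range(len(input_ids)))
```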
```python
@field_serializer("mm_info")
def _serialize_mm_info(self, value: MultimodalInfo | None) -> MultimodalInfo | None:
    return None
```
Claude: Warning: This serializer unconditionally drops mm_info during serialization. This means any model_dump() / model_dump_json() call silently loses multimodal data. The return type annotation also says MultimodalInfo | None but only None is ever returned.
Please add a brief docstring explaining why this is intentional (e.g., mm_info contains numpy arrays / Ray ObjectRefs that can't be serialized), similar to the _serialize_routed_experts docstring above. This prevents future contributors from seeing the always-None return and thinking it's a bug.
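A sketch of the documented serializer the comment asks for, shown on a minimal hypothetical model (the real `RolloutState` has more fields; the docstring wording is illustrative):

```python
from typing import Any, Optional
from pydantic import BaseModel, field_serializer

class RolloutState(BaseModel):
    # May hold objects (numpy arrays, Ray ObjectRefs) that cannot be serialized.
    mm_info: Optional[Any] = None

    @field_serializer("mm_info")
    def _serialize_mm_info(self, value: Any) -> Optional[Any]:
        """Intentionally drop mm_info during serialization.

        mm_info can contain non-serializable payloads (e.g. numpy arrays,
        Ray ObjectRefs), so model_dump()/model_dump_json() always emit None
        for this field. This is deliberate, not a bug.
        """
        return None
```

With a docstring like this, the always-`None` return reads as a documented decision rather than an oversight.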
```python
from ..mllm_tokenize_fn.qwen3_vl_tokenize_fn import Qwen3VLTokenizeFnConfig, Qwen3VLTokenizeFunction, QwenVL3DataItem
from ..utils import replace_image_context_and_collect_media_data
```
Claude: Warning: Missing type hints per project standards (CLAUDE.md: "All new code must include type hints for function signatures"). tokens and img_context_id have no type annotations, and the return type is missing.
```python
def remove_consecutive_img_context_tokens(tokens: list[int], img_context_id: int) -> list[int]:
```
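The diff only shows the suggested signature; a hypothetical body consistent with the name might collapse each run of consecutive image-context tokens down to one (this is a guess at the intended behavior, not the PR's actual implementation):

```python
def remove_consecutive_img_context_tokens(tokens: list[int], img_context_id: int) -> list[int]:
    """Collapse each run of consecutive img_context_id tokens to a single token.

    Hypothetical sketch: only the signature appears in the PR diff, so this
    body is one plausible reading of the function name.
    """
    result: list[int] = []
    for tok in tokens:
        # Skip a token only when it repeats the img_context_id we just kept.
        if tok == img_context_id and result and result[-1] == img_context_id:
            continue
        result.append(tok)
    return result
```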
```python
def fake_collator(instances: list[RolloutState], **kwargs):
    for rollout_state in instances:
        if hasattr(rollout_state, "mm_info") and rollout_state.mm_info is not None:
```
Claude: Nit: RolloutState is a Pydantic BaseModel, so mm_info is always present as an attribute (with default None). The hasattr check is redundant — you can simplify to just if rollout_state.mm_info is not None.
Also, this function is missing type hints and a return type annotation per CLAUDE.md standards.
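The simplification suggested above, sketched with a plain dataclass standing in for the real Pydantic `RolloutState` (the return value and the body of the `if` branch are hypothetical, since the full function is not shown in the diff):

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class RolloutState:
    # Stand-in for the real Pydantic model: mm_info always exists,
    # defaulting to None, so hasattr() can never be False.
    mm_info: Any = None

def fake_collator(instances: list[RolloutState], **kwargs: Any) -> list[RolloutState]:
    for rollout_state in instances:
        # Attribute is guaranteed to exist; only the None check is needed.
        if rollout_state.mm_info is not None:
            ...  # handle the multimodal payload here
    return instances
```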
```python
if not isinstance(pixel_values, np.ndarray):
    assert isinstance(pixel_values, list), (
        f"pixel_values should be list of tensor, got {type(pixel_values)}"
    )
    pixel_values = [ray.get(pixel_obf) for pixel_obf in pixel_values]
    pixel_values = [torch.as_tensor(pixel_value) for pixel_value in pixel_values]
    pixel_values = torch.cat(pixel_values, dim=0)
    seq_ctx.pixel_values = pixel_values
else:
    raise NotImplementedError("The case where pixel_values is a numpy array is not implemented yet.")
```
Claude: Warning: The logic here only handles two cases: list (of Ray ObjectRefs) and np.ndarray (raises NotImplementedError). But a single Ray ObjectRef (not wrapped in a list) would fail the isinstance(pixel_values, list) assertion.
Tracing the flow: fake_collator stores each RolloutState's pixel_values as a single ray.put() ObjectRef. Unless there's an intermediate packing step that collects these into a list, seq_ctx.pixel_values arriving here as a single ObjectRef would hit the assertion error. Could you verify that the packing path always wraps ObjectRefs into a list before they reach this point?
Also, the NotImplementedError for the np.ndarray case seems like it should be straightforward to implement (just torch.as_tensor(pixel_values)) — is this intentionally deferred?
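One way to make the branch total is to wrap a bare handle into a list before resolving, so a single ObjectRef and a list of ObjectRefs converge on the same path. The sketch below abstracts `ray.get` as a `fetch` callable and uses plain lists in place of tensors (all names are hypothetical; the real code would resolve with `ray.get` and concatenate with `torch.cat`):

```python
from typing import Any, Callable

def normalize_pixel_values(pixel_values: Any, fetch: Callable[[Any], list]) -> list:
    """Accept a single handle or a list of handles and return one
    concatenated sequence.

    Hypothetical sketch of the data-flow fix discussed in review: a bare
    ObjectRef no longer trips a list-only assertion.
    """
    if isinstance(pixel_values, list):
        resolved = [fetch(ref) for ref in pixel_values]
    else:
        # A single handle (e.g. one ray ObjectRef straight from the
        # collator) gets wrapped so both shapes take the same path.
        resolved = [fetch(pixel_values)]
    # Stand-in for torch.cat(..., dim=0): concatenate along the first axis.
    out: list = []
    for chunk in resolved:
        out.extend(chunk)
    return out
```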
Claude:

## Review Summary

This PR adds VLM (Qwen3-VL) support to the RL training pipeline. Changes since the last review address several previously flagged issues (function naming, import ordering, …).

Remaining Issues: Warning, Nit

Verdict: REQUEST_CHANGES — mostly type hint and documentation gaps, plus one data flow question on the pixel_values path in …
No description provided.