
Qwen3.5moe #1115

Closed
vvvdwbvvv wants to merge 14 commits into linkedin:main from vvvdwbvvv:qwen3.5moe

Conversation

@vvvdwbvvv
Contributor

@vvvdwbvvv vvvdwbvvv commented Mar 2, 2026

Summary

Add Qwen3.5_MoE model support.
https://huggingface.co/docs/transformers/en/model_doc/qwen3_5_moe
Related issue: #1110

Note

There is a non-determinism problem across all of Qwen's MoE model series, and I'm not sure what causes it. Maybe we should re-examine all the MoE models and check their convergence tests. A possible workaround is sketched below the error log.
Error log:

num_tokens_per_expert = torch.histc(histc_input, bins=self.num_experts, min=0, max=self.num_experts - 1)
E       RuntimeError: _histc_cuda does not have a deterministic implementation, but you set 'torch.use_deterministic_algorithms(True)'. You can turn off determinism just for this operation, or you can use the 'warn_only=True' option, if that's acceptable for your application. You can also file an issue at https://github.com/pytorch/pytorch/issues to help us prioritize adding deterministic support for this operation.
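A minimal sketch of two possible workarounds for the error above. This is not the fix used in this PR; the names `histc_input` and `num_experts` are assumptions taken from the snippet in the error log, and the shapes are made up for illustration.

```python
import torch
import torch.nn.functional as F

num_experts = 8
device = "cuda" if torch.cuda.is_available() else "cpu"
# Hypothetical routing indices, standing in for the real `histc_input`.
histc_input = torch.randint(0, num_experts, (1024,), device=device)

# Option 1: keep torch.histc but only warn on non-deterministic ops,
# as suggested by the error message itself.
torch.use_deterministic_algorithms(True, warn_only=True)

# Option 2: replace histc with an integer one-hot sum. Integer accumulation
# is order-independent, so the result is deterministic and does not trip
# the deterministic-algorithms check.
num_tokens_per_expert = F.one_hot(histc_input.long(), num_classes=num_experts).sum(dim=0)
```

Option 2 trades a bit of extra memory (the one-hot matrix) for a bitwise-reproducible count, which may matter for the convergence tests mentioned above.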

Testing Done

  • Hardware Type:
  • [x] run make test to ensure correctness
  • [x] run make checkstyle to ensure code style
  • [ ] run make test-convergence to ensure convergence

@vvvdwbvvv vvvdwbvvv marked this pull request as ready for review March 6, 2026 06:34
Contributor

@albertvillanova albertvillanova left a comment


I think that for the qwen3_5_moe model, the mm_token_type_ids param should also be added to lce_forward, as is already done for other VL models:

See code in transformers:

CC: @kashif

output_attentions: Optional[bool] = None,
output_hidden_states: Optional[bool] = None,
output_router_logits: Optional[bool] = None,
cache_position: Optional[torch.LongTensor] = None,
Contributor


Suggested change
mm_token_type_ids: Optional[torch.IntTensor] = None,
cache_position: Optional[torch.LongTensor] = None,

output_attentions=output_attentions,
output_hidden_states=output_hidden_states,
output_router_logits=output_router_logits,
cache_position=cache_position,
Contributor


Suggested change
mm_token_type_ids=mm_token_type_ids,
cache_position=cache_position,
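A consolidated sketch of what the two suggestions above amount to. The real lce_forward in Liger-Kernel has many more parameters; the signature here is abbreviated and the body is illustrative only.

```python
from typing import Optional

import torch


def lce_forward(
    self,
    input_ids: Optional[torch.LongTensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    output_router_logits: Optional[bool] = None,
    mm_token_type_ids: Optional[torch.IntTensor] = None,  # new parameter, per the review
    cache_position: Optional[torch.LongTensor] = None,
    **kwargs,
):
    # Forward the new argument to the underlying model call, mirroring how
    # other VL models already handle mm_token_type_ids.
    return self.model(
        input_ids=input_ids,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        output_router_logits=output_router_logits,
        mm_token_type_ids=mm_token_type_ids,  # pass-through, as in the suggestion above
        cache_position=cache_position,
        **kwargs,
    )
```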

@vvvdwbvvv
Contributor Author

@albertvillanova Thank you for reviewing, I'll fix that.

@vvvdwbvvv
Contributor Author

vvvdwbvvv commented Mar 7, 2026

Closing this, as it is already done in #1109.

@vvvdwbvvv vvvdwbvvv closed this Mar 7, 2026
