
Qwen3.5moe #1115

Closed
vvvdwbvvv wants to merge 14 commits into linkedin:main from vvvdwbvvv:qwen3.5moe

Conversation

@vvvdwbvvv
Contributor

@vvvdwbvvv vvvdwbvvv commented Mar 2, 2026

Summary

Add Qwen3.5_MoE model support.
https://huggingface.co/docs/transformers/en/model_doc/qwen3_5_moe
Related issue: #1110

Note

There is a non-determinism problem across all of Qwen's MoE model series, and I'm not sure what causes it. Maybe we should re-examine all the MoE models and check their convergence tests. A possible workaround is sketched below the error log.
Error log:

num_tokens_per_expert = torch.histc(histc_input, bins=self.num_experts, min=0, max=self.num_experts - 1)
E       RuntimeError: _histc_cuda does not have a deterministic implementation, but you set 'torch.use_deterministic_algorithms(True)'. You can turn off determinism just for this operation, or you can use the 'warn_only=True' option, if that's acceptable for your application. You can also file an issue at https://github.com/pytorch/pytorch/issues to help us prioritize adding deterministic support for this operation.
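A minimal sketch of two possible workarounds for the error above. This is not the fix used in this PR; the names `histc_input` and `num_experts` are assumptions taken from the snippet in the error log, and the shapes are made up for illustration.

```python
import torch
import torch.nn.functional as F

num_experts = 8
device = "cuda" if torch.cuda.is_available() else "cpu"
# Hypothetical routing indices, standing in for the real `histc_input`.
histc_input = torch.randint(0, num_experts, (1024,), device=device)

# Option 1: keep torch.histc but only warn on non-deterministic ops,
# as suggested by the error message itself.
torch.use_deterministic_algorithms(True, warn_only=True)

# Option 2: replace histc with an integer one-hot sum. Integer accumulation
# is order-independent, so the result is deterministic and does not trip
# the deterministic-algorithms check.
num_tokens_per_expert = F.one_hot(histc_input.long(), num_classes=num_experts).sum(dim=0)
```

Option 2 trades a bit of extra memory (the one-hot matrix) for a bitwise-reproducible count, which may matter for the convergence tests mentioned above.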

Testing Done

  • Hardware Type:
  • [x] run make test to ensure correctness
  • [x] run make checkstyle to ensure code style
  • [ ] run make test-convergence to ensure convergence

@vvvdwbvvv vvvdwbvvv marked this pull request as ready for review March 6, 2026 06:34
Contributor

@albertvillanova albertvillanova left a comment


I think that for the qwen3_5_moe model, the mm_token_type_ids param should also be added to lce_forward, as is already done for other VL models:

See code in transformers:

CC: @kashif

output_attentions: Optional[bool] = None,
output_hidden_states: Optional[bool] = None,
output_router_logits: Optional[bool] = None,
cache_position: Optional[torch.LongTensor] = None,
Contributor


Suggested change
mm_token_type_ids: Optional[torch.IntTensor] = None,
cache_position: Optional[torch.LongTensor] = None,

output_attentions=output_attentions,
output_hidden_states=output_hidden_states,
output_router_logits=output_router_logits,
cache_position=cache_position,
Contributor


Suggested change
mm_token_type_ids=mm_token_type_ids,
cache_position=cache_position,
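A consolidated sketch of what the two suggestions above amount to. The real lce_forward in Liger-Kernel has many more parameters; the signature here is abbreviated and the body is illustrative only.

```python
from typing import Optional

import torch


def lce_forward(
    self,
    input_ids: Optional[torch.LongTensor] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    output_router_logits: Optional[bool] = None,
    mm_token_type_ids: Optional[torch.IntTensor] = None,  # new parameter, per the review
    cache_position: Optional[torch.LongTensor] = None,
    **kwargs,
):
    # Forward the new argument to the underlying model call, mirroring how
    # other VL models already handle mm_token_type_ids.
    return self.model(
        input_ids=input_ids,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        output_router_logits=output_router_logits,
        mm_token_type_ids=mm_token_type_ids,  # pass-through, as in the suggestion above
        cache_position=cache_position,
        **kwargs,
    )
```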

@vvvdwbvvv
Contributor Author

@albertvillanova Thank you for reviewing, I'll fix that.

@vvvdwbvvv
Contributor Author

vvvdwbvvv commented Mar 7, 2026

Closing this, as it is already done in #1109.

@vvvdwbvvv vvvdwbvvv closed this Mar 7, 2026
