[WIP] Support qwen3-omni #4411

Draft
CUHKSZzxy wants to merge 1 commit into InternLM:main from CUHKSZzxy:support-qwen3-omni

CUHKSZzxy (Collaborator) commented Mar 13, 2026

Objective

  • Support the Qwen3-Omni thinker model. The Qwen3-Omni talker model is omitted since it is used for audio generation.
  • Move lmdeploy/vl -> lmdeploy/multimodal, since audio does not belong to the vision category.
  • Refine to_pytorch_aux and improve handling of mixed modalities.

Test

  • Video test sample
xx
  • Audio test sample
xx
  • Audio + video test sample
xx

TODO

  • Support video
  • Support audio
  • Support mixed modality, image + video + audio
  • Better preprocessing: smart image / video resize

Related

Prerequisite PR

CUHKSZzxy force-pushed the support-qwen3-omni branch from 8d64a7a to 4c6bc99 on March 19, 2026 07:13