Support Qwen3 Omni by CUHKSZzxy · Pull Request #4411 · InternLM/lmdeploy

CUHKSZzxy · 2026-03-13T10:50:34Z

Summary

Support Qwen3-Omni thinker inference in the PyTorch backend.

This PR adds Qwen3-Omni model registration, HF processor integration, and multimodal preprocessing for image, video, audio, and mixed image/audio/video inputs. Audio support is currently limited to Qwen3-Omni.

Changes

Add Qwen3-Omni PyTorch thinker model support.
Add Qwen3-Omni VL preprocessor using the shared get_input_prompt -> preprocess path.
Support image-only, video-only, audio-only, and mixed image/audio/video prompts.
Keep Qwen3-Omni video expansion as whole-video spans, distinct from Qwen3VL per-frame timestamp handling.
Add audio media parsing for OpenAI-style multimodal messages.
Add multimodal input docs and examples, including Qwen3-Omni audio usage.

Accuracy Check

Local run config:

Model: Qwen3-Omni-30B-A3B-Instruct
Backend: LMDeploy PyTorch, tp=1
Server: OpenAI-compatible API
Decode: temperature=0

Benchmark	LMDeploy local result	Official related score	Notes
GSM8K	1258 / 1314 = 95.74%	91.36	Official score is from Qwen3-Omni technical report Table 16 for `Qwen3-Omni-30B-A3B-Base-202507`, not the exact Instruct checkpoint/harness.
OCRBench	848 / 1000 = 84.80%	86.0	Official score is from Qwen3-Omni technical report Table 16 for `Qwen3-Omni-30B-A3B-Base-202507`, not the exact Instruct checkpoint/harness.

Artifacts are saved locally under benchmark/e2e_qwen3_omni_gsm8k_ocrbench/.

Reference: Qwen3-Omni Technical Report, Table 16: https://arxiv.org/pdf/2509.17765

Notes

Talker/audio-generation support is not included.
Audio input support is scoped to Qwen3-Omni.
Advanced use_audio_in_video=True interleaving is not enabled in this patch.

Assistance

Assisted with Codex + GPT-5.5 High

# Conflicts: # lmdeploy/model.py # lmdeploy/serve/processors/multimodal.py

# Conflicts: # lmdeploy/archs.py

Copilot

Pull request overview

This PR adds PyTorch-backend support for Qwen3-Omni (thinker), extending the multimodal preprocessing pipeline to handle audio (alongside image/video) and registering the new architecture across model/config dispatch. It also updates the OpenAI-style multimodal message parsing and documentation to include audio inputs.

Changes:

Register Qwen3-Omni for VL preprocessing + PyTorch model loading (module map, arch list, config builder, max-len derivation).
Extend multimodal preprocessing/utilities to support audio features, plus mixed image/audio/video offset handling.
Add audio media loading and update API-server multimodal parsing + docs/examples for audio usage.

Reviewed changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
tests/test_lmdeploy/test_vl/test_qwen3_omni_processor.py	Adds unit tests for Qwen3-Omni preprocessing, mixed-modality offsets, and audio masking/mrope behavior.
tests/test_lmdeploy/test_content_merge.py	Extends multimodal parsing tests to include audio items and updates “unknown type” coverage.
lmdeploy/vl/model/qwen3_omni.py	Adds Qwen3-Omni VL model registration + HF processor integration and special-token wiring.
lmdeploy/vl/model/preprocess_utils.py	Expands bundled HF outputs for video/audio and sorts expanded items by offsets to restore prompt order.
lmdeploy/vl/model/builder.py	Registers Qwen3-Omni in the VL model builder import list.
lmdeploy/vl/model/base.py	Extends new-style `VisionModel.preprocess` to collect audio inputs and pass `audio_kwargs` to HF processors.
lmdeploy/vl/media/audio.py	Introduces audio MediaIO implementation (librosa/soundfile) for URL/file/base64 audio loading.
lmdeploy/utils.py	Adjusts max-length derivation to use `thinker_config.text_config` for Qwen3-Omni thinker configs.
lmdeploy/serve/processors/multimodal.py	Adds OpenAI-style `audio_url`/`audio` parsing using `AudioMediaIO`; updates multimodal type detection.
lmdeploy/pytorch/multimodal/data_type.py	Reorders `MultiModalData` fields to place `modality` before `mrope_pos_ids`.
lmdeploy/pytorch/models/utils/model.py	Extends multimodal mask computation to include audio token IDs.
lmdeploy/pytorch/models/qwen3_omni_moe_thinker.py	Adds the Qwen3-Omni thinker PyTorch model, including audio tower + mixed-modality input processing.
lmdeploy/pytorch/models/module_map.py	Maps HF arch `Qwen3OmniMoeForConditionalGeneration` to the PyTorch thinker implementation.
lmdeploy/pytorch/configurations/qwen3_omni.py	Adds a config builder for Qwen3-Omni (thinker text config + mrope enabled).
lmdeploy/model.py	Improves chat-template resolution by falling back to processor-provided `chat_template` when tokenizer lacks it.
lmdeploy/archs.py	Adds Qwen3-Omni to VL arch detection and marks it unsupported for TurboMind.
docs/zh_cn/multi_modal/vl_pipeline.md	Updates “see also” to include audio in multimodal inputs reference.
docs/zh_cn/multi_modal/multimodal_inputs.md	Adds audio input docs/examples and updates native video support note to include Qwen3-Omni.
docs/zh_cn/multi_modal/index.rst	Adjusts toctree structure for the multi-modal section.
docs/zh_cn/index.rst	Adds multimodal inputs page to the main Chinese documentation index.
docs/en/multi_modal/vl_pipeline.md	Updates “see also” to include audio in multimodal inputs reference.
docs/en/multi_modal/multimodal_inputs.md	Adds audio input docs/examples and updates native video support note to include Qwen3-Omni.
docs/en/multi_modal/index.rst	Adjusts toctree structure for the multi-modal section.
docs/en/index.rst	Adds multimodal inputs page to the main English documentation index.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

# Conflicts: # tests/test_lmdeploy/test_vl/test_preprocess_utils.py

support qwen3-omni

4c6bc99

CUHKSZzxy force-pushed the support-qwen3-omni branch from 8d64a7a to 4c6bc99 Compare March 19, 2026 07:13

CUHKSZzxy added 22 commits March 24, 2026 12:30

Merge branch 'main' into support-qwen3-omni

6dd1891

Merge branch 'main' into support-qwen3-omni

c2a133a

Merge branch 'main' into support-qwen3-omni

db0831a

minor

eb33e33

use builtin mrope

4f6c57d

Merge branch 'main' into support-qwen3-omni

e77cbc0

minor fix

b81d76b

Merge branch 'main' into support-qwen3-omni

2f27597

Add Qwen3 Omni new preprocess support

10257fd

Merge remote-tracking branch 'upstream/main' into support-qwen3-omni

a096829

# Conflicts: # lmdeploy/model.py # lmdeploy/serve/processors/multimodal.py

Pass trust_remote_code to Qwen3 Omni processor

684c14a

docs: restore Qwen3 Omni audio examples

fe2b067

Refine Qwen3 Omni special token setup

cc7fb80

docs: surface multimodal input guide

6ee1139

Refine multimodal masks for Qwen3 Omni

5edaa09

Fix multimodal parser audio test dependency

2f4cd07

Restore legacy PyTorch aux preprocessing

07217c6

Refine Qwen3 Omni processor tests

7fdd954

Revert InternS1 Pro mask refactor

2afb951

Support Qwen3-Omni audio and video preprocess

cbc0119

Refine Qwen3-Omni mrope setup

11c9aeb

Merge branch 'main' into support-qwen3-omni

bec0a3c

# Conflicts: # lmdeploy/archs.py

CUHKSZzxy changed the title ~~[WIP] Support qwen3-omni~~ Support Qwen3 Omni May 11, 2026

CUHKSZzxy marked this pull request as ready for review May 11, 2026 04:37

Copilot AI review requested due to automatic review settings May 11, 2026 04:37

Copilot started reviewing on behalf of CUHKSZzxy May 11, 2026 04:38 View session

Copilot AI reviewed May 11, 2026

View reviewed changes

Comment thread lmdeploy/pytorch/models/qwen3_omni_moe_thinker.py Outdated

Comment thread lmdeploy/vl/media/audio.py Outdated

fix: address qwen3 omni copilot review

d3653dc

fix: compact qwen3 omni multimodal slices

a5d63cf

lvhan028 added enhancement New feature or request labels May 13, 2026

Merge remote-tracking branch 'upstream/main' into support-qwen3-omni

233c2cc

# Conflicts: # tests/test_lmdeploy/test_vl/test_preprocess_utils.py

lvhan028 requested review from grimoire and lvhan028 May 20, 2026 04:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support Qwen3 Omni#4411

Support Qwen3 Omni#4411
CUHKSZzxy wants to merge 26 commits into
InternLM:mainfrom
CUHKSZzxy:support-qwen3-omni

CUHKSZzxy commented Mar 13, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

CUHKSZzxy commented Mar 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Accuracy Check

Notes

Related

Assistance

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

CUHKSZzxy commented Mar 13, 2026 •

edited

Loading