[Bug] Out-of-memory error when running Qwen3-30B-A3B-AWQ with the turbomind inference engine on a 96 GB RTX 6000 GPU under Windows #4401

### Checklist

- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
- [ ] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

### Describe the bug

<img width="1083" height="607" alt="Image" src="https://github.com/user-attachments/assets/6455828d-a519-423e-954a-aa750b3e2ede" />
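On Windows, running the Qwen3-30B-A3B-AWQ model with the turbomind inference engine on an RTX 6000 GPU with 96 GB of VRAM fails with an error saying that GPU memory has been exceeded.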
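For context on why this is unexpected, here is a rough estimate of the weight footprint. It is only a sketch: the parameter count (about 30.5B in total) and the 4 bits per weight are assumptions, and AWQ scales, zero points, unquantized layers, the KV cache and runtime buffers are all ignored.

```python
# Back-of-the-envelope estimate; every figure here is an assumption, not a measurement.
GIB = 1024 ** 3


def awq_weight_gib(num_params: float, bits_per_weight: float = 4.0) -> float:
    """Approximate weight size in GiB, ignoring scales, zero points and unquantized layers."""
    return num_params * bits_per_weight / 8 / GIB


weights = awq_weight_gib(30.5e9)  # assumed ~30.5B total parameters
total_vram = 96.0                 # nominal capacity from this report, treated as GiB

print(f"approx. weights:  {weights:.1f} GiB")
print(f"approx. headroom: {total_vram - weights:.1f} GiB")
```

Under these assumptions the weights alone should take well under a quarter of the card's memory, so the failure is unlikely to come from the model simply being too large.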

### Reproduction

lmdeploy serve api_server C:\Qwen3-30B-A3B-Instruct-2507-AWQ --model-name Qwen3 --server-port 1234 --api-key abcd+1234 --session-len 2048 --max-batch-size 1
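The error is currently only visible in the screenshot, so a small wrapper like the one below could be used to capture the server output as text. This is a sketch: `build_command`, `run_and_capture` and the log file name `lmdeploy_oom.log` are hypothetical, and it assumes the installed lmdeploy version accepts `--log-level` for `serve api_server`.

```python
import subprocess
from pathlib import Path


def build_command(model_path: str, log_level: str = "INFO") -> list[str]:
    """Rebuild the reproduction command, adding a log level (assumed to be supported)."""
    return [
        "lmdeploy", "serve", "api_server", model_path,
        "--model-name", "Qwen3",
        "--server-port", "1234",
        "--api-key", "abcd+1234",
        "--session-len", "2048",
        "--max-batch-size", "1",
        "--log-level", log_level,
    ]


def run_and_capture(cmd: list[str], log_file: str = "lmdeploy_oom.log") -> int:
    """Run the command and write stdout and stderr to one file; return the exit code."""
    with Path(log_file).open("w", encoding="utf-8") as fh:
        proc = subprocess.run(cmd, stdout=fh, stderr=subprocess.STDOUT)
    return proc.returncode


cmd = build_command(r"C:\Qwen3-30B-A3B-Instruct-2507-AWQ")
print(" ".join(cmd))

# To actually launch the server and record its output, uncomment:
# exit_code = run_and_capture(cmd)
```

The contents of the resulting log file could then be pasted into the "Error traceback" section.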

### Environment

```Shell
Windows 10 + Python 3.10 + CUDA 12.9
Package                   Version
------------------------- -------------
accelerate                1.13.0
addict                    2.4.0
aiohappyeyeballs          2.6.1
aiohttp                   3.13.3
aiosignal                 1.4.0
annotated-doc             0.0.4
annotated-types           0.7.0
anyio                     4.12.1
async-timeout             5.0.1
attrs                     25.4.0
certifi                   2026.2.25
charset-normalizer        3.4.5
click                     8.3.1
cloudpickle               3.1.2
colorama                  0.4.6
distro                    1.9.0
einops                    0.8.2
exceptiongroup            1.3.1
fastapi                   0.135.1
filelock                  3.25.1
fire                      0.7.1
frozenlist                1.8.0
fsspec                    2026.2.0
h11                       0.16.0
httpcore                  1.0.9
httpx                     0.28.1
huggingface_hub           0.36.2
idna                      3.11
Jinja2                    3.1.6
jiter                     0.13.0
jsonschema                4.26.0
jsonschema-specifications 2025.9.1
lmdeploy                  0.12.1
markdown-it-py            4.0.0
MarkupSafe                3.0.3
mdurl                     0.1.2
mmengine-lite             0.10.7
mpmath                    1.3.0
msgpack                   1.1.2
multidict                 6.7.1
networkx                  3.4.2
numpy                     2.2.6
openai                    2.26.0
openai-harmony            0.0.8
packaging                 26.0
partial-json-parser       0.2.1.1.post7
peft                      0.14.0
pillow                    12.1.1
pip                       26.0.1
platformdirs              4.9.4
prometheus_client         0.24.1
propcache                 0.4.1
protobuf                  7.34.0
psutil                    7.2.2
pybase64                  1.4.3
pydantic                  2.12.5
pydantic_core             2.41.5
Pygments                  2.19.2
PyYAML                    6.0.3
pyzmq                     27.1.0
ray                       2.54.0
referencing               0.37.0
regex                     2026.2.28
requests                  2.32.5
rich                      14.3.3
rpds-py                   0.30.0
safetensors               0.7.0
sentencepiece             0.2.1
setuptools                65.5.0
shortuuid                 1.0.13
sniffio                   1.3.1
starlette                 0.52.1
sympy                     1.14.0
termcolor                 3.3.0
tiktoken                  0.12.0
tokenizers                0.22.2
tomli                     2.4.0
torch                     2.8.0+cu129
torchaudio                2.8.0+cu129
torchvision               0.23.0
tqdm                      4.67.3
transformers              4.57.6
typing_extensions         4.15.0
typing-inspection         0.4.2
urllib3                   2.6.3
uvicorn                   0.41.0
xgrammar                  0.1.32
yapf                      0.43.0
yarl                      1.23.0
```

### Error traceback

```Shell

```
