Checklist
Describe the bug

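On an RTX 6000 GPU with 96 GB of VRAM, on Windows, running the Qwen3-30B-A3B-AWQ model with the turbomind inference engine fails with an error saying GPU memory is exceeded (out of memory).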
Reproduction
lmdeploy serve api_server C:\Qwen3-30B-A3B-Instruct-2507-AWQ --model-name Qwen3 --server-port 1234 --api-key abcd+1234 --session-len 2048 --max-batch-size 1
Environment
Windows 10 + Python 3.10 + CUDA 12.9
Package Version
------------------------- -------------
accelerate 1.13.0
addict 2.4.0
aiohappyeyeballs 2.6.1
aiohttp 3.13.3
aiosignal 1.4.0
annotated-doc 0.0.4
annotated-types 0.7.0
anyio 4.12.1
async-timeout 5.0.1
attrs 25.4.0
certifi 2026.2.25
charset-normalizer 3.4.5
click 8.3.1
cloudpickle 3.1.2
colorama 0.4.6
distro 1.9.0
einops 0.8.2
exceptiongroup 1.3.1
fastapi 0.135.1
filelock 3.25.1
fire 0.7.1
frozenlist 1.8.0
fsspec 2026.2.0
h11 0.16.0
httpcore 1.0.9
httpx 0.28.1
huggingface_hub 0.36.2
idna 3.11
Jinja2 3.1.6
jiter 0.13.0
jsonschema 4.26.0
jsonschema-specifications 2025.9.1
lmdeploy 0.12.1
markdown-it-py 4.0.0
MarkupSafe 3.0.3
mdurl 0.1.2
mmengine-lite 0.10.7
mpmath 1.3.0
msgpack 1.1.2
multidict 6.7.1
networkx 3.4.2
numpy 2.2.6
openai 2.26.0
openai-harmony 0.0.8
packaging 26.0
partial-json-parser 0.2.1.1.post7
peft 0.14.0
pillow 12.1.1
pip 26.0.1
platformdirs 4.9.4
prometheus_client 0.24.1
propcache 0.4.1
protobuf 7.34.0
psutil 7.2.2
pybase64 1.4.3
pydantic 2.12.5
pydantic_core 2.41.5
Pygments 2.19.2
PyYAML 6.0.3
pyzmq 27.1.0
ray 2.54.0
referencing 0.37.0
regex 2026.2.28
requests 2.32.5
rich 14.3.3
rpds-py 0.30.0
safetensors 0.7.0
sentencepiece 0.2.1
setuptools 65.5.0
shortuuid 1.0.13
sniffio 1.3.1
starlette 0.52.1
sympy 1.14.0
termcolor 3.3.0
tiktoken 0.12.0
tokenizers 0.22.2
tomli 2.4.0
torch 2.8.0+cu129
torchaudio 2.8.0+cu129
torchvision 0.23.0
tqdm 4.67.3
transformers 4.57.6
typing_extensions 4.15.0
typing-inspection 0.4.2
urllib3 2.6.3
uvicorn 0.41.0
xgrammar 0.1.32
yapf 0.43.0
yarl 1.23.0
Error traceback