
[Bug] RTX 6000 GPU with 96 GB VRAM on Windows: running the Qwen3-30B-A3B-AWQ model with the turbomind inference engine fails with a GPU out-of-memory error #4401

@somkh

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

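[Image] On an RTX 6000 GPU with 96 GB of VRAM under Windows, running the Qwen3-30B-A3B-AWQ model with the turbomind inference engine fails with an error reporting that GPU memory was exceeded.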

Reproduction

lmdeploy serve api_server C:\Qwen3-30B-A3B-Instruct-2507-AWQ --model-name Qwen3 --server-port 1234 --api-key abcd+1234 --session-len 2048 --max-batch-size 1
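
Since the "Error traceback" section of this report is empty, one way to capture the full error text is to launch the same command from Python and redirect its output to a file. The following is only a sketch: the helper names `build_command` and `run_and_capture` and the log file name `lmdeploy_error.log` are hypothetical and not part of lmdeploy, while the arguments are copied verbatim from the command above.

```python
import subprocess


def build_command(model_path=r"C:\Qwen3-30B-A3B-Instruct-2507-AWQ"):
    """Return the reproduction command above as an argument list."""
    return [
        "lmdeploy", "serve", "api_server", model_path,
        "--model-name", "Qwen3",
        "--server-port", "1234",
        "--api-key", "abcd+1234",
        "--session-len", "2048",
        "--max-batch-size", "1",
    ]


def run_and_capture(log_path="lmdeploy_error.log"):
    """Run the server and write stdout and stderr to log_path.

    The log file name is a hypothetical choice. If the server starts
    successfully, this call blocks until the server is stopped.
    """
    with open(log_path, "w", encoding="utf-8") as log:
        proc = subprocess.run(
            build_command(),
            stdout=log,
            stderr=subprocess.STDOUT,  # merge stderr into the same file
        )
    return proc.returncode
```

Calling `run_and_capture()` and pasting the contents of the resulting log file would fill in the missing traceback.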

Environment

Windows 10 + Python 3.10 + CUDA 12.9
Package                   Version
------------------------- -------------
accelerate                1.13.0
addict                    2.4.0
aiohappyeyeballs          2.6.1
aiohttp                   3.13.3
aiosignal                 1.4.0
annotated-doc             0.0.4
annotated-types           0.7.0
anyio                     4.12.1
async-timeout             5.0.1
attrs                     25.4.0
certifi                   2026.2.25
charset-normalizer        3.4.5
click                     8.3.1
cloudpickle               3.1.2
colorama                  0.4.6
distro                    1.9.0
einops                    0.8.2
exceptiongroup            1.3.1
fastapi                   0.135.1
filelock                  3.25.1
fire                      0.7.1
frozenlist                1.8.0
fsspec                    2026.2.0
h11                       0.16.0
httpcore                  1.0.9
httpx                     0.28.1
huggingface_hub           0.36.2
idna                      3.11
Jinja2                    3.1.6
jiter                     0.13.0
jsonschema                4.26.0
jsonschema-specifications 2025.9.1
lmdeploy                  0.12.1
markdown-it-py            4.0.0
MarkupSafe                3.0.3
mdurl                     0.1.2
mmengine-lite             0.10.7
mpmath                    1.3.0
msgpack                   1.1.2
multidict                 6.7.1
networkx                  3.4.2
numpy                     2.2.6
openai                    2.26.0
openai-harmony            0.0.8
packaging                 26.0
partial-json-parser       0.2.1.1.post7
peft                      0.14.0
pillow                    12.1.1
pip                       26.0.1
platformdirs              4.9.4
prometheus_client         0.24.1
propcache                 0.4.1
protobuf                  7.34.0
psutil                    7.2.2
pybase64                  1.4.3
pydantic                  2.12.5
pydantic_core             2.41.5
Pygments                  2.19.2
PyYAML                    6.0.3
pyzmq                     27.1.0
ray                       2.54.0
referencing               0.37.0
regex                     2026.2.28
requests                  2.32.5
rich                      14.3.3
rpds-py                   0.30.0
safetensors               0.7.0
sentencepiece             0.2.1
setuptools                65.5.0
shortuuid                 1.0.13
sniffio                   1.3.1
starlette                 0.52.1
sympy                     1.14.0
termcolor                 3.3.0
tiktoken                  0.12.0
tokenizers                0.22.2
tomli                     2.4.0
torch                     2.8.0+cu129
torchaudio                2.8.0+cu129
torchvision               0.23.0
tqdm                      4.67.3
transformers              4.57.6
typing_extensions         4.15.0
typing-inspection         0.4.2
urllib3                   2.6.3
uvicorn                   0.41.0
xgrammar                  0.1.32
yapf                      0.43.0
yarl                      1.23.0

Error traceback
