While trying to bring up MiniMax 2.7 on a single-node, 8-GPU B40 instance, the process hangs. MiniMax 2.5 worked earlier in the same environment:
(APIServer pid=1) DEBUG 04-12 21:38:28 [v1/engine/utils.py:1143] Waiting for 1 local, 0 remote core engine proc(s) to start.
(APIServer pid=1) DEBUG 04-12 21:38:38 [v1/engine/utils.py:1143] Waiting for 1 local, 0 remote core engine proc(s) to start.
(APIServer pid=1) DEBUG 04-12 21:38:48 [v1/engine/utils.py:1143] Waiting for 1 local, 0 remote core engine proc(s) to start.
(APIServer pid=1) DEBUG 04-12 21:38:58 [v1/engine/utils.py:1143] Waiting for 1 local, 0 remote core engine proc(s) to start.
(APIServer pid=1) DEBUG 04-12 21:39:08 [v1/engine/utils.py:1143] Waiting for 1 local, 0 remote core engine proc(s) to start.
(APIServer pid=1) DEBUG 04-12 21:39:18 [v1/engine/utils.py:1143] Waiting for 1 local, 0 remote core engine proc(s) to start.
This is from the Kubernetes YAML:
containers:
  - name: vllm-container
    image: vllm/vllm-openai:nightly
    env:
      - name: VLLM_LOGGING_LEVEL
        value: "DEBUG"  # Shows detailed model loading progress
      - name: NCCL_DEBUG
        value: "INFO"   # "TRACE" is very noisy; "INFO" is usually enough
      - name: VLLM_WORKER_MULTIPROC_METHOD
        value: "fork"
    args: [
      "MiniMaxAI/MiniMax-M2.7",
      "--tensor-parallel-size", "8",
      "--enable-expert-parallel",
      "--tool-call-parser", "minimax_m2",
      "--reasoning-parser", "minimax_m2_append_think",
      "--enable-auto-tool-choice",
      "--compilation-config", '{"mode":1,"pass_config":{"fuse_minimax_qk_norm":true}}',
      "--trust-remote-code",
      "--compilation-config", '{"cudagraph_mode": "PIECEWISE"}',
      "--disable-custom-all-reduce",
      "--enforce-eager"
    ]
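One thing I noticed while writing this up (a guess, not a confirmed fix): `--compilation-config` is passed twice, and if vLLM treats it as a single-valued option, the second JSON payload may simply override the first. A sketch of the args with the two payloads merged into one flag, assuming that override behavior:

```yaml
# Sketch only: merges the two --compilation-config JSON payloads into a
# single flag, assuming a repeated single-valued option would otherwise
# keep only the last occurrence.
args: [
  "MiniMaxAI/MiniMax-M2.7",
  "--tensor-parallel-size", "8",
  "--enable-expert-parallel",
  "--tool-call-parser", "minimax_m2",
  "--reasoning-parser", "minimax_m2_append_think",
  "--enable-auto-tool-choice",
  "--trust-remote-code",
  "--compilation-config",
  '{"mode":1,"cudagraph_mode":"PIECEWISE","pass_config":{"fuse_minimax_qk_norm":true}}',
  "--disable-custom-all-reduce",
  # --enforce-eager disables compilation/CUDA graphs, which would make the
  # compilation config largely moot; keep it only while debugging the hang.
  "--enforce-eager"
]
```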
I have tried the instructions here - https://docs.vllm.ai/projects/recipes/en/latest/MiniMax/MiniMax-M2.html#using-docker
All of them seem to hang at the step above. Any hints?