My command:
vllm serve TeleChat2-7B \
--trust-remote-code \
--max-model-len 2000 \
--tensor-parallel-size 2 \
--dtype float16 --port 10000

After running this, it gets stuck at one step and never continues loading the model:
INFO 11-27 02:16:22 api_server.py:495] vLLM API server version 0.6.1.post2
INFO 11-27 02:16:22 api_server.py:496] args: Namespace(model_tag='TeleChat2-7B', config='', host=None, port=10000,
......
INFO 11-27 02:16:26 custom_cache_manager.py:17] Setting Triton cache manager to: vllm.triton_utils.custom_cache_manager:CustomCacheManager
INFO 11-27 02:16:26 selector.py:217] Cannot use FlashAttention-2 backend for Volta and Turing GPUs.
INFO 11-27 02:16:26 selector.py:116] Using XFormers backend.
(VllmWorkerProcess pid=23694) INFO 11-27 02:16:26 selector.py:217] Cannot use FlashAttention-2 backend for Volta and Turing GPUs.
(VllmWorkerProcess pid=23694) INFO 11-27 02:16:26 selector.py:116] Using XFormers backend.
/opt/conda/envs/telechat/lib/python3.11/site-packages/xformers/ops/fmha/flash.py:211: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
@torch.library.impl_abstract("xformers_flash::flash_fwd")
/opt/conda/envs/telechat/lib/python3.11/site-packages/xformers/ops/fmha/flash.py:344: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
@torch.library.impl_abstract("xformers_flash::flash_bwd")
(VllmWorkerProcess pid=23694) /opt/conda/envs/telechat/lib/python3.11/site-packages/xformers/ops/fmha/flash.py:211: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
(VllmWorkerProcess pid=23694) @torch.library.impl_abstract("xformers_flash::flash_fwd")
(VllmWorkerProcess pid=23694) /opt/conda/envs/telechat/lib/python3.11/site-packages/xformers/ops/fmha/flash.py:344: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
(VllmWorkerProcess pid=23694) @torch.library.impl_abstract("xformers_flash::flash_bwd")
(VllmWorkerProcess pid=23694) INFO 11-27 02:16:27 multiproc_worker_utils.py:215] Worker ready; awaiting tasks
(VllmWorkerProcess pid=23694) INFO 11-27 02:16:27 utils.py:981] Found nccl from library libnccl.so.2
INFO 11-27 02:16:27 utils.py:981] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=23694) INFO 11-27 02:16:27 pynccl.py:63] vLLM is using nccl==2.20.5
INFO 11-27 02:16:27 pynccl.py:63] vLLM is using nccl==2.20.5 <------- it stays stuck at this step and goes no further

I don't know what the problem is at the moment. I'd like to know whether any of my arguments are wrong, or whether there is some configuration I still need to change.
P.S. Loading the model and running inference on a single GPU works fine.
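To narrow this down: a hang at exactly the "vLLM is using nccl==2.20.5" line with --tensor-parallel-size 2, while single-GPU serving works, usually points to NCCL GPU-to-GPU communication rather than the vLLM arguments themselves. Below is a minimal sanity-check sketch, independent of vLLM, that exercises the same NCCL backend across two GPUs (the script name nccl_check.py and the tensor values are illustrative; it assumes PyTorch was built with NCCL support):

# nccl_check.py -- minimal NCCL collective test across 2 GPUs, outside vLLM.
import os
import torch
import torch.distributed as dist

def main():
    # torchrun sets LOCAL_RANK (and MASTER_ADDR/PORT) for each worker process
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")  # same backend vLLM uses for TP
    x = torch.ones(1, device=f"cuda:{local_rank}")
    dist.all_reduce(x)  # if GPU-to-GPU communication is broken, it hangs here
    print(f"rank {dist.get_rank()}: all_reduce ok, value={x.item()}")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()

Run it with: torchrun --nproc_per_node=2 nccl_check.py. If this also hangs at the all_reduce, the problem is in the NCCL/driver setup rather than in the vllm serve flags; setting NCCL_DEBUG=INFO before the command shows where initialization stalls, and NCCL_P2P_DISABLE=1 is a common workaround on machines where GPU peer-to-peer access is broken.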