Description
Motivation
Recently, I migrated several model serving services from vllm/sglang to lmdeploy. Along the way, I found that lmdeploy's turbomind engine performs significantly better on V100 (Volta) GPUs - thank you for your excellent work.
I encountered one issue: sglang is quite lenient with model names in requests, whereas lmdeploy is very strict and doesn’t support multiple model names (vllm allows configuring multiple aliases).
In my case, some lazy downstream clients interchangeably use names like Qwen3-32B or qwen3-32b to call the model, which results in a 404 error for the latter.
I would like lmdeploy to support multiple aliases for a model, or at least allow a default alias pointing to the primary model.
For example:
lmdeploy serve api_server ./Qwen3-32B-gptqmodel-4bit --model-name Qwen3-32B qwen3-32b default
As a quick and temporary fix, I have implemented a simple patch as shown below.
--- a/data/miniforge3/envs/lmdeploy_0.11.x/lib/python3.12/site-packages/lmdeploy/serve/openai/api_server.py.orig
+++ b/data/miniforge3/envs/lmdeploy_0.11.x/lib/python3.12/site-packages/lmdeploy/serve/openai/api_server.py
@@ -131,8 +131,24 @@ def create_error_response(status: HTTPStatus, message: str, error_type='invalid_
 def check_request(request) -> Optional[JSONResponse]:
     """Check if a request is valid."""
-    if hasattr(request, 'model') and request.model not in get_model_list():
-        return create_error_response(HTTPStatus.NOT_FOUND, f'The model {request.model!r} does not exist.')
+    # if hasattr(request, 'model') and request.model not in get_model_list():
+    #     return create_error_response(HTTPStatus.NOT_FOUND, f'The model {request.model!r} does not exist.')
+    if hasattr(request, 'model') and request.model and isinstance(request.model, str):
+        available = get_model_list()
+        req_model = request.model
+        # Support "default" -> resolve to the first served model
+        if req_model.lower() == 'default':
+            request.model = available[0]
+        else:
+            # Simple loop match: exact match first, then case-insensitive
+            for m in available:
+                if req_model == m:
+                    break
+                if req_model.lower() == m.lower():
+                    request.model = m
+                    break
+            else:
+                return create_error_response(HTTPStatus.NOT_FOUND, f'The model {request.model!r} does not exist.')
     # Import the appropriate check function based on request type
     if isinstance(request, ChatCompletionRequest):
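For reference, here is a minimal client-side sketch (not part of the patch) of how the patched behavior can be checked against a locally running api_server. It assumes the server is reachable on the default port 23333 and that the model is served under the name Qwen3-32B; adjust base_url, api_key and the names to match your deployment. With the patch applied, the exact name, a lower-cased alias, and "default" should all resolve instead of returning a 404.

# Sketch only: verify alias resolution against a local lmdeploy api_server.
# Assumes base_url http://localhost:23333/v1 and served model name "Qwen3-32B".
from openai import OpenAI

client = OpenAI(base_url='http://localhost:23333/v1', api_key='none')

for name in ('Qwen3-32B', 'qwen3-32b', 'default'):
    resp = client.chat.completions.create(
        model=name,
        messages=[{'role': 'user', 'content': 'ping'}],
        max_tokens=8,
    )
    # Before the patch, only the exact name succeeds; the other two return 404.
    print(name, '->', resp.model)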
Related resources
No response
Additional context
No response