[Feature] made api_server model-name case insensitive in request #4225

@bash99

Description

Motivation

Recently, I migrated several model serving services from vllm/sglang to lmdeploy. Incidentally, lmdeploy's turbomind engine shows significantly better performance on V100 (Volta) GPUs; thank you for your excellent work.

I encountered one issue: sglang is quite lenient about the model name in a request, whereas lmdeploy is very strict and doesn't support multiple model names (vllm allows configuring multiple aliases).
In my case, some lazy downstream clients interchangeably use names like Qwen3-32B and qwen3-32b to call the model, and the latter gets a 404 error.

I would like lmdeploy to support multiple aliases for a model, or at least allow a default alias pointing to the primary model.
For example:

lmdeploy serve api_server ./Qwen3-32B-gptqmodel-4bit --model-name Qwen3-32B qwen3-32b default

As a quick and temporary fix, I have implemented a simple patch as shown below.

--- a/data/miniforge3/envs/lmdeploy_0.11.x/lib/python3.12/site-packages/lmdeploy/serve/openai/api_server.py.orig
+++ b/data/miniforge3/envs/lmdeploy_0.11.x/lib/python3.12/site-packages/lmdeploy/serve/openai/api_server.py
@@ -131,8 +131,24 @@ def create_error_response(status: HTTPStatus, message: str, error_type='invalid_

 def check_request(request) -> Optional[JSONResponse]:
     """Check if a request is valid."""
-    if hasattr(request, 'model') and request.model not in get_model_list():
-        return create_error_response(HTTPStatus.NOT_FOUND, f'The model {request.model!r} does not exist.')
+    #if hasattr(request, 'model') and request.model not in get_model_list():
+    #    return create_error_response(HTTPStatus.NOT_FOUND, f'The model {request.model!r} does not exist.')
+    if hasattr(request, 'model') and request.model and isinstance(request.model, str):
+        available = get_model_list()
+        req_model = request.model
+        # Support "default" -> pick the first served model
+        if req_model.lower() == 'default':
+            request.model = available[0]
+        else:
+            # Simple loop match (exact match first, then case-insensitive)
+            for m in available:
+                if req_model == m:
+                    break
+                if req_model.lower() == m.lower():
+                    request.model = m
+                    break
+            else:
+                return create_error_response(HTTPStatus.NOT_FOUND, f'The model {request.model!r} does not exist.')

     # Import the appropriate check function based on request type
     if isinstance(request, ChatCompletionRequest):
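The matching logic in the patch can also be sketched as a standalone helper, which makes it easy to unit-test outside the server. This is a minimal sketch; `resolve_model_name` is a hypothetical name, not part of lmdeploy's API:

```python
from typing import List, Optional


def resolve_model_name(requested: str, available: List[str]) -> Optional[str]:
    """Resolve a requested model name against the list of served models.

    Resolution order: 'default' picks the first served model; otherwise,
    for each served model, try an exact match first, then a
    case-insensitive match. Returns None when nothing matches
    (the caller should answer with 404 in that case).
    """
    if requested.lower() == 'default':
        return available[0]
    for name in available:
        if requested == name:
            return name
        if requested.lower() == name.lower():
            return name
    return None


if __name__ == '__main__':
    served = ['Qwen3-32B']
    # exact, case-insensitive, and 'default' all resolve to the served name
    print(resolve_model_name('Qwen3-32B', served))
    print(resolve_model_name('qwen3-32b', served))
    print(resolve_model_name('default', served))
    # an unknown name resolves to None (i.e. a 404 in check_request)
    print(resolve_model_name('llama', served))
```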

