[Bug] lmdeploy_dlinfer/ascend API Server streaming output issue #4209

### Checklist

- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
- [x] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

### Describe the bug

When running the API server on a 910B with the lmdeploy_dlinfer/ascend:a2-latest image, streaming output does not work. `lmdeploy serve api_server --help` also shows no option for enabling or disabling streaming.

### Reproduction

Start api_server in a container on the NPU, following the documentation:
```
docker run -it --net=host crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/ascend:a2-latest \
bash -i -c "lmdeploy serve api_server --backend pytorch --device ascend qwen/qwen3-0.6b --server-port 40001  --model-name qwen3-0.6b"
```

Test with the following command:
```
curl -N -X POST http://localhost:40001/v1/chat/completions   \
       -H "Content-Type: application/json"   \
       -d '{ 
              "model": "qwen3-0.6b",
              "messages": [{"role": "user", "content": "介绍一下你自己"}],
              "stream": true
           }'
```
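Although `stream` is set to `true`, the content is still returned all at once after generation completes, rather than being streamed.

The same test works correctly with vllm and with the openmmlab/lmdeploy:v0.10.2-cu12 image.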
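
To make "returned all at once" measurable rather than judged by eye, a small client can timestamp each chunk as it arrives. The following is a minimal sketch using only the Python standard library. It assumes the server emits OpenAI-style server-sent events (`data: {...}` lines ending with `data: [DONE]`); the helper names `parse_sse_line`, `looks_streamed`, and `probe`, and the 0.2 s threshold, are illustrative assumptions and not part of lmdeploy.

```python
import json
import time
import urllib.request


def parse_sse_line(raw: bytes):
    """Return the delta text carried by one SSE line, or None if it has none."""
    line = raw.decode("utf-8").strip()
    if not line.startswith("data:"):
        return None
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        return None
    chunk = json.loads(payload)
    choices = chunk.get("choices") or []
    if not choices:
        return None
    # Assumes the OpenAI-style chunk layout: choices[0].delta.content
    return (choices[0].get("delta") or {}).get("content")


def looks_streamed(arrival_times, min_spread=0.2):
    """Heuristic: treat the reply as streamed if chunks are spread over time.

    min_spread (seconds) is an arbitrary threshold chosen for illustration.
    """
    if len(arrival_times) < 2:
        return False
    return arrival_times[-1] - arrival_times[0] >= min_spread


def probe(url="http://localhost:40001/v1/chat/completions", model="qwen3-0.6b"):
    """Send the same request as the curl command and timestamp each chunk."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": "介绍一下你自己"}],
        "stream": True,
    }).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    start = time.monotonic()
    arrivals = []
    with urllib.request.urlopen(req) as resp:
        for raw in resp:  # the response object yields one line at a time
            text = parse_sse_line(raw)
            if text:
                elapsed = time.monotonic() - start
                arrivals.append(elapsed)
                print(f"{elapsed:7.3f}s  {text!r}")
    return arrivals
```

With the server from the reproduction running, `looks_streamed(probe())` would be expected to return `False` on the Ascend image if all chunks share nearly the same timestamp, and `True` on a setup where streaming works. The printed timestamps can be attached to the issue as evidence.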

### Environment

```Shell
root@HW-Ascend-1723723:/# lmdeploy check_env
sys.platform: linux
Python: 3.11.13 (main, Nov 20 2025, 16:02:27) [GCC 11.4.0]
CUDA available: False
MUSA available: False
numpy_random_seed: 2147483648
GCC: gcc (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0
PyTorch: 2.8.0+cpu
PyTorch compiling details: PyTorch built with:
  - GCC 13.3
  - C++ Version: 201703
  - Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: DEFAULT
  - Build settings: BLAS_INFO=open, BUILD_TYPE=Release, COMMIT_SHA=a1cb3cc05d46d198467bebbb6e8fba50a325d4e7, CXX_COMPILER=/opt/rh/gcc-toolset-13/root/usr/bin/c++, CXX_FLAGS=-ffunction-sections -fdata-sections -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_PYTORCH_QNNPACK -DAT_BUILD_ARM_VEC256_WITH_SLEEF -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -DC10_NODEPRECATED -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -faligned-new -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-dangling-reference -Wno-error=dangling-reference -Wno-stringop-overflow, LAPACK_INFO=open, TORCH_VERSION=2.8.0, USE_CUDA=OFF, USE_CUDNN=OFF, USE_CUSPARSELT=OFF, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, USE_XCCL=OFF, USE_XPU=OFF, 

TorchVision: 0.23.0
LMDeploy: 0.11.0+
transformers: 4.57.3
fastapi: 0.123.8
pydantic: 2.12.5
triton: Not Found
```

### Error traceback

```Shell

```
