[Bug]: Tensorizer deserialization converts complex64 tensors to bfloat16, causing assertion failure in Kimi-K2.5 ViT #206

@namaenande

Issue Description

When Tensorizer is used to serialize and then deserialize the Kimi-K2.5 model (vLLM 0.20.0+cu129), the deserialization step converts complex64 tensors (specifically rope_freqs_cis in the vision tower) to bfloat16. This causes a dtype assertion failure on the model's first forward pass, because the code explicitly expects torch.complex64.

Serialization completes without errors, but during deserialization a UserWarning is emitted:
Casting complex values to real discards the imaginary part
and later the model crashes with:
AssertionError: torch.bfloat16 (expected torch.complex64).


Environment

  • vLLM version: 0.20.0+cu129
  • Tensorizer version: 2.12.1
  • GPU: 8× NVIDIA H20
  • CUDA Version: 12.9
  • Driver Version: 570.86.15
  • Docker Image OS: Ubuntu 22.04.5 LTS (jammy)
  • Docker version: 26.1.4
  • Model: Kimi-K2.5

Steps to Reproduce

  1. Serialize the model using Tensorizer (successful):
python3 -m examples.others.tensorize_vllm_model \
  --model /ssd1/Kimi-K2.5 \
  --trust-remote-code \
  --tensor-parallel-size 8 \
  --mm-encoder-tp-mode data \
  --mm-processor-cache-type shm \
  --enable-chunked-prefill \
  --enable-prefix-caching \
  --max-num-batched-tokens 8192 \
  serialize \
  --serialized-directory /ssd1/harry/
  2. Launch vLLM server with Tensorizer deserialization (fails):
vllm serve \
  /ssd1/harry/Kimi-K2.5 \
  --served-model-name /model_files/h20/Kimi-K2.5 Kimi-K2.5 \
  --max-num-seqs 20 \
  --max-num-batched-tokens 8192 \
  --enable-chunked-prefill \
  --enable-prefix-caching \
  --host 0.0.0.0 \
  --port 8000 \
  --trust-remote-code \
  --tensor-parallel-size 8 \
  --tool-call-parser kimi_k2 \
  --reasoning-parser kimi_k2 \
  --enable-auto-tool-choice \
  --mm-encoder-tp-mode data \
  --mm-processor-cache-type shm \
  --gpu-memory-utilization 0.9 \
  --load-format tensorizer \
  --model-loader-extra-config '{"tensorizer_uri": "/ssd1/harry/Kimi-K2.5/model-rank-%03d.tensors"}'
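
Before debugging the dtype problem, it can help to rule out a missing or misnamed shard. The sketch below assumes the %03d placeholder in tensorizer_uri is filled with the zero-based tensor-parallel rank (consistent with Worker_TP0 through Worker_TP7 in the logs, but not confirmed from the vLLM source). expected_shards and missing_shards are hypothetical helper names, not part of vLLM or Tensorizer.

```python
from pathlib import Path


def expected_shards(uri_template: str, tensor_parallel_size: int) -> list[Path]:
    """Expand a printf-style URI template into one path per rank.

    Assumption: the placeholder is the zero-based tensor-parallel rank.
    """
    return [Path(uri_template % rank) for rank in range(tensor_parallel_size)]


def missing_shards(uri_template: str, tensor_parallel_size: int) -> list[Path]:
    """Return the expected shard files that do not exist on local disk."""
    return [
        path
        for path in expected_shards(uri_template, tensor_parallel_size)
        if not path.is_file()
    ]


template = "/ssd1/harry/Kimi-K2.5/model-rank-%03d.tensors"
shards = expected_shards(template, 8)
print(shards[0].name, "...", shards[-1].name)
print("missing:", [str(p) for p in missing_shards(template, 8)])
```

An empty "missing" list on the serving host means all eight shards are present, so the failure is not a path problem.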

Observed Behavior

  1. Tensorizer deserialization warning
    During model loading, Tensorizer complains about casting a complex tensor:

    (Worker_TP0 pid=13912) /usr/local/lib/python3.12/dist-packages/tensorizer/serialization.py:2720: UserWarning: Casting complex values to real discards the imaginary part (Triggered internally at /pytorch/aten/src/ATen/native/Copy.cpp:308.)
    (Worker_TP0 pid=13912)   tensor_on_device = tensor.to(device=self._device, dtype=target_dtype)
    
  2. Model crash during first forward pass
    When the vision encoder tries to apply RoPE using rope_freqs_cis, it checks the dtype and fails:

    (Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] ...
    (Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/kimi_k25_vit.py", line 46, in _apply_rope_input_validation
    (Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]     assert freqs_cis.dtype == torch.complex64, freqs_cis.dtype
    (Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] AssertionError: torch.bfloat16
    

    Full traceback attached below.


Expected Behavior

Model should deserialize without dtype warnings and run inference normally, with rope_freqs_cis preserving its original complex64 type.
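
One way to check this expectation is to record every tensor's dtype before serialization and compare it after loading. The following is a minimal sketch on a toy module, not the actual Kimi-K2.5 vision tower: ToyVisionBlock, dtype_snapshot, and dtype_mismatches are hypothetical names, and the faulty load is simulated by overwriting the buffer by hand.

```python
import torch
from torch import nn


def dtype_snapshot(module: nn.Module) -> dict[str, torch.dtype]:
    """Map every state_dict entry (parameters and persistent buffers) to its dtype."""
    return {name: tensor.dtype for name, tensor in module.state_dict().items()}


def dtype_mismatches(before, after):
    """Return {name: (dtype_before, dtype_after)} for entries whose dtype changed."""
    return {
        name: (dtype, after.get(name))
        for name, dtype in before.items()
        if after.get(name) != dtype
    }


class ToyVisionBlock(nn.Module):
    """Toy stand-in: one real-valued layer plus a complex64 RoPE buffer."""

    def __init__(self) -> None:
        super().__init__()
        self.proj = nn.Linear(4, 4)
        angles = torch.arange(4, dtype=torch.float32)
        self.register_buffer("rope_freqs_cis", torch.polar(torch.ones(4), angles))


block = ToyVisionBlock()
before = dtype_snapshot(block)

# Simulate the faulty load: the complex buffer comes back as real bfloat16.
block.rope_freqs_cis = block.rope_freqs_cis.real.to(torch.bfloat16)

mismatches = dtype_mismatches(before, dtype_snapshot(block))
print(mismatches)
```

A correct round trip would leave the mismatch dictionary empty.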


Additional Context

AI generated

  • The failing tensor is rope_freqs_cis, a complex-valued frequency tensor used in rotary position embedding inside the vision tower (kimi_k25_vit.py).
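  • The warning is raised at tensorizer/serialization.py:2720, in the call tensor.to(device=self._device, dtype=target_dtype), so the cast to a real floating-point type (bfloat16) happens inside Tensorizer's deserializer and discards the imaginary part. Whether target_dtype is Tensorizer's own default or is passed in by vLLM's model loader has not been confirmed.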
  • Because the serialization step succeeded, the complex data is likely stored correctly in the .tensors file, but the deserialization logic does not respect the original complex dtype.
  • This may be related to Tensorizer’s target_dtype inference or vLLM’s model loading code that applies a uniform dtype conversion (e.g., torch.bfloat16) to all parameters without considering complex types.
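
The cast described in the points above can be reproduced in isolation with plain PyTorch on CPU. This is a minimal sketch: the tensor is a small stand-in for rope_freqs_cis, not the model's real buffer, and the angles are arbitrary.

```python
import warnings

import torch

# Stand-in for rope_freqs_cis: unit-magnitude complex rotations e^{i*theta}.
angles = torch.tensor([0.0, 0.5, 1.0])
freqs_cis = torch.polar(torch.ones_like(angles), angles)
assert freqs_cis.dtype == torch.complex64

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Same kind of call as the one the UserWarning points at in the log.
    casted = freqs_cis.to(device="cpu", dtype=torch.bfloat16)

print("dtype after cast:", casted.dtype)
print("still complex:", casted.is_complex())
print("warnings:", [str(w.message) for w in caught])
```

After the cast only the real part (the cosine component) survives, so the sine component that RoPE needs is gone even if the dtype assertion were removed.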

Possible Workarounds

  1. Avoid Tensorizer and load the model directly from safetensors (as done during serialization).
  2. If vLLM supplies an option to control deserialization dtype, specify float32 or complex64 for the problematic parameters (not yet confirmed).

Relevant Logs

Full error trace:
(Worker_TP0 pid=13912) /usr/local/lib/python3.12/dist-packages/tensorizer/serialization.py:2720: UserWarning: Casting complex values to real discards the imaginary part (Triggered internally at /pytorch/aten/src/ATen/native/Copy.cpp:308.)
(Worker_TP0 pid=13912)   tensor_on_device = tensor.to(device=self._device, dtype=target_dtype)
(Worker_TP0 pid=13912) INFO 04-30 02:26:58 [tensorizer.py:563] Deserialized 77.2 GB in 27.01s, 2.9 GB/s
(Worker_TP0 pid=13912) INFO 04-30 02:26:58 [tensorizer.py:566] Memory usage before: CPU: (maxrss: 4,037MiB F: 269,789MiB) GPU: (U: 79,598MiB F: 63,569MiB T: 143,167MiB) TORCH: (R: 75,532MiB/75,532MiB, A: 75,171MiB/75,171MiB)
(Worker_TP0 pid=13912) INFO 04-30 02:26:58 [tensorizer.py:567] Memory usage after: CPU: (maxrss: 14,477MiB F: 205,186MiB) GPU: (U: 123,344MiB F: 19,823MiB T: 143,167MiB) TORCH: (R: 119,150MiB/119,150MiB, A: 43,645MiB/118,567MiB)
(Worker_TP4 pid=13916) INFO 04-30 02:26:58 [tensorizer.py:563] Deserialized 77.2 GB in 27.34s, 2.8 GB/s
(Worker_TP4 pid=13916) INFO 04-30 02:26:58 [tensorizer.py:566] Memory usage before: CPU: (maxrss: 4,177MiB F: 269,078MiB) GPU: (U: 77,299MiB F: 65,868MiB T: 143,167MiB) TORCH: (R: 75,530MiB/75,530MiB, A: 75,171MiB/75,171MiB)
(Worker_TP4 pid=13916) INFO 04-30 02:26:58 [tensorizer.py:567] Memory usage after: CPU: (maxrss: 14,728MiB F: 204,404MiB) GPU: (U: 120,915MiB F: 22,252MiB T: 143,167MiB) TORCH: (R: 119,130MiB/119,130MiB, A: 43,627MiB/118,549MiB)
(Worker_TP0 pid=13912) INFO 04-30 02:26:59 [gpu_model_runner.py:4879] Model loading took 42.5 GiB memory and 29.754163 seconds
(Worker_TP1 pid=13913) INFO 04-30 02:27:05 [tensorizer.py:563] Deserialized 77.2 GB in 34.22s, 2.3 GB/s
(Worker_TP1 pid=13913) INFO 04-30 02:27:05 [tensorizer.py:566] Memory usage before: CPU: (maxrss: 4,173MiB F: 269,139MiB) GPU: (U: 77,299MiB F: 65,868MiB T: 143,167MiB) TORCH: (R: 75,530MiB/75,530MiB, A: 75,171MiB/75,171MiB)
(Worker_TP1 pid=13913) INFO 04-30 02:27:05 [tensorizer.py:567] Memory usage after: CPU: (maxrss: 14,726MiB F: 188,480MiB) GPU: (U: 120,915MiB F: 22,252MiB T: 143,167MiB) TORCH: (R: 119,130MiB/119,130MiB, A: 43,627MiB/118,549MiB)
(Worker_TP7 pid=13919) INFO 04-30 02:27:06 [tensorizer.py:563] Deserialized 77.2 GB in 34.69s, 2.2 GB/s
(Worker_TP7 pid=13919) INFO 04-30 02:27:06 [tensorizer.py:566] Memory usage before: CPU: (maxrss: 4,177MiB F: 269,258MiB) GPU: (U: 77,299MiB F: 65,868MiB T: 143,167MiB) TORCH: (R: 75,530MiB/75,530MiB, A: 75,171MiB/75,171MiB)
(Worker_TP7 pid=13919) INFO 04-30 02:27:06 [tensorizer.py:567] Memory usage after: CPU: (maxrss: 14,042MiB F: 187,690MiB) GPU: (U: 120,915MiB F: 22,252MiB T: 143,167MiB) TORCH: (R: 119,130MiB/119,130MiB, A: 43,627MiB/118,549MiB)
(Worker_TP3 pid=13915) INFO 04-30 02:27:10 [tensorizer.py:563] Deserialized 77.2 GB in 38.85s, 2.0 GB/s
(Worker_TP3 pid=13915) INFO 04-30 02:27:10 [tensorizer.py:566] Memory usage before: CPU: (maxrss: 4,177MiB F: 267,885MiB) GPU: (U: 77,299MiB F: 65,868MiB T: 143,167MiB) TORCH: (R: 75,530MiB/75,530MiB, A: 75,171MiB/75,171MiB)
(Worker_TP3 pid=13915) INFO 04-30 02:27:10 [tensorizer.py:567] Memory usage after: CPU: (maxrss: 14,150MiB F: 181,151MiB) GPU: (U: 120,915MiB F: 22,252MiB T: 143,167MiB) TORCH: (R: 119,130MiB/119,130MiB, A: 43,627MiB/118,549MiB)
(Worker_TP2 pid=13914) INFO 04-30 02:27:11 [tensorizer.py:563] Deserialized 77.2 GB in 40.27s, 1.9 GB/s
(Worker_TP2 pid=13914) INFO 04-30 02:27:11 [tensorizer.py:566] Memory usage before: CPU: (maxrss: 4,174MiB F: 269,153MiB) GPU: (U: 77,299MiB F: 65,868MiB T: 143,167MiB) TORCH: (R: 75,530MiB/75,530MiB, A: 75,171MiB/75,171MiB)
(Worker_TP2 pid=13914) INFO 04-30 02:27:11 [tensorizer.py:567] Memory usage after: CPU: (maxrss: 14,049MiB F: 180,212MiB) GPU: (U: 120,915MiB F: 22,252MiB T: 143,167MiB) TORCH: (R: 119,130MiB/119,130MiB, A: 43,627MiB/118,549MiB)
(Worker_TP5 pid=13917) INFO 04-30 02:27:15 [tensorizer.py:563] Deserialized 77.2 GB in 44.47s, 1.7 GB/s
(Worker_TP5 pid=13917) INFO 04-30 02:27:15 [tensorizer.py:566] Memory usage before: CPU: (maxrss: 4,176MiB F: 269,274MiB) GPU: (U: 77,299MiB F: 65,868MiB T: 143,167MiB) TORCH: (R: 75,530MiB/75,530MiB, A: 75,171MiB/75,171MiB)
(Worker_TP5 pid=13917) INFO 04-30 02:27:15 [tensorizer.py:567] Memory usage after: CPU: (maxrss: 14,388MiB F: 177,428MiB) GPU: (U: 120,915MiB F: 22,252MiB T: 143,167MiB) TORCH: (R: 119,130MiB/119,130MiB, A: 43,627MiB/118,549MiB)
(Worker_TP6 pid=13918) INFO 04-30 02:27:16 [tensorizer.py:563] Deserialized 77.2 GB in 44.52s, 1.7 GB/s
(Worker_TP6 pid=13918) INFO 04-30 02:27:16 [tensorizer.py:566] Memory usage before: CPU: (maxrss: 4,179MiB F: 269,313MiB) GPU: (U: 77,299MiB F: 65,868MiB T: 143,167MiB) TORCH: (R: 75,530MiB/75,530MiB, A: 75,171MiB/75,171MiB)
(Worker_TP6 pid=13918) INFO 04-30 02:27:16 [tensorizer.py:567] Memory usage after: CPU: (maxrss: 13,884MiB F: 177,426MiB) GPU: (U: 120,915MiB F: 22,252MiB T: 143,167MiB) TORCH: (R: 119,130MiB/119,130MiB, A: 43,627MiB/118,549MiB)
(Worker_TP0 pid=13912) INFO 04-30 02:27:17 [gpu_model_runner.py:5820] Encoder cache will be initialized with a budget of 8192 tokens, and profiled with 1 vision_chunk items of the maximum feature size.
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] WorkerProc hit an exception.
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] Traceback (most recent call last):
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 957, in worker_busy_loop
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]     output = func(*args, **kwargs)
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 370, in determine_available_memory
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]     self.model_runner.profile_run()
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5836, in profile_run
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]     dummy_encoder_outputs = self.model.embed_multimodal(
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/kimi_k25.py", line 427, in embed_multimodal
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]     vision_embeddings = self._process_media_input(media_input)
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/kimi_k25.py", line 411, in _process_media_input
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]     media_features = vision_tower_forward(
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]                      ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/kimi_k25_vit.py", line 642, in vision_tower_forward
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]     vt_outputs = run_dp_sharded_mrope_vision_model(
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/vision.py", line 475, in run_dp_sharded_mrope_vision_model
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]     image_embeds_local = vision_model(
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]                          ^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]     return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1790, in _call_impl
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]     return forward_call(*args, **kwargs)
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/kimi_k25_vit.py", line 603, in forward
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]     hidden_states = self.encoder(hidden_states, grid_thws)
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]     return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1790, in _call_impl
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]     return forward_call(*args, **kwargs)
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/kimi_k25_vit.py", line 515, in forward
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]     hidden_states = block(
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]                     ^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]     return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1790, in _call_impl
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]     return forward_call(*args, **kwargs)
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/kimi_k25_vit.py", line 450, in forward
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]     hidden_states = self.attention_qkvpacked(
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]                     ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/kimi_k25_vit.py", line 423, in attention_qkvpacked
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]     xq, xk = apply_rope(xq, xk, rope_freqs_cis)
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/kimi_k25_vit.py", line 88, in apply_rope
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]     _apply_rope_input_validation(xq, freqs_cis)
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/kimi_k25_vit.py", line 46, in _apply_rope_input_validation
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]     assert freqs_cis.dtype == torch.complex64, freqs_cis.dtype
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] AssertionError: torch.bfloat16
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] EngineCore failed to start.
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] Traceback (most recent call last):
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1110, in run_engine_core
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136]     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136]     return func(*args, **kwargs)
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 876, in __init__
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136]     super().__init__(
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 128, in __init__
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136]     kv_cache_config = self._initialize_kv_caches(vllm_config)
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136]     return func(*args, **kwargs)
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 250, in _initialize_kv_caches
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136]     available_gpu_memory = self.model_executor.determine_available_memory()
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 147, in determine_available_memory
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136]     return self.collective_rpc("determine_available_memory")
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 403, in collective_rpc
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136]     return future if non_block else future.result()
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136]                                     ^^^^^^^^^^^^^^^
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 90, in result
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136]     return super().result()
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136]            ^^^^^^^^^^^^^^^^
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136]   File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136]     return self.__get_result()
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136]            ^^^^^^^^^^^^^^^^^^^
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136]   File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136]     raise self._exception
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 94, in _wait_for_response
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136]     response = self.aggregate(self.get_response())
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136]                               ^^^^^^^^^^^^^^^^^^^
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 390, in get_response
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136]     raise RuntimeError(
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] RuntimeError: Worker failed with error 'torch.bfloat16', please check the stack trace above for the root cause

Suggested Investigation

  • Tensorizer side: Check whether Tensorizer's target_dtype handling or device-transfer logic (the tensor.to(device=..., dtype=target_dtype) call at serialization.py:2720) detects complex dtypes and skips the cast for them.
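
As an illustration of what such a check could look like, here is a sketch of a guard that refuses a complex-to-real cast. resolve_cast_dtype is a hypothetical helper and not part of Tensorizer's or vLLM's API; how it would be wired into the deserializer is an assumption.

```python
from typing import Optional

import torch


def resolve_cast_dtype(
    tensor: torch.Tensor, target_dtype: Optional[torch.dtype]
) -> torch.dtype:
    """Pick the dtype to pass to Tensor.to() without dropping imaginary parts.

    Hypothetical helper: keeps the tensor's own dtype when no target is
    given, or when the cast would turn a complex tensor into a real one.
    """
    if target_dtype is None:
        return tensor.dtype
    if tensor.is_complex() and not target_dtype.is_complex:
        return tensor.dtype
    return target_dtype


freqs_cis = torch.polar(torch.ones(3), torch.tensor([0.0, 0.5, 1.0]))
weight = torch.randn(3, 3)

# The complex tensor keeps complex64; the real weight is still cast down.
safe_freqs = freqs_cis.to(dtype=resolve_cast_dtype(freqs_cis, torch.bfloat16))
safe_weight = weight.to(dtype=resolve_cast_dtype(weight, torch.bfloat16))
print(safe_freqs.dtype, safe_weight.dtype)
```

With a guard like this, ordinary weights would still follow the requested dtype while rope_freqs_cis would pass the complex64 assertion in kimi_k25_vit.py.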
