(Worker_TP0 pid=13912) /usr/local/lib/python3.12/dist-packages/tensorizer/serialization.py:2720: UserWarning: Casting complex values to real discards the imaginary part (Triggered internally at /pytorch/aten/src/ATen/native/Copy.cpp:308.)
(Worker_TP0 pid=13912) tensor_on_device = tensor.to(device=self._device, dtype=target_dtype)
(Worker_TP0 pid=13912) INFO 04-30 02:26:58 [tensorizer.py:563] Deserialized 77.2 GB in 27.01s, 2.9 GB/s
(Worker_TP0 pid=13912) INFO 04-30 02:26:58 [tensorizer.py:566] Memory usage before: CPU: (maxrss: 4,037MiB F: 269,789MiB) GPU: (U: 79,598MiB F: 63,569MiB T: 143,167MiB) TORCH: (R: 75,532MiB/75,532MiB, A: 75,171MiB/75,171MiB)
(Worker_TP0 pid=13912) INFO 04-30 02:26:58 [tensorizer.py:567] Memory usage after: CPU: (maxrss: 14,477MiB F: 205,186MiB) GPU: (U: 123,344MiB F: 19,823MiB T: 143,167MiB) TORCH: (R: 119,150MiB/119,150MiB, A: 43,645MiB/118,567MiB)
(Worker_TP4 pid=13916) INFO 04-30 02:26:58 [tensorizer.py:563] Deserialized 77.2 GB in 27.34s, 2.8 GB/s
(Worker_TP4 pid=13916) INFO 04-30 02:26:58 [tensorizer.py:566] Memory usage before: CPU: (maxrss: 4,177MiB F: 269,078MiB) GPU: (U: 77,299MiB F: 65,868MiB T: 143,167MiB) TORCH: (R: 75,530MiB/75,530MiB, A: 75,171MiB/75,171MiB)
(Worker_TP4 pid=13916) INFO 04-30 02:26:58 [tensorizer.py:567] Memory usage after: CPU: (maxrss: 14,728MiB F: 204,404MiB) GPU: (U: 120,915MiB F: 22,252MiB T: 143,167MiB) TORCH: (R: 119,130MiB/119,130MiB, A: 43,627MiB/118,549MiB)
(Worker_TP0 pid=13912) INFO 04-30 02:26:59 [gpu_model_runner.py:4879] Model loading took 42.5 GiB memory and 29.754163 seconds
(Worker_TP1 pid=13913) INFO 04-30 02:27:05 [tensorizer.py:563] Deserialized 77.2 GB in 34.22s, 2.3 GB/s
(Worker_TP1 pid=13913) INFO 04-30 02:27:05 [tensorizer.py:566] Memory usage before: CPU: (maxrss: 4,173MiB F: 269,139MiB) GPU: (U: 77,299MiB F: 65,868MiB T: 143,167MiB) TORCH: (R: 75,530MiB/75,530MiB, A: 75,171MiB/75,171MiB)
(Worker_TP1 pid=13913) INFO 04-30 02:27:05 [tensorizer.py:567] Memory usage after: CPU: (maxrss: 14,726MiB F: 188,480MiB) GPU: (U: 120,915MiB F: 22,252MiB T: 143,167MiB) TORCH: (R: 119,130MiB/119,130MiB, A: 43,627MiB/118,549MiB)
(Worker_TP7 pid=13919) INFO 04-30 02:27:06 [tensorizer.py:563] Deserialized 77.2 GB in 34.69s, 2.2 GB/s
(Worker_TP7 pid=13919) INFO 04-30 02:27:06 [tensorizer.py:566] Memory usage before: CPU: (maxrss: 4,177MiB F: 269,258MiB) GPU: (U: 77,299MiB F: 65,868MiB T: 143,167MiB) TORCH: (R: 75,530MiB/75,530MiB, A: 75,171MiB/75,171MiB)
(Worker_TP7 pid=13919) INFO 04-30 02:27:06 [tensorizer.py:567] Memory usage after: CPU: (maxrss: 14,042MiB F: 187,690MiB) GPU: (U: 120,915MiB F: 22,252MiB T: 143,167MiB) TORCH: (R: 119,130MiB/119,130MiB, A: 43,627MiB/118,549MiB)
(Worker_TP3 pid=13915) INFO 04-30 02:27:10 [tensorizer.py:563] Deserialized 77.2 GB in 38.85s, 2.0 GB/s
(Worker_TP3 pid=13915) INFO 04-30 02:27:10 [tensorizer.py:566] Memory usage before: CPU: (maxrss: 4,177MiB F: 267,885MiB) GPU: (U: 77,299MiB F: 65,868MiB T: 143,167MiB) TORCH: (R: 75,530MiB/75,530MiB, A: 75,171MiB/75,171MiB)
(Worker_TP3 pid=13915) INFO 04-30 02:27:10 [tensorizer.py:567] Memory usage after: CPU: (maxrss: 14,150MiB F: 181,151MiB) GPU: (U: 120,915MiB F: 22,252MiB T: 143,167MiB) TORCH: (R: 119,130MiB/119,130MiB, A: 43,627MiB/118,549MiB)
(Worker_TP2 pid=13914) INFO 04-30 02:27:11 [tensorizer.py:563] Deserialized 77.2 GB in 40.27s, 1.9 GB/s
(Worker_TP2 pid=13914) INFO 04-30 02:27:11 [tensorizer.py:566] Memory usage before: CPU: (maxrss: 4,174MiB F: 269,153MiB) GPU: (U: 77,299MiB F: 65,868MiB T: 143,167MiB) TORCH: (R: 75,530MiB/75,530MiB, A: 75,171MiB/75,171MiB)
(Worker_TP2 pid=13914) INFO 04-30 02:27:11 [tensorizer.py:567] Memory usage after: CPU: (maxrss: 14,049MiB F: 180,212MiB) GPU: (U: 120,915MiB F: 22,252MiB T: 143,167MiB) TORCH: (R: 119,130MiB/119,130MiB, A: 43,627MiB/118,549MiB)
(Worker_TP5 pid=13917) INFO 04-30 02:27:15 [tensorizer.py:563] Deserialized 77.2 GB in 44.47s, 1.7 GB/s
(Worker_TP5 pid=13917) INFO 04-30 02:27:15 [tensorizer.py:566] Memory usage before: CPU: (maxrss: 4,176MiB F: 269,274MiB) GPU: (U: 77,299MiB F: 65,868MiB T: 143,167MiB) TORCH: (R: 75,530MiB/75,530MiB, A: 75,171MiB/75,171MiB)
(Worker_TP5 pid=13917) INFO 04-30 02:27:15 [tensorizer.py:567] Memory usage after: CPU: (maxrss: 14,388MiB F: 177,428MiB) GPU: (U: 120,915MiB F: 22,252MiB T: 143,167MiB) TORCH: (R: 119,130MiB/119,130MiB, A: 43,627MiB/118,549MiB)
(Worker_TP6 pid=13918) INFO 04-30 02:27:16 [tensorizer.py:563] Deserialized 77.2 GB in 44.52s, 1.7 GB/s
(Worker_TP6 pid=13918) INFO 04-30 02:27:16 [tensorizer.py:566] Memory usage before: CPU: (maxrss: 4,179MiB F: 269,313MiB) GPU: (U: 77,299MiB F: 65,868MiB T: 143,167MiB) TORCH: (R: 75,530MiB/75,530MiB, A: 75,171MiB/75,171MiB)
(Worker_TP6 pid=13918) INFO 04-30 02:27:16 [tensorizer.py:567] Memory usage after: CPU: (maxrss: 13,884MiB F: 177,426MiB) GPU: (U: 120,915MiB F: 22,252MiB T: 143,167MiB) TORCH: (R: 119,130MiB/119,130MiB, A: 43,627MiB/118,549MiB)
(Worker_TP0 pid=13912) INFO 04-30 02:27:17 [gpu_model_runner.py:5820] Encoder cache will be initialized with a budget of 8192 tokens, and profiled with 1 vision_chunk items of the maximum feature size.
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] WorkerProc hit an exception.
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] Traceback (most recent call last):
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 957, in worker_busy_loop
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] output = func(*args, **kwargs)
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] return func(*args, **kwargs)
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 370, in determine_available_memory
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] self.model_runner.profile_run()
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5836, in profile_run
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] dummy_encoder_outputs = self.model.embed_multimodal(
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/kimi_k25.py", line 427, in embed_multimodal
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] vision_embeddings = self._process_media_input(media_input)
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/kimi_k25.py", line 411, in _process_media_input
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] media_features = vision_tower_forward(
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] return func(*args, **kwargs)
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/kimi_k25_vit.py", line 642, in vision_tower_forward
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] vt_outputs = run_dp_sharded_mrope_vision_model(
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/vision.py", line 475, in run_dp_sharded_mrope_vision_model
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] image_embeds_local = vision_model(
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] ^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1790, in _call_impl
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] return forward_call(*args, **kwargs)
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/kimi_k25_vit.py", line 603, in forward
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] hidden_states = self.encoder(hidden_states, grid_thws)
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1790, in _call_impl
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] return forward_call(*args, **kwargs)
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/kimi_k25_vit.py", line 515, in forward
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] hidden_states = block(
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] ^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1790, in _call_impl
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] return forward_call(*args, **kwargs)
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/kimi_k25_vit.py", line 450, in forward
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] hidden_states = self.attention_qkvpacked(
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/kimi_k25_vit.py", line 423, in attention_qkvpacked
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] xq, xk = apply_rope(xq, xk, rope_freqs_cis)
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/kimi_k25_vit.py", line 88, in apply_rope
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] _apply_rope_input_validation(xq, freqs_cis)
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/kimi_k25_vit.py", line 46, in _apply_rope_input_validation
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] assert freqs_cis.dtype == torch.complex64, freqs_cis.dtype
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] AssertionError: torch.bfloat16
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] Traceback (most recent call last):
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 957, in worker_busy_loop
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] output = func(*args, **kwargs)
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] return func(*args, **kwargs)
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 370, in determine_available_memory
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] self.model_runner.profile_run()
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5836, in profile_run
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] dummy_encoder_outputs = self.model.embed_multimodal(
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/kimi_k25.py", line 427, in embed_multimodal
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] vision_embeddings = self._process_media_input(media_input)
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/kimi_k25.py", line 411, in _process_media_input
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] media_features = vision_tower_forward(
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] return func(*args, **kwargs)
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/kimi_k25_vit.py", line 642, in vision_tower_forward
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] vt_outputs = run_dp_sharded_mrope_vision_model(
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/vision.py", line 475, in run_dp_sharded_mrope_vision_model
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] image_embeds_local = vision_model(
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] ^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1790, in _call_impl
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] return forward_call(*args, **kwargs)
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/kimi_k25_vit.py", line 603, in forward
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] hidden_states = self.encoder(hidden_states, grid_thws)
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1790, in _call_impl
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] return forward_call(*args, **kwargs)
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/kimi_k25_vit.py", line 515, in forward
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] hidden_states = block(
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] ^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1790, in _call_impl
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] return forward_call(*args, **kwargs)
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/kimi_k25_vit.py", line 450, in forward
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] hidden_states = self.attention_qkvpacked(
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/kimi_k25_vit.py", line 423, in attention_qkvpacked
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] xq, xk = apply_rope(xq, xk, rope_freqs_cis)
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/kimi_k25_vit.py", line 88, in apply_rope
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] _apply_rope_input_validation(xq, freqs_cis)
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/kimi_k25_vit.py", line 46, in _apply_rope_input_validation
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] assert freqs_cis.dtype == torch.complex64, freqs_cis.dtype
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962] AssertionError: torch.bfloat16
(Worker_TP0 pid=13912) ERROR 04-30 02:27:18 [multiproc_executor.py:962]
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] EngineCore failed to start.
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] Traceback (most recent call last):
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1110, in run_engine_core
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] return func(*args, **kwargs)
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 876, in __init__
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] super().__init__(
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 128, in __init__
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] kv_cache_config = self._initialize_kv_caches(vllm_config)
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] return func(*args, **kwargs)
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 250, in _initialize_kv_caches
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] available_gpu_memory = self.model_executor.determine_available_memory()
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 147, in determine_available_memory
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] return self.collective_rpc("determine_available_memory")
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 403, in collective_rpc
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] return future if non_block else future.result()
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] ^^^^^^^^^^^^^^^
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 90, in result
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] return super().result()
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] ^^^^^^^^^^^^^^^^
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] return self.__get_result()
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] ^^^^^^^^^^^^^^^^^^^
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] raise self._exception
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 94, in _wait_for_response
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] response = self.aggregate(self.get_response())
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] ^^^^^^^^^^^^^^^^^^^
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 390, in get_response
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] raise RuntimeError(
(EngineCore pid=13713) ERROR 04-30 02:27:18 [core.py:1136] RuntimeError: Worker failed with error 'torch.bfloat16', please check the stack trace above for the root cause
Issue Description
When using Tensorizer to serialize and then deserialize the Kimi-K2.5 model (vLLM 0.20.0+cu129), the deserialization step converts complex64 tensors (specifically rope_freqs_cis in the vision tower) into bfloat16. This leads to a dtype assertion failure during the model's forward pass, as the code explicitly expects torch.complex64.
Serialization completes without errors, but during deserialization a UserWarning is emitted ("Casting complex values to real discards the imaginary part"), and the model later crashes with "AssertionError: torch.bfloat16" (expected torch.complex64).
Environment
- vLLM: 0.20.0+cu129
- 2.12.1
- jammy)
Steps to Reproduce
vllm serve \
  /ssd1/harry/Kimi-K2.5 \
  --served-model-name /model_files/h20/Kimi-K2.5 Kimi-K2.5 \
  --max-num-seqs 20 \
  --max-num-batched-tokens 8192 \
  --enable-chunked-prefill \
  --enable-prefix-caching \
  --host 0.0.0.0 \
  --port 8000 \
  --trust-remote-code \
  --tensor-parallel-size 8 \
  --tool-call-parser kimi_k2 \
  --reasoning-parser kimi_k2 \
  --enable-auto-tool-choice \
  --mm-encoder-tp-mode data \
  --mm-processor-cache-type shm \
  --gpu-memory-utilization 0.9 \
  --load-format tensorizer \
  --model-loader-extra-config '{"tensorizer_uri": "/ssd1/harry/Kimi-K2.5/model-rank-%03d.tensors"}'
Observed Behavior
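Tensorizer deserialization warning
During model loading, Tensorizer complains about casting a complex tensor:
    UserWarning: Casting complex values to real discards the imaginary part
Model crash during first forward pass
When the vision encoder tries to apply RoPE using rope_freqs_cis, it checks the dtype and fails:
    assert freqs_cis.dtype == torch.complex64, freqs_cis.dtype
    AssertionError: torch.bfloat16
The full traceback appears in the worker log above.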
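To illustrate why the dtype check matters, here is a minimal standard-library sketch of what discarding the imaginary part does to a rotary embedding. It assumes the common complex formulation of RoPE, in which each feature pair is treated as a complex number and multiplied by an entry of the form exp(i*theta). The helper `rotate_pair` and the angle are hypothetical, chosen only for illustration, and are not taken from kimi_k25_vit.py.

```python
import cmath
import math


def rotate_pair(x0: float, x1: float, freq: complex) -> tuple[float, float]:
    """Treat (x0, x1) as x0 + i*x1 and multiply by one RoPE frequency entry."""
    z = complex(x0, x1) * freq
    return z.real, z.imag


theta = math.pi / 3  # arbitrary rotation angle, chosen for illustration

freq_complex = cmath.exp(1j * theta)  # complex entry: cos(theta) + i*sin(theta)
freq_real = freq_complex.real         # what a complex -> real cast keeps: cos(theta)

rotated = rotate_pair(1.0, 0.0, freq_complex)
collapsed = rotate_pair(1.0, 0.0, freq_real)

print("complex freq:  ", rotated, "norm =", math.hypot(*rotated))
print("real-only freq:", collapsed, "norm =", math.hypot(*collapsed))
```

With the complex entry, the pair is rotated and its norm is preserved. With only the real part left, the pair is merely scaled by cos(theta) and the positional rotation is gone, so a silently cast rope_freqs_cis would likely produce wrong embeddings even if the assertion were removed.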
Expected Behavior
The model should deserialize without dtype warnings and run inference normally, with rope_freqs_cis preserving its original complex64 type.
Additional Context
AI generated
- The affected tensor is rope_freqs_cis, a complex-valued frequency tensor used in rotary position embedding inside the vision tower (kimi_k25_vit.py).
- Tensorizer's deserialization (the .to(device, dtype=...) call) improperly casts it to a real floating-point type (bfloat16), losing the imaginary part.
- Serialization writes the .tensors file without errors, but the deserialization logic does not respect the original complex dtype.
- The cause may lie in Tensorizer's target_dtype inference, or in vLLM's model loading code that applies a uniform dtype conversion (e.g., torch.bfloat16) to all parameters without considering complex types.
Possible Workarounds
- Force float32 or complex64 for the problematic parameters (not yet confirmed).
Relevant Logs
The full error trace is reproduced at the top of this report.
Suggested Investigation
- Verify whether tensorizer's target_dtype configuration or device-transfer logic explicitly checks for complex dtypes and skips casting.
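As a starting point for that investigation, the kind of guard in question could look like the sketch below. This is not Tensorizer's or vLLM's actual code: `cast_preserving_complex` is a hypothetical helper, written against only the two tensor methods it calls (`is_complex()` and `.to()`, both of which exist on `torch.Tensor`), so it does not import PyTorch itself.

```python
from typing import Any, Optional


def cast_preserving_complex(tensor: Any, device: Any, target_dtype: Optional[Any]) -> Any:
    """Move `tensor` to `device`, applying `target_dtype` only to real tensors.

    Hypothetical helper: it assumes `tensor` offers is_complex() and
    to(device=..., dtype=...), as torch.Tensor does.
    """
    if target_dtype is None or tensor.is_complex():
        # Complex tensors (for example a complex64 rope_freqs_cis) keep
        # their dtype; only the device changes.
        return tensor.to(device=device)
    # Real-valued tensors receive the requested uniform dtype.
    return tensor.to(device=device, dtype=target_dtype)
```

With PyTorch installed, passing a complex64 tensor together with torch.bfloat16 as the target dtype would be expected to return a tensor that is still complex64, while real-valued weights are still cast as before. Whether such a check belongs in Tensorizer's device-transfer step or in vLLM's loader is exactly the open question of this issue.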