Description
System Info
PyTorch version: 2.9.1+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: Debian GNU/Linux 13 (trixie) (x86_64)
GCC version: (Debian 14.2.0-19) 14.2.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.41
Python version: 3.12.12 (main, Dec 8 2025, 23:38:42) [GCC 14.2.0] (64-bit runtime)
Python platform: Linux-6.15.10-200.fc42.aarch64-x86_64-with-glibc2.41
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: 0x61
Model name: -
Model: 0
Thread(s) per core: 1
Core(s) per socket: 8
Socket(s): 1
Stepping: 0x0
BogoMIPS: 48.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp flagm2 frint bf16 bti afp
NUMA node(s): 1
NUMA node0 CPU(s): 0-7
Vulnerability Gather data sampling: Not affected
Vulnerability Ghostwrite: Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; CSV2, but not BHB
Vulnerability Srbds: Not affected
Vulnerability Tsa: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] mypy_extensions==1.1.0
[pip3] numpy==2.3.5
[pip3] torch==2.9.1+cpu
[pip3] torchao==0.14.1+cpu
[pip3] torchtune==0.5.0+cpu
[pip3] torchvision==0.24.1+cpu
[conda] Could not collect
Information
- The official example scripts
- My own modified scripts
🐛 Describe the bug
Running in podman with:
podman run -d \
--name $LLAMASTACK_CONTAINER \
--network $NETWORK_NAME \
-e VLLM_API_TOKEN="my-token" \
-e VLLM_URL="https://my-vllm-server/v1" \
-v llamastack-data:/.llama:z \
docker.io/llamastack/distribution-starter:0.3.5
My remote vLLM server hosts two models: Qwen3 for inference and nomic-embed-text for embeddings. Hitting /v1/models returns:
{
"data": [
{
"id": "nomic-embed-text-v1-5",
"object": "model",
"created": 1677610602,
"owned_by": "openai"
},
{
"id": "qwen3-14b-gaudi",
"object": "model",
"created": 1677610602,
"owned_by": "openai"
}
],
"object": "list"
}

When I try to create a vector store with:
vector_store = client.vector_stores.create(
    name=vector_store_id,
    extra_body={
        "embedding_model": "vllm/nomic-embed-text-v1-5",
        "embedding_dimension": 768
    }
)

I get an error (see the Error logs section). It thinks the embedding model is an LLM. I believe this is because the vLLM provider doesn't have a hard-coded "common" list of embedding models (`embedding_model_metadata` in the provider source), so it treats every model served by vLLM as an LLM and refuses to use any of them as an embedding model.
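For comparison, other remote providers ship exactly such a mapping. Below is a minimal sketch of what this could look like for the vLLM adapter, assuming the same embedding_model_metadata convention; the metadata values are illustrative, not taken from the provider source:

# Hypothetical sketch: a hard-coded embedding-model mapping for the vLLM
# adapter, mirroring the embedding_model_metadata convention that other
# remote providers use. The model name and metadata values are assumptions.
embedding_model_metadata: dict[str, dict[str, int]] = {
    "nomic-embed-text-v1-5": {
        "embedding_dimension": 768,  # matches the dimension passed to create()
        "context_length": 8192,      # illustrative value
    },
}

With a mapping like this, the provider could register matching models with model_type "embedding" instead of defaulting everything it serves to "llm".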
Error logs
ERROR 2025-12-23 15:44:20,028 llama_stack.core.server.server:290 core::server: Error executing endpoint
route='/v1/vector_stores' method='post'
╭───────────────────────────────────── Traceback (most recent call last) ─────────────────────────────────────╮
│ /usr/local/lib/python3.12/site-packages/llama_stack/core/server/server.py:280 in route_handler
│
│   277 │ │ │ │ │ return StreamingResponse(gen, media_type="text/event-stream")
│   278 │ │ │ │ else:
│   279 │ │ │ │ │ value = func(**kwargs)
│ ❱ 280 │ │ │ │ │ result = await maybe_await(value)
│   281 │ │ │ │ │ if isinstance(result, PaginatedResponse) and result.url is None:
│   282 │ │ │ │ │ │ result.url = route
│   283
│
│ /usr/local/lib/python3.12/site-packages/llama_stack/core/server/server.py:202 in maybe_await
│
│   199
│   200 async def maybe_await(value):
│   201 │ if inspect.iscoroutine(value):
│ ❱ 202 │ │ return await value
│   203 │ return value
│   204
│   205
│
│ /usr/local/lib/python3.12/site-packages/llama_stack/providers/utils/telemetry/trace_protocol.py:101 in
│ async_wrapper
│
│    98 │ │ │
│    99 │ │ │ with tracing.span(f"{class_name}.{method_name}", span_attributes) as span:
│   100 │ │ │ │ try:
│ ❱ 101 │ │ │ │ │ result = await method(self, *args, **kwargs)
│   102 │ │ │ │ │ span.set_attribute("output", serialize_value(result))
│   103 │ │ │ │ │ return result
│   104 │ │ │ │ except Exception as e:
│
│ /usr/local/lib/python3.12/site-packages/llama_stack/core/routers/vector_io.py:150 in
│ openai_create_vector_store
│
│   147 │ │ │ │ provider_id = list(self.routing_table.impls_by_provider_id.keys())[0]
│   148 │ │
│   149 │ │ vector_store_id = f"vs_{uuid.uuid4()}"
│ ❱ 150 │ │ registered_vector_store = await self.routing_table.register_vector_store(
│   151 │ │ │ vector_store_id=vector_store_id,
│   152 │ │ │ embedding_model=embedding_model,
│   153 │ │ │ embedding_dimension=embedding_dimension,
│
│ /usr/local/lib/python3.12/site-packages/llama_stack/core/routing_tables/vector_stores.py:66 in
│ register_vector_store
│
│    63 │ │ if model is None:
│    64 │ │ │ raise ModelNotFoundError(embedding_model)
│    65 │ │ if model.model_type != ModelType.embedding:
│ ❱  66 │ │ │ raise ModelTypeError(embedding_model, model.model_type, ModelType.embedding)
│    67 │ │
│    68 │ │ vector_store = VectorStoreWithOwner(
│    69 │ │ │ identifier=vector_store_id,
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
ModelTypeError: Model 'vllm/nomic-embed-text-v1-5' is of type 'llm' rather than the expected type 'embedding'
INFO 2025-12-23 15:44:20,678 uvicorn.access:473 uncategorized: 10.89.1.4:33362 - "POST /v1/vector_stores HTTP/1.1" 500
Expected behavior
No errors; embeddings are calculated correctly with the specified vLLM-served embedding model.
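As a possible interim workaround, explicitly registering the model as an embedding model before creating the vector store may help. This is only a sketch, under the assumption that the client exposes models.register with these parameters (not verified against the 0.3.5 starter image):

# Workaround sketch (assumed client API): register the vLLM-served model
# with an explicit model_type so the routing table stops treating it as an LLM.
client.models.register(
    model_id="vllm/nomic-embed-text-v1-5",
    provider_id="vllm",
    provider_model_id="nomic-embed-text-v1-5",
    model_type="embedding",
    metadata={"embedding_dimension": 768},
)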