Description
System Info
PyTorch version: 2.9.1+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: Debian GNU/Linux 13 (trixie) (x86_64)
GCC version: (Debian 14.2.0-19) 14.2.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.41
Python version: 3.12.12 (main, Dec 8 2025, 23:38:42) [GCC 14.2.0] (64-bit runtime)
Python platform: Linux-6.15.10-200.fc42.aarch64-x86_64-with-glibc2.41
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: 0x61
Model name: -
Model: 0
Thread(s) per core: 1
Core(s) per socket: 8
Socket(s): 1
Stepping: 0x0
BogoMIPS: 48.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp flagm2 frint bf16 bti afp
NUMA node(s): 1
NUMA node0 CPU(s): 0-7
Vulnerability Gather data sampling: Not affected
Vulnerability Ghostwrite: Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; CSV2, but not BHB
Vulnerability Srbds: Not affected
Vulnerability Tsa: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] mypy_extensions==1.1.0
[pip3] numpy==2.3.5
[pip3] torch==2.9.1+cpu
[pip3] torchao==0.14.1+cpu
[pip3] torchtune==0.5.0+cpu
[pip3] torchvision==0.24.1+cpu
[conda] Could not collect
Information
- The official example scripts
- My own modified scripts
🐛 Describe the bug
Running in podman with:
podman run -d \
--name $LLAMASTACK_CONTAINER \
--network $NETWORK_NAME \
-e VLLM_API_TOKEN="my-token" \
-e VLLM_URL="https://my-vllm-server/v1" \
-v llamastack-data:/.llama:z \
docker.io/llamastack/distribution-starter:0.3.5
My remote vLLM server hosts two models: Qwen3 for inference and nomic-embed-text for embeddings. Hitting /v1/models returns:
{
"data": [
{
"id": "nomic-embed-text-v1-5",
"object": "model",
"created": 1677610602,
"owned_by": "openai"
},
{
"id": "qwen3-14b-gaudi",
"object": "model",
"created": 1677610602,
"owned_by": "openai"
}
],
"object": "list"
}

When I try to create a vector store with:
vector_store = client.vector_stores.create(
    name=vector_store_id,
    extra_body={
        "embedding_model": "vllm/nomic-embed-text-v1-5",
        "embedding_dimension": 768
    }
)

I get an error (see the Error logs section). It thinks the embedding model is an LLM. I believe this is because the vLLM provider doesn't have a hard-coded "common" list of embedding models (`embedding_model_metadata` in the provider source), so it treats every model served by vLLM as an LLM and refuses to use any of them as an embedding model.
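For comparison, other remote providers ship exactly such a mapping. Below is a minimal sketch of what this could look like for the vLLM adapter, assuming the same embedding_model_metadata convention; the metadata values are illustrative, not taken from the provider source:

# Hypothetical sketch: a hard-coded embedding-model mapping for the vLLM
# adapter, mirroring the embedding_model_metadata convention that other
# remote providers use. The model name and metadata values are assumptions.
embedding_model_metadata: dict[str, dict[str, int]] = {
    "nomic-embed-text-v1-5": {
        "embedding_dimension": 768,  # matches the dimension passed to create()
        "context_length": 8192,      # illustrative value
    },
}

With a mapping like this, the provider could register matching models with model_type "embedding" instead of defaulting everything it serves to "llm".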
Error logs
ERROR 2025-12-23 15:44:20,028 llama_stack.core.server.server:290 core::server: Error executing endpoint
route='/v1/vector_stores' method='post'
╭───────────────────────────────────── Traceback (most recent call last) ─────────────────────────────────────╮
│ /usr/local/lib/python3.12/site-packages/llama_stack/core/server/server.py:280 in route_handler
│
│   277 │ │ │ │ │ return StreamingResponse(gen, media_type="text/event-stream")
│   278 │ │ │ │ else:
│   279 │ │ │ │ │ value = func(**kwargs)
│ ❱ 280 │ │ │ │ │ result = await maybe_await(value)
│   281 │ │ │ │ │ if isinstance(result, PaginatedResponse) and result.url is None:
│   282 │ │ │ │ │ │ result.url = route
│   283
│
│ /usr/local/lib/python3.12/site-packages/llama_stack/core/server/server.py:202 in maybe_await
│
│   199
│   200 async def maybe_await(value):
│   201 │ if inspect.iscoroutine(value):
│ ❱ 202 │ │ return await value
│   203 │ return value
│   204
│   205
│
│ /usr/local/lib/python3.12/site-packages/llama_stack/providers/utils/telemetry/trace_protocol.py:101 in
│ async_wrapper
│
│    98 │ │ │
│    99 │ │ │ with tracing.span(f"{class_name}.{method_name}", span_attributes) as span:
│   100 │ │ │ │ try:
│ ❱ 101 │ │ │ │ │ result = await method(self, *args, **kwargs)
│   102 │ │ │ │ │ span.set_attribute("output", serialize_value(result))
│   103 │ │ │ │ │ return result
│   104 │ │ │ │ except Exception as e:
│
│ /usr/local/lib/python3.12/site-packages/llama_stack/core/routers/vector_io.py:150 in
│ openai_create_vector_store
│
│   147 │ │ │ │ provider_id = list(self.routing_table.impls_by_provider_id.keys())[0]
│   148 │ │
│   149 │ │ vector_store_id = f"vs_{uuid.uuid4()}"
│ ❱ 150 │ │ registered_vector_store = await self.routing_table.register_vector_store(
│   151 │ │ │ vector_store_id=vector_store_id,
│   152 │ │ │ embedding_model=embedding_model,
│   153 │ │ │ embedding_dimension=embedding_dimension,
│
│ /usr/local/lib/python3.12/site-packages/llama_stack/core/routing_tables/vector_stores.py:66 in
│ register_vector_store
│
│    63 │ │ if model is None:
│    64 │ │ │ raise ModelNotFoundError(embedding_model)
│    65 │ │ if model.model_type != ModelType.embedding:
│ ❱  66 │ │ │ raise ModelTypeError(embedding_model, model.model_type, ModelType.embedding)
│    67 │ │
│    68 │ │ vector_store = VectorStoreWithOwner(
│    69 │ │ │ identifier=vector_store_id,
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
ModelTypeError: Model 'vllm/nomic-embed-text-v1-5' is of type 'llm' rather than the expected type 'embedding'
INFO 2025-12-23 15:44:20,678 uvicorn.access:473 uncategorized: 10.89.1.4:33362 - "POST /v1/vector_stores HTTP/1.1" 500
Expected behavior
No errors; embeddings are calculated correctly with the specified vLLM-served embedding model.
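As a possible interim workaround, explicitly registering the model as an embedding model before creating the vector store may help. This is only a sketch, under the assumption that the client exposes models.register with these parameters (not verified against the 0.3.5 starter image):

# Workaround sketch (assumed client API): register the vLLM-served model
# with an explicit model_type so the routing table stops treating it as an LLM.
client.models.register(
    model_id="vllm/nomic-embed-text-v1-5",
    provider_id="vllm",
    provider_model_id="nomic-embed-text-v1-5",
    model_type="embedding",
    metadata={"embedding_dimension": 768},
)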