Problem
llmspy routes Ollama requests through the native /api/chat endpoint, which times out (or returns errors) for Ollama cloud/remote models (e.g., glm-5:cloud). The OpenAI-compatible /v1/chat/completions endpoint works correctly for the same models.
Reproduction
With Ollama running and glm-5:cloud pulled:
```shell
# This times out / fails:
curl http://ollama:11434/api/chat \
  -d '{"model":"glm-5:cloud","messages":[{"role":"user","content":"hi"}],"stream":false}'

# This works:
curl http://ollama:11434/v1/chat/completions \
  -d '{"model":"glm-5:cloud","messages":[{"role":"user","content":"hi"}],"stream":false}'
```
llmspy returns:
```json
{"responseStatus": {"errorCode": "Error", "message": "Expecting value: line 1 column 1 (char 0)"}}
```
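The error text above is the exact message Python's `json` module produces when it is asked to decode an empty (or non-JSON) body. That suggests, though it is an inference rather than something confirmed in llmspy's code, that llmspy received an empty or non-JSON response from `/api/chat` (for example after a timeout) and then tried to parse it. A minimal reproduction of the message, independent of llmspy (`decode_error_message` is a hypothetical helper):

```python
import json

def decode_error_message(body: str) -> str:
    """Return the JSONDecodeError message for `body`, or '' if it parses cleanly."""
    try:
        json.loads(body)
    except json.JSONDecodeError as exc:
        return str(exc)
    return ""

# An empty response body yields the same message llmspy surfaced.
print(decode_error_message(""))
# → Expecting value: line 1 column 1 (char 0)
```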
Root Cause
In llms/main.py, the OllamaProvider sends chat requests to {api}/api/chat (the native Ollama endpoint). For Ollama cloud/remote models, this endpoint doesn't work reliably — only the OpenAI-compatible /v1/chat/completions endpoint handles them correctly.
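A minimal sketch of the endpoint switch, building the request against `/v1/chat/completions` instead of `/api/chat`. This is not llmspy's actual `OllamaProvider` code: `build_chat_request` and `extract_reply` are hypothetical helpers, `api` is assumed to be the bare Ollama base URL (e.g. `http://ollama:11434`), and the response is assumed to follow the standard OpenAI chat-completions shape (`choices[0].message.content`).

```python
import json
import urllib.request

def build_chat_request(api: str, model: str, messages: list) -> urllib.request.Request:
    """Build a POST to Ollama's OpenAI-compatible endpoint (hypothetical helper)."""
    # Assumes `api` is the bare base URL, without a trailing /v1.
    url = api.rstrip("/") + "/v1/chat/completions"
    payload = {"model": model, "messages": messages, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def extract_reply(body: bytes) -> str:
    """Pull the assistant text out of an OpenAI-style chat-completions response."""
    data = json.loads(body)
    return data["choices"][0]["message"]["content"]

req = build_chat_request("http://ollama:11434", "glm-5:cloud",
                         [{"role": "user", "content": "hi"}])
print(req.full_url)  # http://ollama:11434/v1/chat/completions

# Sending it requires a reachable Ollama:
# with urllib.request.urlopen(req, timeout=120) as resp:
#     print(extract_reply(resp.read()))
```

Only the URL and the response parsing change; the request body (`model`, `messages`, `stream`) is the same one used in the reproduction above.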
Secondary Issue: Silent Model Discovery Failure
When all_models: true is set and Ollama is temporarily unreachable at llmspy startup (e.g., network not ready, binding mismatch), load_models() fails silently. llmspy then has an empty model list for Ollama and returns "Model not found" for all requests, even after Ollama becomes reachable. A restart of llmspy is required to recover.
Suggested Fixes
- Use `/v1/chat/completions` for Ollama: switch to the OpenAI-compatible endpoint, which handles both local and cloud models.
- Retry model discovery: if `load_models()` fails at startup, retry periodically or on the first request rather than failing permanently.
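One way to implement the retry fix is to stop treating an empty model list as final: re-run discovery on demand while the cache is empty, rate-limited so an unreachable Ollama is not hit on every request, and log failures instead of swallowing them. The sketch assumes a `load_models` callable that returns model names or raises on failure; `ModelRegistry`, `retry_interval`, and the injectable `clock` are hypothetical names, not llmspy's API.

```python
import logging
import time
from typing import Callable, List, Optional

log = logging.getLogger("model-registry")

class ModelRegistry:
    """Hypothetical cache of discovered models that retries discovery while empty."""

    def __init__(self, load_models: Callable[[], List[str]],
                 retry_interval: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._load_models = load_models
        self._retry_interval = retry_interval
        self._clock = clock
        self._models: List[str] = []
        self._last_attempt: Optional[float] = None

    def _refresh(self) -> None:
        self._last_attempt = self._clock()
        try:
            self._models = self._load_models()
        except Exception as exc:
            # Log the failure instead of failing silently; the cache stays empty.
            log.warning("model discovery failed, will retry: %s", exc)

    def models(self) -> List[str]:
        # Retry only while the cache is empty, at most once per retry_interval.
        if not self._models:
            now = self._clock()
            if self._last_attempt is None or now - self._last_attempt >= self._retry_interval:
                self._refresh()
        return self._models

    def has_model(self, name: str) -> bool:
        return name in self.models()
```

Calling `models()` once at startup gives the initial attempt; afterwards each request that calls `has_model()` triggers a fresh discovery if the previous one failed and the interval has elapsed, so llmspy recovers once Ollama becomes reachable without a restart.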
Environment
- Ollama with remote/cloud models (`glm-5:cloud`)
- llmspy `v3.0.34-obol.1`
- Kubernetes (k3d) with ExternalName service routing to host Ollama