Ollama cloud models fail: native /api/chat times out, /v1/chat/completions works

## Problem

llmspy routes Ollama requests through the native `/api/chat` endpoint, which times out (or returns errors) for Ollama cloud/remote models (e.g., `glm-5:cloud`). The OpenAI-compatible `/v1/chat/completions` endpoint works correctly for the same models.

## Reproduction

With Ollama running and `glm-5:cloud` pulled:

```bash
# This times out / fails:
curl http://ollama:11434/api/chat \
  -d '{"model":"glm-5:cloud","messages":[{"role":"user","content":"hi"}],"stream":false}'

# This works:
curl http://ollama:11434/v1/chat/completions \
  -d '{"model":"glm-5:cloud","messages":[{"role":"user","content":"hi"}],"stream":false}'
```

llmspy returns:
```json
{"responseStatus": {"errorCode": "Error", "message": "Expecting value: line 1 column 1 (char 0)"}}
```
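The message text matches what Python's standard `json` module raises when asked to decode an empty or non-JSON body. That suggests llmspy is trying to parse an upstream response that contains no JSON, though this is an inference from the message and not confirmed against the llmspy source. A minimal sketch of that behaviour (the helper name `decode_error_message` is hypothetical):

```python
import json


def decode_error_message(body: str) -> str:
    """Return the JSONDecodeError text for `body`, or '' if it parses."""
    try:
        json.loads(body)
    except json.JSONDecodeError as exc:
        return str(exc)
    return ""


# An empty body and an HTML error page both fail at the very first character.
print(decode_error_message(""))
print(decode_error_message("<html>504 Gateway Timeout</html>"))
```

Both calls print `Expecting value: line 1 column 1 (char 0)`, the same string llmspy reports.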

## Root Cause

In `llms/main.py`, the `OllamaProvider` sends chat requests to `{api}/api/chat` (the native Ollama endpoint). For Ollama cloud/remote models, this endpoint times out or returns an error, as shown in the reproduction above; only the OpenAI-compatible `/v1/chat/completions` endpoint handles them correctly.
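Switching endpoints also changes the response shape: the native endpoint returns the reply under `message.content`, while the OpenAI-compatible one returns it under `choices[0].message.content`. The sketch below shows one way the two paths could be selected and parsed. It is not llmspy's actual code; `build_chat_request`, `extract_reply`, and the `openai_compat` flag are hypothetical names used only for illustration.

```python
import json
import urllib.request


def build_chat_request(api: str, model: str, prompt: str,
                       openai_compat: bool = True) -> urllib.request.Request:
    """Build a non-streaming chat request for either Ollama endpoint."""
    path = "/v1/chat/completions" if openai_compat else "/api/chat"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        api.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(body: dict, openai_compat: bool = True) -> str:
    """Pull the assistant text out of a decoded response body."""
    if openai_compat:
        # OpenAI-compatible shape: {"choices": [{"message": {"content": ...}}]}
        return body["choices"][0]["message"]["content"]
    # Native Ollama shape: {"message": {"content": ...}}
    return body["message"]["content"]
```

The request can then be sent with `urllib.request.urlopen(req, timeout=...)`; an explicit timeout keeps a stalled upstream from hanging the caller indefinitely.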

## Secondary Issue: Silent Model Discovery Failure

When `all_models: true` is set and Ollama is temporarily unreachable at llmspy startup (e.g., network not ready, binding mismatch), `load_models()` fails silently. llmspy then has an empty model list for Ollama and returns "Model not found" for all requests, even after Ollama becomes reachable. A restart of llmspy is required to recover.
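One way to avoid the permanent empty list is to treat discovery as lazy and retryable: keep re-attempting on later requests, with a minimum interval between attempts, until a non-empty list comes back. The sketch below illustrates the idea only. `ModelCache`, its `fetch` callback, and `retry_interval` are hypothetical and do not correspond to llmspy's real `load_models()` signature.

```python
import logging
import time
from typing import Callable, List


class ModelCache:
    """Caches a provider's model list and retries discovery after failures."""

    def __init__(self, fetch: Callable[[], List[str]],
                 retry_interval: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch                    # stand-in for the real discovery call
        self._retry_interval = retry_interval  # seconds between attempts
        self._clock = clock
        self._models: List[str] = []
        self._next_attempt = float("-inf")     # allow the first attempt immediately

    def models(self) -> List[str]:
        """Return known models, re-running discovery if the list is still empty."""
        if not self._models and self._clock() >= self._next_attempt:
            try:
                self._models = list(self._fetch())
            except Exception as exc:
                # Log instead of failing silently so the outage is visible.
                logging.warning("model discovery failed, will retry: %s", exc)
            if not self._models:
                self._next_attempt = self._clock() + self._retry_interval
        return self._models
```

Calling `models()` on each incoming request means the list recovers on its own once Ollama becomes reachable, without restarting the process, and the interval prevents every request from hammering an unreachable host.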

## Suggested Fixes

1. **Use `/v1/chat/completions` for Ollama**: switch to the OpenAI-compatible endpoint, which handles both local and cloud models.
2. **Retry model discovery**: if `load_models()` fails at startup, retry periodically or on the first request rather than failing permanently.

## Environment

- Ollama with remote/cloud models (`glm-5:cloud`)
- llmspy `v3.0.34-obol.1`
- Kubernetes (k3d) with ExternalName service routing to host Ollama
