
feat(providers): add "local/" routing prefix for OpenAI-compatible local servers #3044

Open
np6126 wants to merge 1 commit into ultraworkers:main from np6126:feat/local-routing-prefix


@np6126 np6126 commented May 17, 2026

Summary

A common deployment is running an OpenAI-compatible inference server locally — Ollama, LM Studio, vLLM, llama.cpp — and routing claw-code's OpenAI provider client at it via OPENAI_BASE_URL.

The obvious model id formats actively misroute today:

  • qwen/qwen3:14b or qwen3-coder → metadata_for_model maps the qwen family to DashScope (DASHSCOPE_API_KEY, dashscope.aliyuncs.com). A user serving Qwen3 locally on Ollama is silently routed to Alibaba's cloud.
  • kimi/kimi-k2.5 → same DashScope routing.
  • grok-* → routes to xAI.

The existing openai/ prefix would work as a workaround, but mentally conflates "this is my local Qwen3" with "the OpenAI API itself", and a user who does want the existing OpenRouter behaviour (slug preserved on the wire for openai/...) can't have it both ways.

This PR adds local/ as an explicit routing prefix that says "this is a local OpenAI-compatible server, route as OpenAI client, strip the prefix on the wire". Routing semantics are identical to openai/: OpenAi provider, OPENAI_API_KEY (typically an unused placeholder for local servers), OPENAI_BASE_URL.

Diff

// providers/mod.rs
-if canonical.starts_with("openai/") || canonical.starts_with("gpt-") {
+if canonical.starts_with("openai/")
+    || canonical.starts_with("local/")
+    || canonical.starts_with("gpt-")
+{
     return Some(ProviderMetadata { provider: ProviderKind::OpenAi, ... });
 }

// providers/openai_compat.rs (wire_model_for_base_url)
-if matches!(lowered_prefix.as_str(), "xai" | "grok" | "qwen" | "kimi") {
+if matches!(lowered_prefix.as_str(), "local" | "xai" | "grok" | "qwen" | "kimi") {
     return Cow::Borrowed(&model[pos + 1..]);
 }

Two places only — both pure additions.

No change to strip_routing_prefix — it's #[allow(dead_code)] and only referenced from its own unit tests; adding local there would be cosmetic.

No change to the OpenAI gateway slug preservation logic used by OpenRouter and similar gateways with a non-default OPENAI_BASE_URL; openai/... behaviour stays exactly as upstream.
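
For readers who do not have providers/mod.rs open, the provider-selection half of the change can be sketched in isolation. This is a simplified illustration, not the upstream function: provider_for_model is a hypothetical name, the DashScope and Xai variants are assumed for the example (only OpenAi appears in the diff), and the family checks are reduced to plain prefix matches based on the misrouting cases listed in the summary.

```rust
/// Simplified stand-in for the provider enum. Only `OpenAi` appears in the
/// diff; `DashScope` and `Xai` are hypothetical names for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ProviderKind {
    OpenAi,
    DashScope,
    Xai,
}

/// Hypothetical, reduced version of the provider lookup. The real
/// metadata_for_model returns a richer ProviderMetadata value.
fn provider_for_model(model: &str) -> Option<ProviderKind> {
    let canonical = model.trim().to_ascii_lowercase();

    // Explicit routing prefixes plus the gpt- family, as in the diff.
    if canonical.starts_with("openai/")
        || canonical.starts_with("local/")
        || canonical.starts_with("gpt-")
    {
        return Some(ProviderKind::OpenAi);
    }

    // Family heuristics, reduced to prefix checks (an assumption):
    // qwen and kimi ids go to DashScope, grok ids go to xAI.
    if canonical.starts_with("qwen") || canonical.starts_with("kimi") {
        return Some(ProviderKind::DashScope);
    }
    if canonical.starts_with("grok") {
        return Some(ProviderKind::Xai);
    }

    None
}
```

Under these assumptions, provider_for_model("qwen/qwen3:14b") yields DashScope while provider_for_model("local/qwen3:14b") yields OpenAi, which is the distinction the new prefix is meant to make explicit.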

Example

export OPENAI_BASE_URL=http://localhost:11434/v1
export OPENAI_API_KEY=ollama-placeholder
claw --model local/qwen3:14b "..."

Wire request: POST $OPENAI_BASE_URL/chat/completions with model=qwen3:14b. No misrouting to DashScope.
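
The model id rewrite behind that wire request can be sketched as a standalone function. This is a minimal illustration under stated assumptions: wire_model_sketch is a hypothetical name, it returns a plain &str rather than the Cow used upstream, and it deliberately leaves openai/ ids untouched because the gateway slug-preservation rules are outside this sketch.

```rust
/// Hypothetical reduction of the prefix-stripping branch shown in the diff.
/// Returns the model id as it would appear in the request body.
fn wire_model_sketch(model: &str) -> &str {
    if let Some(pos) = model.find('/') {
        // Compare the routing prefix case-insensitively.
        let lowered_prefix = model[..pos].to_ascii_lowercase();
        if matches!(
            lowered_prefix.as_str(),
            "local" | "xai" | "grok" | "qwen" | "kimi"
        ) {
            // Drop the prefix and the slash; the rest is sent as-is.
            return &model[pos + 1..];
        }
    }
    // No recognised routing prefix: send the id unchanged.
    // (openai/ handling is intentionally omitted from this sketch.)
    model
}
```

For example, wire_model_sketch("local/qwen3:14b") returns "qwen3:14b", and an id with no slash such as "qwen3:14b" passes through unchanged.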

Test plan

  • cargo check --workspace clean
  • Existing provider tests: cargo test -p api
  • Manual: route local/qwen3:14b against a local Ollama, verify the request goes to the local server with bare model id on the wire

Companion discussion

A separate issue (#3045) covers two related open questions: <think>...</think> block filtering in streamed output, and whether the openai/-slug preservation logic should remain non-configurable once local/ is available. Posted separately because both involve tradeoffs upstream may want to weigh in on before code lands.

…cal servers

A common use case is running an OpenAI-compatible inference server locally —
Ollama, LM Studio, vLLM, llama.cpp — and routing to it through the OpenAI
provider client.

The existing "openai/" prefix would work, but mentally conflates "local
Ollama / vLLM" with "the OpenAI API itself", and the more obvious model id
formats actively misroute:

  - "qwen/qwen3:14b" or "qwen3-coder" → metadata_for_model maps the "qwen"
    family to DashScope (DASHSCOPE_API_KEY, dashscope.aliyuncs.com). A user
    serving Qwen3 locally on Ollama gets routed to Alibaba's cloud instead
    of their local server.
  - "kimi/kimi-k2.5" → same DashScope routing.
  - "grok-..." → routes to xAI.

None of these can express "this is a local copy of that family, route as
OpenAI-compatible to my local server" without forcibly setting
OPENAI_BASE_URL plus the "openai/" prefix — which is unintuitive.

The "local/" prefix solves it: operators express intent ("this is a local
inference server, route as OpenAI client") without naming collision.
Routing semantics are identical to "openai/": OpenAi provider,
OPENAI_API_KEY auth env (typically an unused placeholder for local
servers), OPENAI_BASE_URL.

Two places updated:
  - metadata_for_model: recognise "local/" alongside "openai/"
  - wire_model_for_base_url: strip "local/" prefix on the wire

No change to strip_routing_prefix (#[allow(dead_code)], tests only) and no
change to the OpenAI gateway slug preservation logic (used by OpenRouter
and similar gateways with non-default OPENAI_BASE_URL) — that behaviour for
"openai/" stays exactly as upstream.
np6126 pushed a commit to np6126/tank-claw-os that referenced this pull request May 17, 2026
…am-drop

The two patches we ship in bootc/patches/ each contained an experimental
mix of changes that aren't all needed in this setup. Splitting them into
intent-aligned files makes it possible to drop them individually as
upstream merges happen, without having to surgically extract pieces.

- claw-fix-stream-newlines.patch: now contains only the trailing-newline
  restoration in MarkdownStreamState::push. Matches the change in
  ultraworkers/claw-code#3043 1:1, so the file drops out the moment
  that PR lands and CLAW_CODE_REF is bumped.

- claw-fix-openai-prefix-strip.patch: now contains only the "local/"
  routing-prefix additions in metadata_for_model and
  wire_model_for_base_url. The dead-code edit to strip_routing_prefix and
  the OpenRouter slug-preservation removal are gone — both were
  experimental and unneeded for this setup (we use "local/" prefix, not
  "openai/"). Matches ultraworkers/claw-code#3044 1:1, drops out when
  that PR lands.

- claw-add-think-block-filter.patch (new): isolates the <think>...</think>
  block filtering for thinking models (Qwen3 et al). Applied on top of the
  newlines patch since both touch MarkdownStreamState::push. Stays as a
  local patch until ultraworkers/claw-code#3045 resolves; the filter
  remains usable as a standalone patch against post-#3043 upstream
  because its context lines reference the newlines-applied state.

Containerfile applies the three patches in order: newlines → think → local-prefix.

Verified: all three apply cleanly to the pinned CLAW_CODE_REF and the
resulting tree compiles (cargo check --workspace).
@shakoorshkh

Model id formats like qwen/qwen3:14b silently misroute to DashScope when the user is running a local Ollama server. No error — wrong destination.

Adds a new prefix convention (local/) that users must learn. If OPENAI_BASE_URL is not set, local/ routes to OpenAI with a stripped model id, which will likely 404.

Explicit prefix over inference. Users targeting local servers now carry that intent in the model string rather than relying on routing heuristics to guess correctly.
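
The unset-OPENAI_BASE_URL case mentioned above can be made concrete with a small sketch. It is illustrative only: effective_target is a hypothetical helper, and the default base URL constant is an assumption about the client's fallback rather than something confirmed by this PR.

```rust
/// Assumed fallback when OPENAI_BASE_URL is unset; the real client's
/// default may differ.
const DEFAULT_OPENAI_BASE_URL: &str = "https://api.openai.com/v1";

/// Hypothetical helper: returns (request URL, model id on the wire) for a
/// given model string and optional base URL override.
fn effective_target(model: &str, base_url: Option<&str>) -> (String, String) {
    // Treat an empty or whitespace-only override as unset.
    let base = base_url
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .unwrap_or(DEFAULT_OPENAI_BASE_URL);

    // Strip only the local/ prefix; other prefixes are out of scope here.
    let wire_model = model.strip_prefix("local/").unwrap_or(model);

    (
        format!("{}/chat/completions", base.trim_end_matches('/')),
        wire_model.to_string(),
    )
}
```

With base_url set to None, effective_target("local/qwen3:14b", None) targets the assumed hosted endpoint with the bare id qwen3:14b, which is the mismatch the comment warns about. With Some("http://localhost:11434/v1") the same call targets the local server.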
