Two open behaviour questions related to the local-server experience. Splitting them out as discussion before opening PRs that would touch the same areas as #3044 (local/ routing prefix), since both involve genuine tradeoffs between use cases.
1. <think>...</think> block filtering in streamed output
Thinking models (notably Qwen3) emit <think>...</think> blocks as part of their raw SSE content. Those tags pass through verbatim today and end up rendered in the terminal, which is noisy for users who want just the final answer.
Possible approaches:
- Always strip
<think>...</think> from the stream inside MarkdownStreamState::push — needs to carry state across deltas since a block can span multiple chunks. ~30 lines of code; I have a prototype downstream.
- Opt-in flag (
--hide-thinking, env var, or config).
- Separate channel so the UI can collapse but still surface it (best long-term shape IMO, but more work).
- Don't filter — keep verbatim, the model controls its own output.
Happy to submit a PR for whichever direction you'd prefer. Posting here first because (1) is opinionated and (3) is structural.
2. openai/-slug preservation vs. local prefix-strip
wire_model_for_base_url currently has special-case handling for openai/... slugs when the base URL is non-default:
if config.provider_name == \"OpenAI\" && trimmed_base_url != default_openai {
// Preserve the slug when the user configured a non-default OpenAI
// base URL; the prefix still routed to the OpenAI-compatible client,
// but the gateway owns the final model namespace.
return Cow::Borrowed(model);
}
This is exactly right for OpenRouter users who want openai/gpt-4.1-mini to be sent verbatim. The symmetric local-server case (Ollama at non-default base URL, model literally named openai/foo) gets the slug preserved when most operators would want it stripped.
The clean solution is the local/ prefix in #3044 — explicit "strip me, this is local", leaving openai/ for the OpenRouter slug-preservation case as-is. With local/ available, do you see a user case where the openai/ slug-preservation behaviour itself still needs to be configurable, or is local/ sufficient?
Asked here so the slug-preservation block can be left untouched if local/ is enough.
Two open behaviour questions related to the local-server experience. Splitting them out as discussion before opening PRs that would touch the same areas as #3044 (
local/routing prefix), since both involve genuine tradeoffs between use cases.1.
<think>...</think>block filtering in streamed outputThinking models (notably Qwen3) emit
<think>...</think>blocks as part of their raw SSE content. Those tags pass through verbatim today and end up rendered in the terminal, which is noisy for users who want just the final answer.Possible approaches:
<think>...</think>from the stream insideMarkdownStreamState::push— needs to carry state across deltas since a block can span multiple chunks. ~30 lines of code; I have a prototype downstream.--hide-thinking, env var, or config).Happy to submit a PR for whichever direction you'd prefer. Posting here first because (1) is opinionated and (3) is structural.
2.
openai/-slug preservation vs. local prefix-stripwire_model_for_base_urlcurrently has special-case handling foropenai/...slugs when the base URL is non-default:This is exactly right for OpenRouter users who want
openai/gpt-4.1-minito be sent verbatim. The symmetric local-server case (Ollama at non-default base URL, model literally namedopenai/foo) gets the slug preserved when most operators would want it stripped.The clean solution is the
local/prefix in #3044 — explicit "strip me, this is local", leavingopenai/for the OpenRouter slug-preservation case as-is. Withlocal/available, do you see a user case where theopenai/slug-preservation behaviour itself still needs to be configurable, or islocal/sufficient?Asked here so the slug-preservation block can be left untouched if
local/is enough.