
feat: add legacy /chat/completions support#1804

Open
chrisreadsf wants to merge 11 commits into main from chris/model-base-url

Conversation


@chrisreadsf chrisreadsf commented Mar 10, 2026

why

providers that only expose the legacy /chat/completions endpoint (not the newer Responses API) are currently unsupported

what changed

  • added chatcompletions provider prefix (e.g. chatcompletions/glm-4-flash) that uses Chat Completions API instead of Responses API
  • threaded baseURL from SDK through server to core via x-model-base-url header, mirroring existing apiKey pattern
  • added output: "no-schema" fallback for chatcompletions models that can't do structured output, with a safety-net catch for other
    fallback-pattern models
  • added fallback parsing for malformed model outputs (e.g. "[]" as string, missing array fields)
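The prefix form above resolves into a provider and a model id; a minimal sketch of that split, assuming a hypothetical helper name and default provider (the actual core code may differ):

```typescript
// Hypothetical helper illustrating how "chatcompletions/glm-4-flash"
// splits into a provider prefix and a model name.
function splitModelName(modelName: string): {
  subProvider: string;
  subModelName: string;
} {
  const idx = modelName.indexOf("/");
  if (idx === -1) {
    // Assumption: unprefixed names fall through to a default provider.
    return { subProvider: "openai", subModelName: modelName };
  }
  return {
    subProvider: modelName.slice(0, idx),
    subModelName: modelName.slice(idx + 1),
  };
}
```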

sister python PR here: browserbase/stagehand-python#318

test plan

tested locally with ZhipuAI glm-4-flash: observe, act, extract, and agent execute all pass

Thread modelBaseURL from x-model-base-url header through to V3 options,
enabling providers like ZhipuAI, Ollama, and other OpenAI-compatible
endpoints. Uses Chat Completions API (not Responses API) when a custom
baseURL is set, and adds robust response coercion for models without
native structured output support.
Adds "chatcompletions" as a generic provider that uses the Chat
Completions API (/chat/completions) instead of the Responses API,
for endpoints like ZhipuAI and Ollama. Also simplifies response
coercion for models without native structured output support.
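The header threading described above follows a body-then-header precedence. A minimal sketch, assuming a simplified request shape (the real getModelBaseURL in header.ts reads Fastify requests):

```typescript
// Sketch of getModelBaseURL's precedence: body.options.model.baseURL wins,
// then the x-model-base-url header (mirroring the existing apiKey pattern).
interface ModelRequest {
  body?: { options?: { model?: { baseURL?: string } } };
  headers: Record<string, string | undefined>;
}

function getModelBaseURL(request: ModelRequest): string | undefined {
  return (
    request.body?.options?.model?.baseURL ??
    request.headers["x-model-base-url"]
  );
}
```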
changeset-bot bot commented Mar 10, 2026

🦋 Changeset detected

Latest commit: e19a7e0

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 5 packages
Name Type
@browserbasehq/stagehand Minor
@browserbasehq/stagehand-server-v3 Minor
@browserbasehq/stagehand-server-v4 Minor
@browserbasehq/browse-cli Patch
@browserbasehq/stagehand-evals Patch


@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 8 files

Confidence score: 2/5

  • There is a high-confidence regression risk in packages/core/lib/v3/llm/LLMProvider.ts: the chatcompletions.chat() mapping is only applied in the hasValidOptions path, so behavior diverges between configured and default client flows.
  • When clientOptions are absent, the else branch calls provider(subModelName) on the default openai instance, which can route chatcompletions models incorrectly and cause user-facing failures in common usage.
  • Pay close attention to packages/core/lib/v3/llm/LLMProvider.ts - ensure model normalization/dispatch is consistent in both branches so default and custom options behave the same.
Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/core/lib/v3/llm/LLMProvider.ts">

<violation number="1" location="packages/core/lib/v3/llm/LLMProvider.ts:53">
P1: The `chatcompletions` → `.chat()` handling only exists in the `hasValidOptions` branch. When no `clientOptions` are provided, the `else` branch calls `provider(subModelName)` on the default `openai` instance, which uses the Responses API — silently defeating the purpose of this provider.

Add the same `.chat()` handling in the `else` branch so `chatcompletions/model-name` always uses the Chat Completions API regardless of whether client options are present.</violation>
</file>
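The fix the reviewer describes can be sketched as one dispatch helper shared by both branches (types and names here are illustrative, not the actual LLMProvider.ts):

```typescript
// Illustrative sketch: route chatcompletions/* through provider.chat() so the
// Chat Completions API is used whether or not clientOptions were supplied.
type LanguageModel = { api: "chat" | "responses"; modelId: string };

interface OpenAILikeProvider {
  (modelId: string): LanguageModel; // defaults to the Responses API
  chat(modelId: string): LanguageModel; // targets /chat/completions
}

function resolveModel(
  provider: OpenAILikeProvider,
  subProvider: string,
  subModelName: string,
): LanguageModel {
  // Same dispatch in both the hasValidOptions and default branches.
  return subProvider === "chatcompletions"
    ? provider.chat(subModelName)
    : provider(subModelName);
}
```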
Architecture diagram
sequenceDiagram
    participant Client
    participant Server as Server (Fastify)
    participant Store as Session Store
    participant Prov as LLM Provider (Core)
    participant SDK as AI SDK Wrapper
    participant LLM as External LLM API

    Note over Client, LLM: Runtime flow for Model Base URL & Chat Completions Support

    Client->>Server: Request (Header: x-model-base-url, Body: provider/model)
    Server->>Server: NEW: getModelBaseURL() 
    Note right of Server: Checks body.options.model.baseURL <br/>OR x-model-base-url header

    Server->>Store: createSession(modelBaseURL, apiKey, ...)
    Store->>Prov: getAISDKLanguageModel(provider, model, baseURL)

    alt NEW: Provider prefix is "chatcompletions/"
        Prov->>Prov: Map to OpenAI provider instance
        Prov->>Prov: NEW: Force .chat() method (bypasses /responses)
    else Standard Provider
        Prov->>Prov: Initialize standard AI SDK provider
    end
    Prov-->>Store: LanguageModel instance (with baseURL)

    Store->>SDK: generateObject(schema, options)
    
    alt NEW: Model requires Prompt JSON Fallback
        SDK->>LLM: generateObject(output: "no-schema")
        LLM-->>SDK: Raw JSON String / Partial Object
        
        SDK->>SDK: NEW: Coerce stringified fields (e.g., "[]" to [])
        
        alt Schema Validation Fails
            SDK->>SDK: NEW: Heuristic fix (default missing arrays to [])
            SDK->>SDK: safeParse() retry
        end
    else Native Structured Output
        SDK->>LLM: generateObject(schema: ZodSchema)
        LLM-->>SDK: Structured Data
    end

    SDK-->>Store: Validated Object
    Store-->>Server: Session Result
    Server-->>Client: 200 OK / Stream Response


greptile-apps bot commented Mar 10, 2026

Greptile Summary

This PR adds support for OpenAI-compatible providers that only expose the /chat/completions endpoint (not the newer /responses endpoint), using a new chatcompletions/<model> provider prefix. It also threads a baseURL override through the server stack via an x-model-base-url header (mirroring the existing x-model-api-key pattern), and adds a no-schema fallback path in aisdk.ts for models that lack native structured-output support, with best-effort coercion of malformed responses.

Key implementation areas:

  • LLMProvider.ts: Registers chatcompletions in both static and factory provider maps, and calls provider.chat(subModelName) (instead of the default provider(subModelName)) when subProvider === "chatcompletions" to target /chat/completions. However, this special-case is only applied in the hasValidOptions branch; the else branch is missing the .chat() call and would silently use the Responses API if no API key or baseURL is supplied.
  • aisdk.ts: Adds a needsPromptJsonFallback branch that calls generateObject with output: "no-schema", parses the free-form response against the Zod schema, and retries after defaulting missing top-level array fields to []. The second schema.parse() in the retry is not wrapped in a try/catch, so a remaining ZodError surfaces as an untyped error rather than a structured diagnostic.
  • header.ts: Adds getModelBaseURL following the established body-then-header precedence pattern — clean and consistent.
  • SessionStore.ts / InMemorySessionStore.ts / stream.ts / start.ts: Propagates modelBaseURL from request context through to V3Options, mirroring the existing modelApiKey propagation cleanly.
  • types/model.ts: Registers "chatcompletions" in AISDK_PROVIDERS, enabling provider-name validation at the session-start route.
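The no-schema coercion path summarized above can be sketched without the Zod plumbing. This is a dependency-free illustration of the two repairs described (stringified arrays like "[]", and missing top-level array fields); the function and field names are assumptions, not the real aisdk.ts:

```typescript
// Best-effort coercion of a free-form model response before schema validation:
// 1. parse stringified arrays back into real arrays,
// 2. default missing top-level array fields to [].
function coerceModelOutput(
  raw: Record<string, unknown>,
  arrayFields: string[],
): Record<string, unknown> {
  const out = { ...raw };
  for (const field of arrayFields) {
    const value = out[field];
    if (typeof value === "string") {
      try {
        const parsed = JSON.parse(value);
        if (Array.isArray(parsed)) out[field] = parsed;
      } catch {
        // Not JSON: leave the original string for the schema to reject.
      }
    }
    if (out[field] === undefined) out[field] = [];
  }
  return out;
}
```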

Confidence Score: 3/5

  • Mostly safe to merge, but a logic gap in LLMProvider.ts means chatcompletions silently degrades to the Responses API when used without client options.
  • The server-side baseURL threading is clean and consistent with existing patterns. The aisdk.ts no-schema fallback is reasonable and functional. The main concern is the missing .chat() call in the else branch of getAISDKLanguageModel — while unlikely to be hit in practice (since chatcompletions almost always requires a baseURL), it creates a silent misbehavior rather than an error. The unguarded second schema.parse() in the coercion retry path is a secondary concern around error diagnostics.
  • packages/core/lib/v3/llm/LLMProvider.ts — the else branch in getAISDKLanguageModel (line 131–139) lacks the .chat() special-case for chatcompletions. Recommend adding the .chat() call to ensure consistent behavior regardless of whether client options are provided.

Sequence Diagram

sequenceDiagram
    participant Client as SDK Client
    participant Server as server-v3
    participant Header as header.ts
    participant Store as InMemorySessionStore
    participant Core as LLMProvider
    participant AISDK as AI SDK

    Client->>Server: POST /v1/sessions/start<br/>x-model-base-url header<br/>modelName: chatcompletions/glm-4-flash

    Server->>Header: getModelBaseURL(request)
    Header-->>Server: baseURL value

    Server->>Header: getModelApiKey(request)
    Header-->>Server: apiKey value

    Server->>Store: getOrCreateStagehand(sessionId, ctx)

    Store->>Core: getAISDKLanguageModel("chatcompletions", "glm-4-flash", clientOptions)

    Note over Core: hasValidOptions = true<br/>(baseURL or apiKey present)
    Core->>AISDK: createOpenAI({ baseURL, apiKey })
    AISDK-->>Core: provider instance
    Core->>AISDK: provider.chat("glm-4-flash")
    Note over AISDK: Targets /chat/completions<br/>instead of /responses

    AISDK-->>Core: LanguageModelV2
    Core-->>Store: AISdkClient
    Store-->>Server: V3 instance
    Server-->>Client: sessionId + cdpUrl

Comments Outside Diff (1)

  1. packages/core/lib/v3/llm/LLMProvider.ts, line 131-139 (link)

    chatcompletions silently falls back to Responses API when no client options are provided

    The .chat() special-case that routes to /chat/completions is only applied inside the hasValidOptions branch (line 125–127). The else branch below calls provider(subModelName) directly on the static openai instance, which routes to the Responses API — the exact opposite of what chatcompletions is supposed to do.

    In practice chatcompletions will almost always be paired with a baseURL (that's the whole point), so hasValidOptions will be true. But someone who omits the baseURL and apiKey (e.g., relying on an OPENAI_API_KEY env var only) will silently get the Responses API instead.

Last reviewed commit: 87a5801

Comment on lines +207 to +216
for (const issue of firstTry.error.issues) {
  if (
    issue.code === "invalid_type" &&
    issue.expected === "array" &&
    issue.path.length === 1
  ) {
    raw[issue.path[0] as string] = [];
  }
}
parsed = options.response_model.schema.parse(raw);
Second parse() call can throw an untyped ZodError

After the array-field defaulting loop, options.response_model.schema.parse(raw) is called without a try/catch. If the response still fails validation for any reason other than a missing top-level array field (e.g., a nested object type mismatch, an extra required field), a raw ZodError is thrown. That error is caught by the outer catch (err) block, but that block only checks for NoObjectGeneratedError.isInstance(err) — a ZodError will just be re-thrown without the special logging context.

Consider wrapping this in a try/catch that converts ZodError into something more informative, or using .safeParse() again and surfacing the issues clearly:

Suggested change

 for (const issue of firstTry.error.issues) {
   if (
     issue.code === "invalid_type" &&
     issue.expected === "array" &&
     issue.path.length === 1
   ) {
     raw[issue.path[0] as string] = [];
   }
 }
-parsed = options.response_model.schema.parse(raw);
+const secondTry = options.response_model.schema.safeParse(raw);
+if (!secondTry.success) {
+  throw new Error(
+    `Model response could not be coerced into the expected schema: ${secondTry.error.message}`,
+  );
+}
+parsed = secondTry.data;

Try structured output (schema:) first for all models. Only fall back
to no-schema + response coercion when the call fails and the model
matches a known fallback pattern. This avoids degrading DeepSeek/Kimi
which already work with schema:.
@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/core/lib/v3/llm/aisdk.ts">

<violation number="1" location="packages/core/lib/v3/llm/aisdk.ts:179">
P1: Models in `PROMPT_JSON_FALLBACK_PATTERNS` (deepseek, kimi, glm) will now always make a wasted API call that fails before falling back to no-schema mode. Previously these models skipped straight to the no-schema path. This doubles latency and cost for every extract call on these providers.

Consider keeping the original structure where `needsPromptJsonFallback` is checked *before* the first call, and only use the try-then-fallback pattern for models that are *not* in the known fallback list (i.e., the `chatcompletions/` prefix models that aren't predictable from the model ID).</violation>
</file>


// Try structured output first. If the provider doesn't support
// response_format (e.g. chatcompletions/ endpoints), this will throw
// and we fall back to no-schema mode with response coercion below.
objectResponse = await generateObject({
@cubic-dev-ai bot commented Mar 10, 2026

P1: Models in PROMPT_JSON_FALLBACK_PATTERNS (deepseek, kimi, glm) will now always make a wasted API call that fails before falling back to no-schema mode. Previously these models skipped straight to the no-schema path. This doubles latency and cost for every extract call on these providers.

Consider keeping the original structure where needsPromptJsonFallback is checked before the first call, and only use the try-then-fallback pattern for models that are not in the known fallback list (i.e., the chatcompletions/ prefix models that aren't predictable from the model ID).


<file context>
@@ -173,19 +173,39 @@ You must respond in JSON format. respond WITH JSON. Do not include any other tex
+        // Try structured output first. If the provider doesn't support
+        // response_format (e.g. chatcompletions/ endpoints), this will throw
+        // and we fall back to no-schema mode with response coercion below.
+        objectResponse = await generateObject({
+          model: this.model,
+          messages: formattedMessages,
</file context>

@chrisreadsf changed the title from "Chris/model base url" to "feat: add legacy /chat/completions support" on Mar 10, 2026
- Skip schema attempt for chatcompletions/ models (provider: openai.chat)
  since they can't do structured output — avoids a wasted LLM call per extract
- Unify .chat() handling in getAISDKLanguageModel so chatcompletions/ works
  regardless of whether clientOptions are provided
- Guard second schema.parse() with safeParse + descriptive error message
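The resulting call flow from the first two bullets can be sketched as follows. The function shape is illustrative (the real code calls the AI SDK's generateObject); only the ordering is taken from the commits and review comments above:

```typescript
// Minimal sketch of the final extract flow: skip the structured-output
// attempt for chatcompletions/ models, try-then-fall-back for everything else.
type GenerateFn = (opts: { schema: boolean }) => Record<string, unknown>;

function generateWithFallback(
  modelId: string,
  generate: GenerateFn,
): Record<string, unknown> {
  // chatcompletions/ models can't do structured output: go straight to
  // no-schema mode instead of burning a failed schema: call per extract.
  if (modelId.startsWith("chatcompletions/")) {
    return generate({ schema: false });
  }
  try {
    // Other models try structured output first...
    return generate({ schema: true });
  } catch {
    // ...and only fall back to no-schema + coercion if the provider rejects it.
    return generate({ schema: false });
  }
}
```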
@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/core/lib/v3/llm/aisdk.ts">

<violation number="1" location="packages/core/lib/v3/llm/aisdk.ts:291">
P1: Custom agent: **Exception and error message sanitization**

Generic `new Error()` with unsanitized Zod error message that may reflect sensitive prompt data back to the caller. Per the error-sanitization rule, use a typed error class and strip or redact the raw Zod message (which can contain actual field values from the model response).</violation>
</file>


// 4. Validate against schema
const secondTry = options.response_model.schema.safeParse(raw);
if (!secondTry.success) {
throw new Error(
@cubic-dev-ai bot commented Mar 10, 2026

P1: Custom agent: Exception and error message sanitization

Generic new Error() with unsanitized Zod error message that may reflect sensitive prompt data back to the caller. Per the error-sanitization rule, use a typed error class and strip or redact the raw Zod message (which can contain actual field values from the model response).


<file context>
@@ -172,115 +172,129 @@ You must respond in JSON format. respond WITH JSON. Do not include any other tex
+          // 4. Validate against schema
+          const secondTry = options.response_model.schema.safeParse(raw);
+          if (!secondTry.success) {
+            throw new Error(
+              `Model response could not be coerced into the expected schema: ${secondTry.error.message}`,
+            );
</file context>


github-actions bot commented Mar 10, 2026

✱ Stainless preview builds

This PR will update the stagehand SDKs with the following commit message.

feat: add legacy /chat/completions support

Edit this comment to update it. It will appear in the SDK's changelogs.

stagehand-typescript studio · code · diff

Your SDK build had at least one "note" diagnostic, but this did not represent a regression.
generate ✅ · build ✅ · lint ✅ · test ✅

npm install https://pkg.stainless.com/s/stagehand-typescript/3fbb8f58174f0d115aa04aab96e745c8127c0f44/dist.tar.gz
stagehand-openapi studio · code · diff

Your SDK build had at least one "note" diagnostic, but this did not represent a regression.
generate ✅

stagehand-ruby studio · conflict

Your SDK build resulted in a merge conflict between your custom code and the newly generated changes, which is a regression from the base state.
You don't need to resolve this conflict right now, but you will need to resolve it for your changes to be released to your users. Read more about why this happened here.

stagehand-php studio · code · diff

Your SDK build had at least one "note" diagnostic, but this did not represent a regression.
generate ✅ · lint ✅ · test ✅

stagehand-go studio · code · diff

Your SDK build had at least one "note" diagnostic, but this did not represent a regression.
generate ✅ · build ✅ · lint ✅ · test ✅

go get github.com/stainless-sdks/stagehand-go@a6926dbb0c88425eca6d4245fae11bc034ab2741
stagehand-kotlin studio · conflict

Your SDK build resulted in a merge conflict between your custom code and the newly generated changes, which is a regression from the base state.
You don't need to resolve this conflict right now, but you will need to resolve it for your changes to be released to your users. Read more about why this happened here.

stagehand-java studio · conflict

Your SDK build resulted in a merge conflict between your custom code and the newly generated changes, which is a regression from the base state.
You don't need to resolve this conflict right now, but you will need to resolve it for your changes to be released to your users. Read more about why this happened here.

stagehand-python studio · conflict

Your SDK build resulted in a merge conflict between your custom code and the newly generated changes, which is a regression from the base state.
You don't need to resolve this conflict right now, but you will need to resolve it for your changes to be released to your users. Read more about why this happened here.

stagehand-csharp studio · code · diff

Your SDK build had at least one "warning" diagnostic, but this did not represent a regression.
generate ⚠️ · build ❗ · lint ❗ · test ✅


This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
If you push custom code to the preview branch, re-run this workflow to update the comment.
Last updated: 2026-03-10 18:05:07 UTC
