feat(google-vertex): update model YAMLs [bot] by harshiv-26 · Pull Request #946 · truefoundry/models

harshiv-26 · 2026-05-05T13:50:17Z

Auto-generated by poc-agent for provider google-vertex.

Note

Medium Risk
Large-scale updates to Vertex model YAML metadata (limits, costs, provisioning, deprecation/status) could change model selection/availability and cost calculations if these configs drive runtime behavior.

Overview
Updates the google-vertex model YAML catalog with broader, more complete metadata across many models, including provisioning (serverless vs provisioned), sources, status, features, limits, and expanded per-region costs.

Introduces or updates lifecycle flags such as deprecationDate and isDeprecated, transitions some entries to deprecated/retired, and makes a few capability adjustments (e.g., adding parallel_function_calling, tweaking modalities, and increasing token limits for select models).

Reviewed by Cursor Bugbot for commit 1fda97e. Bugbot is set up for automated code reviews on this repo.

github-actions · 2026-05-05T13:50:29Z

/test-models

harshiv-26 · 2026-05-05T13:53:10Z

Gateway test results

Total: 263
Passed: 189
Failed: 46
Validation failed: 3
Errored: 0
Skipped: 25
Success rate: 79.41%
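
The reported success rate is consistent with skipped scenarios being excluded from the denominator: 189 / (263 - 25) is about 79.41%. A minimal sketch of that arithmetic follows, assuming this is how the harness computes the figure; the function name `success_rate` is hypothetical and not taken from the test tooling.

```python
def success_rate(total: int, passed: int, skipped: int) -> float:
    """Percentage of passed scenarios among those actually executed.

    Assumption: skipped scenarios are excluded from the denominator.
    """
    executed = total - skipped
    if executed == 0:
        return 0.0
    return round(100.0 * passed / executed, 2)


# Counts copied from the summary above.
counts = {
    "total": 263,
    "passed": 189,
    "failed": 46,
    "validation_failed": 3,
    "errored": 0,
    "skipped": 25,
}

# The per-status counts should add up to the total.
status_sum = sum(v for k, v in counts.items() if k != "total")

rate = success_rate(counts["total"], counts["passed"], counts["skipped"])
print(f"statuses sum to {status_sum}, success rate {rate}%")
```

Under the same assumption, the 49 entries in the failures list would be the 46 failed plus the 3 validation-failed scenarios.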

Provider	Model	Scenarios
`google-vertex`	`anthropic/claude-haiku-4-5@20251001`	success: tool-call, tool-call:stream, params:stream, structured-output, structured-output:stream, params, reasoning, reasoning:stream
`google-vertex`	`anthropic/claude-opus-4-5`	success: tool-call, tool-call:stream, structured-output, structured-output:stream, params, params:stream, reasoning, reasoning:stream
`google-vertex`	`anthropic/claude-opus-4-5@20251101`	success: parallel-tool-call, params:stream, parallel-tool-call:stream, tool-call:stream, structured-output, tool-call, params, structured-output:stream, reasoning, reasoning:stream
`google-vertex`	`anthropic/claude-opus-4-6`	success: tool-call:stream, structured-output:stream, params, params:stream, tool-call, structured-output, reasoning:stream, reasoning
`google-vertex`	`anthropic/claude-opus-4-6@default`	success: structured-output, parallel-tool-call:stream, parallel-tool-call, structured-output:stream, tool-call:stream, params:stream, params, tool-call, reasoning:stream, reasoning
`google-vertex`	`anthropic/claude-opus-4@20250514`	skipped: skip-check
`google-vertex`	`anthropic/claude-sonnet-4-6`	success: structured-output:stream, tool-call:stream, structured-output, params:stream, parallel-tool-call, parallel-tool-call:stream, tool-call, params, reasoning, reasoning:stream
`google-vertex`	`anthropic/claude-sonnet-4-6@default`	success: parallel-tool-call:stream, tool-call, params, tool-call:stream, structured-output, parallel-tool-call, params:stream, structured-output:stream, reasoning, reasoning:stream
`google-vertex`	`anthropic/claude-sonnet-4@20250514`	skipped: skip-check
`google-vertex`	`deepseek-ai/deepseek-ocr-maas`	skipped: skip-check
`google-vertex`	`deepseek-ai/deepseek-v3-1`	failure: structured-output, reasoning:stream, tool-call, tool-call:stream, params:stream, params, reasoning, structured-output:stream
`google-vertex`	`deepseek-ai/deepseek-v3-2`	failure: tool-call, params:stream, structured-output, structured-output:stream, tool-call:stream, params
`google-vertex`	`gemini-2.5-computer-use-preview-10-2025`	skipped: skip-check
`google-vertex`	`gemini-2.5-flash-image`	skipped: skip-check
`google-vertex`	`gemini-2.5-flash-tts`	success: params
`google-vertex`	`gemini-2.5-pro-tts`	success: params
`google-vertex`	`gemini-3-pro-image-preview`	skipped: skip-check
`google-vertex`	`gemini-3.1-flash-image-preview`	success: params, params:stream, reasoning:stream, reasoning, params:google-genai, params:stream:google-genai, reasoning:stream:google-genai, reasoning:google-genai
`google-vertex`	`gemini-embedding-001`	success: params
`google-vertex`	`gemini-embedding-2-preview`	failure: params
`google-vertex`	`gemini-live-2.5-flash-native-audio`	skipped: skip-check
`google-vertex`	`google/content-moderation`	skipped: skip-check
`google-vertex`	`google/face-detector`	skipped: skip-check
`google-vertex`	`google/gemini-2.5-computer-use-preview-10-2025`	skipped: skip-check
`google-vertex`	`google/gemini-2.5-flash`	success: tool-call, params, params:stream, tool-call:stream, json-output:stream, json-output, structured-output:stream, structured-output, reasoning:stream, reasoning, structured-output:google-genai, params:google-genai, tool-call:stream:google-genai, tool-call:google-genai, params:stream:google-genai, structured-output:stream:google-genai, json-output:google-genai, json-output:stream:google-genai, reasoning:stream:google-genai, reasoning:google-genai
`google-vertex`	`google/gemini-2.5-flash-image`	success: params, params:stream, params:stream:google-genai, params:google-genai
`google-vertex`	`google/gemini-2.5-flash-lite-preview-09-2025`	success: structured-output:stream, tool-call, structured-output, json-output:stream, params, params:stream, json-output, tool-call:stream, reasoning:stream, reasoning, params:google-genai, json-output:stream:google-genai, tool-call:google-genai, tool-call:stream:google-genai, structured-output:stream:google-genai, params:stream:google-genai, structured-output:google-genai, json-output:google-genai, reasoning:stream:google-genai, reasoning:google-genai
`google-vertex`	`google/gemini-3-flash-preview`	success: tool-call, tool-call:stream, json-output:stream, params:stream, json-output, structured-output:stream, structured-output, params:google-genai, params:stream:google-genai, tool-call:stream:google-genai, tool-call:google-genai, json-output:stream:google-genai, structured-output:google-genai, params, json-output:google-genai, structured-output:stream:google-genai, reasoning:stream, reasoning, reasoning:google-genai; validation_failure: reasoning:stream:google-genai
`google-vertex`	`google/gemini-3-pro-image-preview`	success: params, params:stream, params:google-genai, params:stream:google-genai, reasoning, reasoning:stream, reasoning:google-genai, reasoning:stream:google-genai
`google-vertex`	`google/gemini-3.1-flash-lite-preview`	success: params, tool-call, json-output, params:stream, tool-call:stream, structured-output, json-output:stream, structured-output:stream, reasoning:stream, reasoning, params:google-genai, structured-output:stream:google-genai, params:stream:google-genai, tool-call:google-genai, tool-call:stream:google-genai, structured-output:google-genai, json-output:stream:google-genai, json-output:google-genai, reasoning:stream:google-genai, reasoning:google-genai
`google-vertex`	`google/gemini-embedding-001`	success: params
`google-vertex`	`google/gemini-embedding-2-preview`	failure: params
`google-vertex`	`google/gemma4`	failure: tool-call:stream, params, structured-output:stream, params:stream, reasoning, reasoning:stream, structured-output, tool-call
`google-vertex`	`google/language-v1-analyze-entity-sentiment`	skipped: skip-check
`google-vertex`	`google/language-v1-analyze-syntax`	skipped: skip-check
`google-vertex`	`google/multimodalembedding`	success: params
`google-vertex`	`google/object-detector`	skipped: skip-check
`google-vertex`	`google/people-blur`	skipped: skip-check
`google-vertex`	`google/ppe-detector`	skipped: skip-check
`google-vertex`	`google/pretrained-form-parser`	skipped: skip-check
`google-vertex`	`google/tag-recognizer`	skipped: skip-check
`google-vertex`	`google/text-detector`	skipped: skip-check
`google-vertex`	`google/text-translation`	skipped: skip-check
`google-vertex`	`google/translate-llm`	failure: params:stream, params
`google-vertex`	`google/video-text-detection`	skipped: skip-check
`google-vertex`	`imagen-3.0-capability-001`	skipped: skip-check
`google-vertex`	`imagen-3.0-generate-001`	skipped: skip-check
`google-vertex`	`imagen-4.0-fast-generate-001`	skipped: skip-check
`google-vertex`	`minimaxai/minimax-m2`	failure: structured-output:stream, tool-call, params, structured-output, tool-call:stream, params:stream
`google-vertex`	`mistralai/codestral-2`	success: params, params:stream
`google-vertex`	`mongodb/voyage-3.5-lite`	skipped: skip-check
`google-vertex`	`mongodb/voyage-4`	skipped: skip-check
`google-vertex`	`moonshotai/kimi-k2-thinking-maas`	success: tool-call, tool-call:stream, params:stream, params, structured-output, structured-output:stream, reasoning:stream, reasoning
`google-vertex`	`openai/gpt-oss`	failure: structured-output:stream, structured-output, reasoning:stream, params:stream, params, reasoning, tool-call, tool-call:stream
`google-vertex`	`openai/gpt-oss-120b-maas`	success: params:stream, tool-call, tool-call:stream, params; validation_failure: structured-output:stream, structured-output
`google-vertex`	`qwen/qwen3-coder-480b-a35b-instruct-maas`	success: tool-call:stream, structured-output, params, tool-call, params:stream, structured-output:stream
`google-vertex`	`text-multilingual-embedding-002`	success: params
`google-vertex`	`zai-org/glm-4.7`	failure: tool-call, structured-output:stream, tool-call:stream, structured-output, params:stream, params

Failures (49)

google-vertex/google/gemini-3-flash-preview — reasoning:stream:google-genai (validation_failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpzoqxdbco/snippet.py", line 70, in <module>
    raise Exception("VALIDATION FAILED: reasoning stream - no thinking information in GenAI stream")
Exception: VALIDATION FAILED: reasoning stream - no thinking information in GenAI stream

Code snippet

from google import genai
from google.genai import types

_endpoint = "https://internal.devtest.truefoundry.tech/api/llm"
_api_key = "***"
_full_model = "test-v2-vertex/google/gemini-3-flash-preview"

_parts = _full_model.split("/")
_provider_account = _parts[0]
_model_id = "/".join(_parts[1:])
if "/" in _model_id:
    _model_id = _model_id.rsplit("/", 1)[-1]

_base_url = f"{_endpoint}/gemini/{_provider_account}/proxy"

client = genai.Client(
    api_key=_api_key,
    http_options=types.HttpOptions(base_url=_base_url),
)

contents = [
    types.Content(role="user", parts=[types.Part.from_text(text="Hi")]),
    types.Content(role="model", parts=[types.Part.from_text(text="Hi, how can I help you")]),
    types.Content(role="user", parts=[types.Part.from_text(text="How to calculate 3^3^3^3? Think step by step and show all reasoning.")]),
]

config = types.GenerateContentConfig(
    system_instruction="You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps.",
    thinking_config=types.ThinkingConfig(
        include_thoughts=True,
        thinking_budget=5000,
    ),
)

_chunks = []
for chunk in client.models.generate_content_stream(
    model=_model_id,
    contents=contents,
    config=config,
):
    _chunks.append(chunk)
    if chunk.candidates and chunk.candidates[0].content and chunk.candidates[0].content.parts:
        for part in chunk.candidates[0].content.parts:
            if not part.text:
                continue
            if part.thought:
                print(f"[Thinking] {part.text}", end="", flush=True)
            else:
                print(part.text, end="", flush=True)

_thought_detected = False
for _chunk in _chunks:
    if not _chunk.candidates or not _chunk.candidates[0].content:
        continue
    for _part in _chunk.candidates[0].content.parts:
        if not _part.text:
            continue
        if _part.thought:
            _thought_detected = True
            print(_part.text, end="", flush=True)
        else:
            print(_part.text, end="", flush=True)

if not _thought_detected:
    _usage = getattr(_chunks[-1], "usage_metadata", None) if _chunks else None
    if _usage and getattr(_usage, "thoughts_token_count", 0):
        _thought_detected = True

if not _thought_detected:
    raise Exception("VALIDATION FAILED: reasoning stream - no thinking information in GenAI stream")
print("\nVALIDATION: reasoning stream SUCCESS")
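
The snippet above derives the Gemini proxy URL and the bare model ID from the gateway model name by dropping the provider-account prefix and then the publisher prefix. The sketch below isolates that step so it can be checked without calling the gateway. The helper names and the `gateway.example.com` endpoint are hypothetical and used only for illustration.

```python
def split_gateway_model(full_model: str) -> tuple[str, str]:
    """Split '<provider-account>/<publisher>/<model>' the way the snippet does."""
    provider_account, _, rest = full_model.partition("/")
    # Keep only the last path segment as the model ID (drops any publisher prefix).
    model_id = rest.rsplit("/", 1)[-1]
    return provider_account, model_id


def proxy_base_url(endpoint: str, provider_account: str) -> str:
    """Build the per-account Gemini proxy URL used as the GenAI client base URL."""
    return f"{endpoint}/gemini/{provider_account}/proxy"


account, model_id = split_gateway_model("test-v2-vertex/google/gemini-3-flash-preview")
base_url = proxy_base_url("https://gateway.example.com/api/llm", account)  # hypothetical endpoint

print(account)   # test-v2-vertex
print(model_id)  # gemini-3-flash-preview
print(base_url)  # https://gateway.example.com/api/llm/gemini/test-v2-vertex/proxy
```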

google-vertex/deepseek-ai/deepseek-v3-1 — structured-output (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpj3bof7yd/snippet.py", line 21, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/us-central1/publishers/deepseek-ai/models/deepseek-v3-1` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/us-central1/publishers/deepseek-ai/models/deepseek-v3-1` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI
import json

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response_schema = json.loads('''{
  "title": "CalendarEvent",
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "date": { "type": "string" },
    "participants": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["name", "date", "participants"],
  "additionalProperties": false
}''')

response = client.chat.completions.create(
    model="test-v2-vertex/deepseek-ai-deepseek-v3-1",
    messages=[
        {"role": "system", "content": "Extract the event information as JSON."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
    ],
    response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
    stream=False,
)

import json as _json

_content = response.choices[0].message.content
print(_content)

if not _content:
    raise Exception("VALIDATION FAILED: structured-output - response content is empty")

_parsed = _json.loads(_content)

if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
    raise Exception("VALIDATION FAILED: structured-output - missing expected fields (name, date, participants)")

if not isinstance(_parsed.get("participants"), list):
    raise Exception("VALIDATION FAILED: structured-output - 'participants' is not a list, schema not enforced")

if set(_parsed.keys()) != {"name", "date", "participants"}:
    raise Exception(
        f"VALIDATION FAILED: structured-output - unexpected keys present: {set(_parsed.keys())}"
    )

print("VALIDATION: structured-output SUCCESS")
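
The validation at the end of this snippet checks that the response is a JSON object with exactly the keys `name`, `date`, and `participants`, and that `participants` is a list. The sketch below restates those checks as a function that can be exercised offline on sample payloads. It is not the harness's code: the function name is hypothetical, and it additionally reports invalid JSON instead of raising, which the original snippet does not do.

```python
import json

EXPECTED_KEYS = {"name", "date", "participants"}


def check_calendar_event(content: str) -> list[str]:
    """Return a list of problems; an empty list means the payload passes."""
    if not content:
        return ["response content is empty"]
    try:
        parsed = json.loads(content)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(parsed, dict):
        return ["top-level value is not an object"]

    problems = []
    missing = EXPECTED_KEYS - set(parsed)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    extra = set(parsed) - EXPECTED_KEYS
    if extra:
        problems.append(f"unexpected keys: {sorted(extra)}")
    if "participants" in parsed and not isinstance(parsed["participants"], list):
        problems.append("'participants' is not a list")
    return problems


good = '{"name": "Science fair", "date": "Friday", "participants": ["Alice", "Bob"]}'
bad = '{"name": "Science fair", "participants": "Alice, Bob", "location": "school"}'

good_problems = check_calendar_event(good)
bad_problems = check_calendar_event(bad)
print(good_problems)
print(bad_problems)
```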

google-vertex/deepseek-ai/deepseek-v3-1 — reasoning:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpdvh0zgha/snippet.py", line 5, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/us-central1/publishers/deepseek-ai/models/deepseek-v3-1` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/us-central1/publishers/deepseek-ai/models/deepseek-v3-1` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-vertex/deepseek-ai-deepseek-v3-1",
    messages=[
        {"role": "system", "content": "You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
    ],
    reasoning_effort="medium",
    stream=True,
)

_reasoning_detected = False
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)
        if getattr(delta, "reasoning_content", None) is not None:
            _reasoning_detected = True
        if getattr(delta, "reasoning", None) is not None:
            _reasoning_detected = True

    _usage = getattr(chunk, "usage", None)
    if _usage is not None:
        _details = getattr(_usage, "completion_tokens_details", None)
        if _details and getattr(_details, "reasoning_tokens", 0) > 0:
            _reasoning_detected = True

if not _reasoning_detected:
    raise Exception("VALIDATION FAILED: reasoning stream - no reasoning information in stream")
print("\nVALIDATION: reasoning stream SUCCESS")
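
The streaming check above treats a run as successful if any chunk carries a `reasoning_content` or `reasoning` delta, or if the usage block reports a non-zero `reasoning_tokens` count. The sketch below applies the same rule to stand-in chunk objects so the logic can be tested without a gateway call. The `SimpleNamespace` chunks are fabricated test fixtures, not real responses, and the helper name is hypothetical. One deliberate difference: a `reasoning_tokens` value of `None` is treated as zero here, a case where the original comparison would raise a `TypeError`.

```python
from types import SimpleNamespace as NS


def chunk_has_reasoning(chunk) -> bool:
    """True if a chat-completion stream chunk carries any reasoning signal."""
    choices = getattr(chunk, "choices", None) or []
    if choices:
        delta = choices[0].delta
        if getattr(delta, "reasoning_content", None) is not None:
            return True
        if getattr(delta, "reasoning", None) is not None:
            return True
    usage = getattr(chunk, "usage", None)
    details = getattr(usage, "completion_tokens_details", None) if usage else None
    # Treat a missing or None token count as zero.
    tokens = getattr(details, "reasoning_tokens", 0) or 0 if details else 0
    return tokens > 0


# Stand-in chunks (fixtures for illustration only).
plain = NS(choices=[NS(delta=NS(content="27"))], usage=None)
thinking = NS(choices=[NS(delta=NS(content=None, reasoning_content="3^3 = 27"))], usage=None)
usage_only = NS(choices=[], usage=NS(completion_tokens_details=NS(reasoning_tokens=12)))
none_tokens = NS(choices=[], usage=NS(completion_tokens_details=NS(reasoning_tokens=None)))

detected = any(chunk_has_reasoning(c) for c in [plain, thinking])
print(detected)
```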

google-vertex/deepseek-ai/deepseek-v3-1 — tool-call (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpqkqiohzt/snippet.py", line 27, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/us-central1/publishers/deepseek-ai/models/deepseek-v3-1` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/us-central1/publishers/deepseek-ai/models/deepseek-v3-1` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city name, e.g. London",
                    },
                },
                "required": ["location"],
                "additionalProperties": False,
            },
            "strict": True,
        },
    },
]

response = client.chat.completions.create(
    model="test-v2-vertex/deepseek-ai-deepseek-v3-1",
    messages=[
        {"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
    ],
    tools=tools,
    tool_choice="auto",
    stream=False,
)

_message = response.choices[0].message
if _message.tool_calls:
    for _tc in _message.tool_calls:
        print(f"Function: {_tc.function.name}")
        print(f"Arguments: {_tc.function.arguments}")
else:
    print(_message.content)

if not _message.tool_calls or len(_message.tool_calls) == 0:
    raise Exception("VALIDATION FAILED: tool-call - no tool calls in response")
print("VALIDATION: tool-call SUCCESS")

google-vertex/deepseek-ai/deepseek-v3-1 — tool-call:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmp_6fit0dm/snippet.py", line 27, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/us-central1/publishers/deepseek-ai/models/deepseek-v3-1` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/us-central1/publishers/deepseek-ai/models/deepseek-v3-1` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city name, e.g. London",
                    },
                },
                "required": ["location"],
                "additionalProperties": False,
            },
            "strict": True,
        },
    },
]

response = client.chat.completions.create(
    model="test-v2-vertex/deepseek-ai-deepseek-v3-1",
    messages=[
        {"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
    ],
    tools=tools,
    tool_choice="auto",
    stream=True,
)

_tool_calls_made = False
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)
        if delta.tool_calls:
            _tool_calls_made = True
            for _tc in delta.tool_calls:
                if _tc.function:
                    print(_tc.function.arguments or "", end="", flush=True)

if not _tool_calls_made:
    raise Exception("VALIDATION FAILED: tool-call stream - no tool calls received")
print("\nVALIDATION: tool-call stream SUCCESS")

google-vertex/deepseek-ai/deepseek-v3-1 — params:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpf21rlwe5/snippet.py", line 5, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/us-central1/publishers/deepseek-ai/models/deepseek-v3-1` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/us-central1/publishers/deepseek-ai/models/deepseek-v3-1` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-vertex/deepseek-ai-deepseek-v3-1",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=256,
    temperature=0.7,
    stream=True,
)

for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)

google-vertex/deepseek-ai/deepseek-v3-1 — params (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmp6xkim9i6/snippet.py", line 5, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/us-central1/publishers/deepseek-ai/models/deepseek-v3-1` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/us-central1/publishers/deepseek-ai/models/deepseek-v3-1` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-vertex/deepseek-ai-deepseek-v3-1",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=256,
    temperature=0.7,
    stream=False,
)

print(response.choices[0].message.content)

google-vertex/deepseek-ai/deepseek-v3-1 — reasoning (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpoulh2ouh/snippet.py", line 5, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/us-central1/publishers/deepseek-ai/models/deepseek-v3-1` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/us-central1/publishers/deepseek-ai/models/deepseek-v3-1` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-vertex/deepseek-ai-deepseek-v3-1",
    messages=[
        {"role": "system", "content": "You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
    ],
    reasoning_effort="medium",
    stream=False,
)

_usage = getattr(response, "usage", None)
_reasoning_detected = False

_choices = getattr(response, "choices", None)
if _choices and len(_choices) > 0:
    _message = getattr(_choices[0], "message", None)
else:
    _message = None

if _message and getattr(_message, "content", None) is not None:
    print(_message.content)

if _usage is not None:
    _output_token_details = getattr(_usage, "completion_tokens_details", None)
    if _output_token_details and (getattr(_output_token_details, "reasoning_tokens", None) or 0) > 0:
        _reasoning_detected = True
    elif getattr(_usage, "reasoning", None) is not None:
        _reasoning_detected = True

if getattr(_message, "reasoning_content", None) is not None:
    _reasoning_detected = True
elif getattr(_message, "reasoning", None) is not None:
    _reasoning_detected = True

if not _reasoning_detected:
    print("Response: ", response)
    raise Exception("VALIDATION FAILED: reasoning - no reasoning information in response")
print("VALIDATION: reasoning SUCCESS")
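
The reasoning check above accepts any one of four signals: a positive `reasoning_tokens` count, a `reasoning` field on the usage object, or `reasoning_content` / `reasoning` on the message. The sketch below folds the same checks into one function and exercises it offline. The `SimpleNamespace` stubs are hypothetical stand-ins for SDK response objects, and `reasoning_detected` is an illustrative name, not something the harness defines.

```python
from types import SimpleNamespace


def reasoning_detected(response) -> bool:
    """Return True if any reasoning signal checked by the snippet is present."""
    usage = getattr(response, "usage", None)
    choices = getattr(response, "choices", None) or []
    message = getattr(choices[0], "message", None) if choices else None

    details = getattr(usage, "completion_tokens_details", None)
    # A missing or None token count is treated as zero.
    if (getattr(details, "reasoning_tokens", None) or 0) > 0:
        return True
    if getattr(usage, "reasoning", None) is not None:
        return True
    return (
        getattr(message, "reasoning_content", None) is not None
        or getattr(message, "reasoning", None) is not None
    )


# Hypothetical stub responses; real SDK objects carry many more fields.
with_tokens = SimpleNamespace(
    usage=SimpleNamespace(
        completion_tokens_details=SimpleNamespace(reasoning_tokens=12)
    ),
    choices=[SimpleNamespace(message=SimpleNamespace(content="answer"))],
)
no_signal = SimpleNamespace(
    usage=SimpleNamespace(
        completion_tokens_details=SimpleNamespace(reasoning_tokens=None)
    ),
    choices=[SimpleNamespace(message=SimpleNamespace(content="answer"))],
)
with_content = SimpleNamespace(
    usage=None,
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(content="answer", reasoning_content="steps")
        )
    ],
)

for stub in (with_tokens, no_signal, with_content):
    print(reasoning_detected(stub))
```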

google-vertex/deepseek-ai/deepseek-v3-1 — structured-output:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmp073t17_b/snippet.py", line 21, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/us-central1/publishers/deepseek-ai/models/deepseek-v3-1` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/us-central1/publishers/deepseek-ai/models/deepseek-v3-1` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI
import json

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response_schema = json.loads('''{
  "title": "CalendarEvent",
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "date": { "type": "string" },
    "participants": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["name", "date", "participants"],
  "additionalProperties": false
}''')

response = client.chat.completions.create(
    model="test-v2-vertex/deepseek-ai-deepseek-v3-1",
    messages=[
        {"role": "system", "content": "Extract the event information as JSON."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
    ],
    response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
    stream=True,
)

import json as _json

_accumulated = ""
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            _accumulated += delta.content
            print(delta.content, end="", flush=True)

if not _accumulated:
    raise Exception("VALIDATION FAILED: structured-output stream - no content received")

_parsed = _json.loads(_accumulated)

if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
    raise Exception("VALIDATION FAILED: structured-output stream - missing expected fields (name, date, participants)")

if not isinstance(_parsed.get("participants"), list):
    raise Exception("VALIDATION FAILED: structured-output stream - 'participants' is not a list, schema not enforced")

if set(_parsed.keys()) != {"name", "date", "participants"}:
    raise Exception(
        f"VALIDATION FAILED: structured-output stream - unexpected keys present: {set(_parsed.keys())}"
    )

print("\nVALIDATION: structured-output stream SUCCESS")

google-vertex/openai/gpt-oss-120b-maas — structured-output:stream (validation_failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmp0mbgl4hq/snippet.py", line 49, in <module>
    raise Exception("VALIDATION FAILED: structured-output stream - missing expected fields (name, date, participants)")
Exception: VALIDATION FAILED: structured-output stream - missing expected fields (name, date, participants)

Code snippet

from openai import OpenAI
import json

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response_schema = json.loads('''{
  "title": "CalendarEvent",
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "date": { "type": "string" },
    "participants": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["name", "date", "participants"],
  "additionalProperties": false
}''')

response = client.chat.completions.create(
    model="test-v2-vertex/openai-gpt-oss-120b-maas",
    messages=[
        {"role": "system", "content": "Extract the event information as JSON."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
    ],
    response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
    stream=True,
)

import json as _json

_accumulated = ""
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            _accumulated += delta.content
            print(delta.content, end="", flush=True)

if not _accumulated:
    raise Exception("VALIDATION FAILED: structured-output stream - no content received")

_parsed = _json.loads(_accumulated)

if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
    raise Exception("VALIDATION FAILED: structured-output stream - missing expected fields (name, date, participants)")

if not isinstance(_parsed.get("participants"), list):
    raise Exception("VALIDATION FAILED: structured-output stream - 'participants' is not a list, schema not enforced")

if set(_parsed.keys()) != {"name", "date", "participants"}:
    raise Exception(
        f"VALIDATION FAILED: structured-output stream - unexpected keys present: {set(_parsed.keys())}"
    )

print("\nVALIDATION: structured-output stream SUCCESS")

google-vertex/openai/gpt-oss-120b-maas — structured-output (validation_failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpg218o8rk/snippet.py", line 44, in <module>
    raise Exception("VALIDATION FAILED: structured-output - missing expected fields (name, date, participants)")
Exception: VALIDATION FAILED: structured-output - missing expected fields (name, date, participants)

Code snippet

from openai import OpenAI
import json

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response_schema = json.loads('''{
  "title": "CalendarEvent",
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "date": { "type": "string" },
    "participants": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["name", "date", "participants"],
  "additionalProperties": false
}''')

response = client.chat.completions.create(
    model="test-v2-vertex/openai-gpt-oss-120b-maas",
    messages=[
        {"role": "system", "content": "Extract the event information as JSON."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
    ],
    response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
    stream=False,
)

import json as _json

_content = response.choices[0].message.content
print(_content)

if not _content:
    raise Exception("VALIDATION FAILED: structured-output - response content is empty")

_parsed = _json.loads(_content)

if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
    raise Exception("VALIDATION FAILED: structured-output - missing expected fields (name, date, participants)")

if not isinstance(_parsed.get("participants"), list):
    raise Exception("VALIDATION FAILED: structured-output - 'participants' is not a list, schema not enforced")

if set(_parsed.keys()) != {"name", "date", "participants"}:
    raise Exception(
        f"VALIDATION FAILED: structured-output - unexpected keys present: {set(_parsed.keys())}"
    )

print("VALIDATION: structured-output SUCCESS")
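
The report does not record what `gpt-oss-120b-maas` actually returned, only that the "missing expected fields" check fired in both modes. The sketch below reproduces the snippet's checks in the same order as a pure function, so candidate payloads can be tried offline. The sample payloads are hypothetical illustrations of shapes that would pass or fail, not the model's real output, and `check_calendar_event` is an illustrative name.

```python
import json
from typing import Optional

EXPECTED_KEYS = {"name", "date", "participants"}


def check_calendar_event(text: str) -> Optional[str]:
    """Return the reason a payload would fail validation, or None if it passes."""
    if not text:
        return "empty content"
    parsed = json.loads(text)
    if any(key not in parsed for key in EXPECTED_KEYS):
        return "missing expected fields"
    if not isinstance(parsed.get("participants"), list):
        return "'participants' is not a list"
    if set(parsed.keys()) != EXPECTED_KEYS:
        return "unexpected keys present"
    return None


# Hypothetical payloads, written for illustration only.
valid = '{"name": "Science Fair", "date": "Friday", "participants": ["Alice", "Bob"]}'
wrapped = '{"event": {"name": "Science Fair", "date": "Friday", "participants": ["Alice", "Bob"]}}'
extra = '{"name": "Science Fair", "date": "Friday", "participants": ["Alice", "Bob"], "location": "school"}'

for payload in (valid, wrapped, extra):
    print(check_calendar_event(payload))
```

A payload nested under a wrapper key is one shape that would produce the same message as the failure above. Confirming the real cause would require the raw response, which this report does not include.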

google-vertex/deepseek-ai/deepseek-v3-2 — tool-call (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmp3ij9qq21/snippet.py", line 27, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/deepseek-ai/models/deepseek-v3-2` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/deepseek-ai/models/deepseek-v3-2` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city name, e.g. London",
                    },
                },
                "required": ["location"],
                "additionalProperties": False,
            },
            "strict": True,
        },
    },
]

response = client.chat.completions.create(
    model="test-v2-vertex/deepseek-ai-deepseek-v3-2",
    messages=[
        {"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
    ],
    tools=tools,
    tool_choice="auto",
    stream=False,
)

_message = response.choices[0].message
if _message.tool_calls:
    for _tc in _message.tool_calls:
        print(f"Function: {_tc.function.name}")
        print(f"Arguments: {_tc.function.arguments}")
else:
    print(_message.content)

if not _message.tool_calls or len(_message.tool_calls) == 0:
    raise Exception("VALIDATION FAILED: tool-call - no tool calls in response")
print("VALIDATION: tool-call SUCCESS")

google-vertex/deepseek-ai/deepseek-v3-2 — params:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmprzf_ucll/snippet.py", line 5, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/deepseek-ai/models/deepseek-v3-2` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/deepseek-ai/models/deepseek-v3-2` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-vertex/deepseek-ai-deepseek-v3-2",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=256,
    temperature=0.7,
    stream=True,
)

for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)

google-vertex/deepseek-ai/deepseek-v3-2 — structured-output (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmppcezmo7l/snippet.py", line 21, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/deepseek-ai/models/deepseek-v3-2` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/deepseek-ai/models/deepseek-v3-2` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI
import json

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response_schema = json.loads('''{
  "title": "CalendarEvent",
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "date": { "type": "string" },
    "participants": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["name", "date", "participants"],
  "additionalProperties": false
}''')

response = client.chat.completions.create(
    model="test-v2-vertex/deepseek-ai-deepseek-v3-2",
    messages=[
        {"role": "system", "content": "Extract the event information as JSON."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
    ],
    response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
    stream=False,
)

import json as _json

_content = response.choices[0].message.content
print(_content)

if not _content:
    raise Exception("VALIDATION FAILED: structured-output - response content is empty")

_parsed = _json.loads(_content)

if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
    raise Exception("VALIDATION FAILED: structured-output - missing expected fields (name, date, participants)")

if not isinstance(_parsed.get("participants"), list):
    raise Exception("VALIDATION FAILED: structured-output - 'participants' is not a list, schema not enforced")

if set(_parsed.keys()) != {"name", "date", "participants"}:
    raise Exception(
        f"VALIDATION FAILED: structured-output - unexpected keys present: {set(_parsed.keys())}"
    )

print("VALIDATION: structured-output SUCCESS")

google-vertex/deepseek-ai/deepseek-v3-2 — structured-output:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpe1vt6kxc/snippet.py", line 21, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/deepseek-ai/models/deepseek-v3-2` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/deepseek-ai/models/deepseek-v3-2` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI
import json

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response_schema = json.loads('''{
  "title": "CalendarEvent",
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "date": { "type": "string" },
    "participants": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["name", "date", "participants"],
  "additionalProperties": false
}''')

response = client.chat.completions.create(
    model="test-v2-vertex/deepseek-ai-deepseek-v3-2",
    messages=[
        {"role": "system", "content": "Extract the event information as JSON."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
    ],
    response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
    stream=True,
)

import json as _json

_accumulated = ""
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            _accumulated += delta.content
            print(delta.content, end="", flush=True)

if not _accumulated:
    raise Exception("VALIDATION FAILED: structured-output stream - no content received")

_parsed = _json.loads(_accumulated)

if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
    raise Exception("VALIDATION FAILED: structured-output stream - missing expected fields (name, date, participants)")

if not isinstance(_parsed.get("participants"), list):
    raise Exception("VALIDATION FAILED: structured-output stream - 'participants' is not a list, schema not enforced")

if set(_parsed.keys()) != {"name", "date", "participants"}:
    raise Exception(
        f"VALIDATION FAILED: structured-output stream - unexpected keys present: {set(_parsed.keys())}"
    )

print("\nVALIDATION: structured-output stream SUCCESS")

google-vertex/deepseek-ai/deepseek-v3-2 — tool-call:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpfsaj1o6i/snippet.py", line 27, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/deepseek-ai/models/deepseek-v3-2` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/deepseek-ai/models/deepseek-v3-2` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city name, e.g. London",
                    },
                },
                "required": ["location"],
                "additionalProperties": False,
            },
            "strict": True,
        },
    },
]

response = client.chat.completions.create(
    model="test-v2-vertex/deepseek-ai-deepseek-v3-2",
    messages=[
        {"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
    ],
    tools=tools,
    tool_choice="auto",
    stream=True,
)

_tool_calls_made = False
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)
        if delta.tool_calls:
            _tool_calls_made = True
            for _tc in delta.tool_calls:
                if _tc.function:
                    print(_tc.function.arguments or "", end="", flush=True)

if not _tool_calls_made:
    raise Exception("VALIDATION FAILED: tool-call stream - no tool calls received")
print("\nVALIDATION: tool-call stream SUCCESS")

google-vertex/deepseek-ai/deepseek-v3-2 — params (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpa6iprkre/snippet.py", line 5, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/deepseek-ai/models/deepseek-v3-2` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/deepseek-ai/models/deepseek-v3-2` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-vertex/deepseek-ai-deepseek-v3-2",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=256,
    temperature=0.7,
    stream=False,
)

print(response.choices[0].message.content)
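
The 404 above quotes the exact Vertex resource path the gateway tried to reach. The following is a minimal sketch, assuming the message wording shown in this log stays stable, that pulls the project, location, publisher, and model out of such a message. `parse_publisher_model` is a hypothetical helper written for illustration; it is not part of the gateway or the OpenAI SDK.

```python
import re
from typing import Optional

# Pattern mirrors the resource path quoted in the 404 message in this log.
# Backticks and slashes are excluded so the match stops at the path boundary.
_RESOURCE_RE = re.compile(
    r"projects/(?P<project>[^/`]+)"
    r"/locations/(?P<location>[^/`]+)"
    r"/publishers/(?P<publisher>[^/`]+)"
    r"/models/(?P<model>[^/`\s]+)"
)


def parse_publisher_model(message: str) -> Optional[dict]:
    """Return the path components found in an error message, or None if absent."""
    match = _RESOURCE_RE.search(message)
    return match.groupdict() if match else None


# Message text shortened from the deepseek-v3-2 failure above.
msg = (
    "vertex error: Publisher Model "
    "`projects/248190060486/locations/global/publishers/deepseek-ai/models/deepseek-v3-2` "
    "was not found or your project does not have access to it."
)
info = parse_publisher_model(msg)
print(info)
```

Comparing the parsed `project` across failures can help tell a missing model apart from a credentials or project mix-up: in this log the deepseek-ai and minimaxai failures name project `248190060486`, while the gemma4 failures name `truefoundry-devtest`.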

google-vertex/google/gemma4 — tool-call:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpykglh178/snippet.py", line 27, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/gemma4` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/gemma4` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city name, e.g. London",
                    },
                },
                "required": ["location"],
                "additionalProperties": False,
            },
            "strict": True,
        },
    },
]

response = client.chat.completions.create(
    model="test-v2-vertex/google-gemma4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
    ],
    tools=tools,
    tool_choice="auto",
    stream=True,
)

_tool_calls_made = False
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)
        if delta.tool_calls:
            _tool_calls_made = True
            for _tc in delta.tool_calls:
                if _tc.function:
                    print(_tc.function.arguments or "", end="", flush=True)

if not _tool_calls_made:
    raise Exception("VALIDATION FAILED: tool-call stream - no tool calls received")
print("\nVALIDATION: tool-call stream SUCCESS")

google-vertex/google/gemma4 — params (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmp1ox5efj6/snippet.py", line 5, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/gemma4` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/gemma4` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-vertex/google-gemma4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=256,
    temperature=0.7,
    stream=False,
)

print(response.choices[0].message.content)

google-vertex/google/gemma4 — structured-output:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmp_vuwb2ne/snippet.py", line 21, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/gemma4` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/gemma4` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI
import json

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response_schema = json.loads('''{
  "title": "CalendarEvent",
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "date": { "type": "string" },
    "participants": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["name", "date", "participants"],
  "additionalProperties": false
}''')

response = client.chat.completions.create(
    model="test-v2-vertex/google-gemma4",
    messages=[
        {"role": "system", "content": "Extract the event information as JSON."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
    ],
    response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
    stream=True,
)

import json as _json

_accumulated = ""
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            _accumulated += delta.content
            print(delta.content, end="", flush=True)

if not _accumulated:
    raise Exception("VALIDATION FAILED: structured-output stream - no content received")

_parsed = _json.loads(_accumulated)

if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
    raise Exception("VALIDATION FAILED: structured-output stream - missing expected fields (name, date, participants)")

if not isinstance(_parsed.get("participants"), list):
    raise Exception("VALIDATION FAILED: structured-output stream - 'participants' is not a list, schema not enforced")

if set(_parsed.keys()) != {"name", "date", "participants"}:
    raise Exception(
        f"VALIDATION FAILED: structured-output stream - unexpected keys present: {set(_parsed.keys())}"
    )

print("\nVALIDATION: structured-output stream SUCCESS")

google-vertex/google/gemma4 — params:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmp6dhh7zkb/snippet.py", line 5, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/gemma4` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/gemma4` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-vertex/google-gemma4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=256,
    temperature=0.7,
    stream=True,
)

for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)

google-vertex/google/gemma4 — reasoning (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpsmb_ygok/snippet.py", line 5, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/gemma4` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/gemma4` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-vertex/google-gemma4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
    ],
    reasoning_effort="medium",
    stream=False,
)

_usage = getattr(response, "usage", None)
_reasoning_detected = False

_choices = getattr(response, "choices", None)
if _choices and len(_choices) > 0:
    _message = getattr(_choices[0], "message", None)
else:
    _message = None

if _message and getattr(_message, "content", None) is not None:
    print(_message.content)

if _usage is not None:
    _output_token_details = getattr(_usage, "completion_tokens_details", None)
    if _output_token_details and (getattr(_output_token_details, "reasoning_tokens", 0) or 0) > 0:
        _reasoning_detected = True
    elif getattr(_usage, "reasoning", None) is not None:
        _reasoning_detected = True

if getattr(_message, "reasoning_content", None) is not None:
    _reasoning_detected = True
elif getattr(_message, "reasoning", None) is not None:
    _reasoning_detected = True

if not _reasoning_detected:
    print("Response: ", response)
    raise Exception("VALIDATION FAILED: reasoning - no reasoning information in response")
print("VALIDATION: reasoning SUCCESS")

google-vertex/google/gemma4 — reasoning:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpr80xnwmq/snippet.py", line 5, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/gemma4` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/gemma4` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-vertex/google-gemma4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
    ],
    reasoning_effort="medium",
    stream=True,
)

_reasoning_detected = False
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)
        if getattr(delta, "reasoning_content", None) is not None:
            _reasoning_detected = True
        if getattr(delta, "reasoning", None) is not None:
            _reasoning_detected = True

    _usage = getattr(chunk, "usage", None)
    if _usage is not None:
        _details = getattr(_usage, "completion_tokens_details", None)
        if _details and (getattr(_details, "reasoning_tokens", 0) or 0) > 0:
            _reasoning_detected = True

if not _reasoning_detected:
    raise Exception("VALIDATION FAILED: reasoning stream - no reasoning information in stream")
print("\nVALIDATION: reasoning stream SUCCESS")

google-vertex/google/gemma4 — structured-output (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpq_pdn5_l/snippet.py", line 21, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/gemma4` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/gemma4` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI
import json

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response_schema = json.loads('''{
  "title": "CalendarEvent",
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "date": { "type": "string" },
    "participants": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["name", "date", "participants"],
  "additionalProperties": false
}''')

response = client.chat.completions.create(
    model="test-v2-vertex/google-gemma4",
    messages=[
        {"role": "system", "content": "Extract the event information as JSON."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
    ],
    response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
    stream=False,
)

import json as _json

_content = response.choices[0].message.content
print(_content)

if not _content:
    raise Exception("VALIDATION FAILED: structured-output - response content is empty")

_parsed = _json.loads(_content)

if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
    raise Exception("VALIDATION FAILED: structured-output - missing expected fields (name, date, participants)")

if not isinstance(_parsed.get("participants"), list):
    raise Exception("VALIDATION FAILED: structured-output - 'participants' is not a list, schema not enforced")

if set(_parsed.keys()) != {"name", "date", "participants"}:
    raise Exception(
        f"VALIDATION FAILED: structured-output - unexpected keys present: {set(_parsed.keys())}"
    )

print("VALIDATION: structured-output SUCCESS")

google-vertex/google/gemma4 — tool-call (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpiyi6n9na/snippet.py", line 27, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/gemma4` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/gemma4` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city name, e.g. London",
                    },
                },
                "required": ["location"],
                "additionalProperties": False,
            },
            "strict": True,
        },
    },
]

response = client.chat.completions.create(
    model="test-v2-vertex/google-gemma4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
    ],
    tools=tools,
    tool_choice="auto",
    stream=False,
)

_message = response.choices[0].message
if _message.tool_calls:
    for _tc in _message.tool_calls:
        print(f"Function: {_tc.function.name}")
        print(f"Arguments: {_tc.function.arguments}")
else:
    print(_message.content)

if not _message.tool_calls or len(_message.tool_calls) == 0:
    raise Exception("VALIDATION FAILED: tool-call - no tool calls in response")
print("VALIDATION: tool-call SUCCESS")
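
All eight gemma4 scenarios listed above fail with the same 404 before any scenario-specific validation runs, which suggests a model-level availability or access problem rather than eight separate defects. A rough sketch of how such uniform failures could be spotted when triaging a run is given below; the tuples are copied by hand from this log, and `models_failing_uniformly` is a hypothetical helper, not something the test harness is known to provide.

```python
from collections import defaultdict

# (model, scenario, HTTP status) triples copied from the gemma4 failures in this log.
failures = [
    ("google/gemma4", "tool-call:stream", 404),
    ("google/gemma4", "params", 404),
    ("google/gemma4", "structured-output:stream", 404),
    ("google/gemma4", "params:stream", 404),
    ("google/gemma4", "reasoning", 404),
    ("google/gemma4", "reasoning:stream", 404),
    ("google/gemma4", "structured-output", 404),
    ("google/gemma4", "tool-call", 404),
]


def models_failing_uniformly(rows):
    """Map each model whose failures all share one status code to (code, count)."""
    codes_by_model = defaultdict(list)
    for model, _scenario, code in rows:
        codes_by_model[model].append(code)
    return {
        model: (codes[0], len(codes))
        for model, codes in codes_by_model.items()
        if len(set(codes)) == 1
    }


summary = models_failing_uniformly(failures)
print(summary)
```

A model that appears in this summary with a 404 is a candidate for a catalog or access check before its individual scenarios are investigated.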

google-vertex/minimaxai/minimax-m2 — structured-output:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpgp7f_nrg/snippet.py", line 21, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/minimaxai/models/minimax-m2` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/minimaxai/models/minimax-m2` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI
import json

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response_schema = json.loads('''{
  "title": "CalendarEvent",
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "date": { "type": "string" },
    "participants": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["name", "date", "participants"],
  "additionalProperties": false
}''')

response = client.chat.completions.create(
    model="test-v2-vertex/minimaxai-minimax-m2",
    messages=[
        {"role": "system", "content": "Extract the event information as JSON."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
    ],
    response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
    stream=True,
)

import json as _json

_accumulated = ""
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            _accumulated += delta.content
            print(delta.content, end="", flush=True)

if not _accumulated:
    raise Exception("VALIDATION FAILED: structured-output stream - no content received")

_parsed = _json.loads(_accumulated)

if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
    raise Exception("VALIDATION FAILED: structured-output stream - missing expected fields (name, date, participants)")

if not isinstance(_parsed.get("participants"), list):
    raise Exception("VALIDATION FAILED: structured-output stream - 'participants' is not a list, schema not enforced")

if set(_parsed.keys()) != {"name", "date", "participants"}:
    raise Exception(
        f"VALIDATION FAILED: structured-output stream - unexpected keys present: {set(_parsed.keys())}"
    )

print("\nVALIDATION: structured-output stream SUCCESS")

google-vertex/minimaxai/minimax-m2 — tool-call (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpzfjp_603/snippet.py", line 27, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/minimaxai/models/minimax-m2` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/minimaxai/models/minimax-m2` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city name, e.g. London",
                    },
                },
                "required": ["location"],
                "additionalProperties": False,
            },
            "strict": True,
        },
    },
]

response = client.chat.completions.create(
    model="test-v2-vertex/minimaxai-minimax-m2",
    messages=[
        {"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
    ],
    tools=tools,
    tool_choice="auto",
    stream=False,
)

_message = response.choices[0].message
if _message.tool_calls:
    for _tc in _message.tool_calls:
        print(f"Function: {_tc.function.name}")
        print(f"Arguments: {_tc.function.arguments}")
else:
    print(_message.content)

if not _message.tool_calls or len(_message.tool_calls) == 0:
    raise Exception("VALIDATION FAILED: tool-call - no tool calls in response")
print("VALIDATION: tool-call SUCCESS")

google-vertex/minimaxai/minimax-m2 — params (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpswgz_y9z/snippet.py", line 5, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/minimaxai/models/minimax-m2` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/minimaxai/models/minimax-m2` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-vertex/minimaxai-minimax-m2",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=256,
    temperature=0.7,
    stream=False,
)

print(response.choices[0].message.content)

google-vertex/minimaxai/minimax-m2 — structured-output (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmp767ps4p8/snippet.py", line 21, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/minimaxai/models/minimax-m2` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/minimaxai/models/minimax-m2` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI
import json

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response_schema = json.loads('''{
  "title": "CalendarEvent",
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "date": { "type": "string" },
    "participants": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["name", "date", "participants"],
  "additionalProperties": false
}''')

response = client.chat.completions.create(
    model="test-v2-vertex/minimaxai-minimax-m2",
    messages=[
        {"role": "system", "content": "Extract the event information as JSON."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
    ],
    response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
    stream=False,
)

import json as _json

_content = response.choices[0].message.content
print(_content)

if not _content:
    raise Exception("VALIDATION FAILED: structured-output - response content is empty")

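# Report malformed or non-object JSON as a validation failure instead of an unrelated traceback.
try:
    _parsed = _json.loads(_content)
except _json.JSONDecodeError as _exc:
    raise Exception(f"VALIDATION FAILED: structured-output - content is not valid JSON: {_exc}")

if not isinstance(_parsed, dict):
    raise Exception("VALIDATION FAILED: structured-output - content is not a JSON object")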

if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
    raise Exception("VALIDATION FAILED: structured-output - missing expected fields (name, date, participants)")

if not isinstance(_parsed.get("participants"), list):
    raise Exception("VALIDATION FAILED: structured-output - 'participants' is not a list, schema not enforced")

if set(_parsed.keys()) != {"name", "date", "participants"}:
    raise Exception(
        f"VALIDATION FAILED: structured-output - unexpected keys present: {set(_parsed.keys())}"
    )

print("VALIDATION: structured-output SUCCESS")

google-vertex/minimaxai/minimax-m2 — tool-call:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmp1oilmadg/snippet.py", line 27, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/minimaxai/models/minimax-m2` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/minimaxai/models/minimax-m2` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city name, e.g. London",
                    },
                },
                "required": ["location"],
                "additionalProperties": False,
            },
            "strict": True,
        },
    },
]

response = client.chat.completions.create(
    model="test-v2-vertex/minimaxai-minimax-m2",
    messages=[
        {"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
    ],
    tools=tools,
    tool_choice="auto",
    stream=True,
)

_tool_calls_made = False
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)
        if delta.tool_calls:
            _tool_calls_made = True
            for _tc in delta.tool_calls:
                if _tc.function:
                    print(_tc.function.arguments or "", end="", flush=True)

if not _tool_calls_made:
    raise Exception("VALIDATION FAILED: tool-call stream - no tool calls received")
print("\nVALIDATION: tool-call stream SUCCESS")
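
The streaming snippet above only records that some tool-call delta arrived and prints argument fragments as they come. Where the complete call is needed, the fragments can be merged by their `index`. The sketch below assumes the OpenAI-style chunk shape, in which each `delta.tool_calls` item carries `index` and an optional `function` with `name` and `arguments`. The helper name `collect_tool_calls` and the hand-built fragments are hypothetical, and no gateway call is made.

```python
from types import SimpleNamespace


def collect_tool_calls(deltas):
    """Merge streamed tool-call fragments into complete calls, ordered by index."""
    calls = {}
    for tc in deltas:
        entry = calls.setdefault(tc.index, {"name": "", "arguments": ""})
        if tc.function is None:
            continue
        if tc.function.name:
            # The function name normally arrives once, on the first fragment.
            entry["name"] = tc.function.name
        # Argument text arrives in pieces; concatenate in arrival order.
        entry["arguments"] += tc.function.arguments or ""
    return [calls[i] for i in sorted(calls)]


if __name__ == "__main__":
    # Hand-built stand-ins for `delta.tool_calls` items (illustrative only).
    fragments = [
        SimpleNamespace(index=0, function=SimpleNamespace(name="get_weather", arguments="")),
        SimpleNamespace(index=0, function=SimpleNamespace(name=None, arguments='{"loc')),
        SimpleNamespace(index=0, function=SimpleNamespace(name=None, arguments='ation": "London"}')),
    ]
    print(collect_tool_calls(fragments))
```

In the loop above, this would mean appending each `_tc` to a list and calling the helper after the stream ends, so the validation could also assert on the function name and parsed arguments.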

google-vertex/minimaxai/minimax-m2 — params:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpa13xw1so/snippet.py", line 5, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/minimaxai/models/minimax-m2` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/minimaxai/models/minimax-m2` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-vertex/minimaxai-minimax-m2",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=256,
    temperature=0.7,
    stream=True,
)

for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)

google-vertex/gemini-embedding-2-preview — params (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpxlnvr1bu/snippet.py", line 5, in <module>
    response = client.embeddings.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/embeddings.py", line 132, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/gemini-embedding-2-preview` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/gemini-embedding-2-preview` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.embeddings.create(
    model="test-v2-vertex/gemini-embedding-2-preview",
    input="What is the capital of France?",
    encoding_format="float",
)

output = [embed.embedding for embed in response.data]
print(output)

google-vertex/google/gemini-embedding-2-preview — params (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpz0go4_ri/snippet.py", line 5, in <module>
    response = client.embeddings.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/embeddings.py", line 132, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/gemini-embedding-2-preview` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/gemini-embedding-2-preview` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.embeddings.create(
    model="test-v2-vertex/google-gemini-embedding-2-preview",
    input="What is the capital of France?",
    encoding_format="float",
)

output = [embed.embedding for embed in response.data]
print(output)
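
Both embedding failures above resolve to the same Vertex resource path (`publishers/google/models/gemini-embedding-2-preview`), even though the catalog IDs differ. Pulling the project, location, publisher, and model out of the 404 message makes such overlaps easier to spot when triaging many failures. This is a rough sketch: the regular expression and the name `parse_publisher_model` are assumptions based only on the message format visible in these tracebacks.

```python
import re

# Assumed pattern, derived from the "Publisher Model `projects/...`" text in the errors above.
RESOURCE_RE = re.compile(
    r"projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/publishers/(?P<publisher>[^/]+)"
    r"/models/(?P<model>[^/`\s]+)"
)


def parse_publisher_model(message: str):
    """Return the resource-path components found in an error message, or None."""
    match = RESOURCE_RE.search(message)
    return match.groupdict() if match else None


if __name__ == "__main__":
    msg = (
        "vertex error: Publisher Model "
        "`projects/truefoundry-devtest/locations/global/publishers/google/models/"
        "gemini-embedding-2-preview` was not found or your project does not have access to it."
    )
    print(parse_publisher_model(msg))
```

Grouping failures by the parsed `(publisher, model)` pair would show which catalog entries point at the same upstream model.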

google-vertex/google/translate-llm — params:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpkn006puy/snippet.py", line 5, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/translate-llm` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/translate-llm` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-vertex/google-translate-llm",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=256,
    temperature=0.7,
    stream=True,
)

for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)

google-vertex/google/translate-llm — params (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpbohnv2gm/snippet.py", line 5, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/translate-llm` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/translate-llm` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-vertex/google-translate-llm",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=256,
    temperature=0.7,
    stream=False,
)

print(response.choices[0].message.content)

google-vertex/openai/gpt-oss — structured-output:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpprqc_wqv/snippet.py", line 21, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/openai/models/gpt-oss` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/openai/models/gpt-oss` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI
import json

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response_schema = json.loads('''{
  "title": "CalendarEvent",
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "date": { "type": "string" },
    "participants": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["name", "date", "participants"],
  "additionalProperties": false
}''')

response = client.chat.completions.create(
    model="test-v2-vertex/openai-gpt-oss",
    messages=[
        {"role": "system", "content": "Extract the event information as JSON."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
    ],
    response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
    stream=True,
)

import json as _json

_accumulated = ""
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            _accumulated += delta.content
            print(delta.content, end="", flush=True)

if not _accumulated:
    raise Exception("VALIDATION FAILED: structured-output stream - no content received")

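# Report malformed or non-object JSON as a validation failure instead of an unrelated traceback.
try:
    _parsed = _json.loads(_accumulated)
except _json.JSONDecodeError as _exc:
    raise Exception(f"VALIDATION FAILED: structured-output stream - content is not valid JSON: {_exc}")

if not isinstance(_parsed, dict):
    raise Exception("VALIDATION FAILED: structured-output stream - content is not a JSON object")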

if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
    raise Exception("VALIDATION FAILED: structured-output stream - missing expected fields (name, date, participants)")

if not isinstance(_parsed.get("participants"), list):
    raise Exception("VALIDATION FAILED: structured-output stream - 'participants' is not a list, schema not enforced")

if set(_parsed.keys()) != {"name", "date", "participants"}:
    raise Exception(
        f"VALIDATION FAILED: structured-output stream - unexpected keys present: {set(_parsed.keys())}"
    )

print("\nVALIDATION: structured-output stream SUCCESS")

google-vertex/openai/gpt-oss — structured-output (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmphbinzr_g/snippet.py", line 21, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/openai/models/gpt-oss` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/openai/models/gpt-oss` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI
import json

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response_schema = json.loads('''{
  "title": "CalendarEvent",
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "date": { "type": "string" },
    "participants": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["name", "date", "participants"],
  "additionalProperties": false
}''')

response = client.chat.completions.create(
    model="test-v2-vertex/openai-gpt-oss",
    messages=[
        {"role": "system", "content": "Extract the event information as JSON."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
    ],
    response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
    stream=False,
)

import json as _json

_content = response.choices[0].message.content
print(_content)

if not _content:
    raise Exception("VALIDATION FAILED: structured-output - response content is empty")

_parsed = _json.loads(_content)

if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
    raise Exception("VALIDATION FAILED: structured-output - missing expected fields (name, date, participants)")

if not isinstance(_parsed.get("participants"), list):
    raise Exception("VALIDATION FAILED: structured-output - 'participants' is not a list, schema not enforced")

if set(_parsed.keys()) != {"name", "date", "participants"}:
    raise Exception(
        f"VALIDATION FAILED: structured-output - unexpected keys present: {set(_parsed.keys())}"
    )

print("VALIDATION: structured-output SUCCESS")

google-vertex/openai/gpt-oss — reasoning:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmps9ui8snw/snippet.py", line 5, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/openai/models/gpt-oss` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/openai/models/gpt-oss` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-vertex/openai-gpt-oss",
    messages=[
        {"role": "system", "content": "You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
    ],
    reasoning_effort="medium",
    stream=True,
)

_reasoning_detected = False
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)
        if getattr(delta, "reasoning_content", None) is not None:
            _reasoning_detected = True
        if getattr(delta, "reasoning", None) is not None:
            _reasoning_detected = True

    _usage = getattr(chunk, "usage", None)
    if _usage is not None:
        _details = getattr(_usage, "completion_tokens_details", None)
        # reasoning_tokens may be present but None; coerce to 0 before comparing.
        if _details and (getattr(_details, "reasoning_tokens", None) or 0) > 0:
            _reasoning_detected = True

if not _reasoning_detected:
    raise Exception("VALIDATION FAILED: reasoning stream - no reasoning information in stream")
print("\nVALIDATION: reasoning stream SUCCESS")

google-vertex/openai/gpt-oss — params:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpb_nqquop/snippet.py", line 5, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/openai/models/gpt-oss` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/openai/models/gpt-oss` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-vertex/openai-gpt-oss",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=256,
    temperature=0.7,
    stream=True,
)

for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)

google-vertex/openai/gpt-oss — params (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpzvqjp_rq/snippet.py", line 5, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/openai/models/gpt-oss` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/openai/models/gpt-oss` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-vertex/openai-gpt-oss",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=256,
    temperature=0.7,
    stream=False,
)

print(response.choices[0].message.content)
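
Every failure for this model carries the same 404 body, which names the full Vertex resource path that was requested. A minimal sketch for pulling the path fields out of such a message when triaging follows; the helper name `parse_publisher_model` and the regular expression are illustrative assumptions, not part of the test harness or the gateway.

```python
import re
from typing import Optional

# Hypothetical triage helper: the pattern mirrors the resource path format
# seen in the 404 message above and is an assumption, not a gateway API.
PUBLISHER_MODEL_RE = re.compile(
    r"projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/publishers/(?P<publisher>[^/]+)/models/(?P<model>[^`'\s]+)"
)


def parse_publisher_model(message: str) -> Optional[dict]:
    """Return the project, location, publisher and model named in the error, if any."""
    match = PUBLISHER_MODEL_RE.search(message)
    return match.groupdict() if match else None


# Message text copied from the error above (shortened to the relevant sentence).
sample = (
    "vertex error: Publisher Model "
    "`projects/248190060486/locations/global/publishers/openai/models/gpt-oss` "
    "was not found or your project does not have access to it."
)
print(parse_publisher_model(sample))
```

For this failure the extracted `location` is `global` and the model segment is the unversioned `gpt-oss`. Those are the two values to compare against the model YAML, although this log alone does not establish which of them, if either, causes the 404.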

google-vertex/openai/gpt-oss — reasoning (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpr15o92w7/snippet.py", line 5, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/openai/models/gpt-oss` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/openai/models/gpt-oss` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-vertex/openai-gpt-oss",
    messages=[
        {"role": "system", "content": "You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
    ],
    reasoning_effort="medium",
    stream=False,
)

_usage = getattr(response, "usage", None)
_reasoning_detected = False

_choices = getattr(response, "choices", None)
if _choices and len(_choices) > 0:
    _message = getattr(_choices[0], "message", None)
else:
    _message = None

if _message and getattr(_message, "content", None) is not None:
    print(_message.content)

if _usage is not None:
    _output_token_details = getattr(_usage, "completion_tokens_details", None)
    # reasoning_tokens may be present but None; coerce to 0 before comparing.
    if _output_token_details and (getattr(_output_token_details, "reasoning_tokens", None) or 0) > 0:
        _reasoning_detected = True
    elif getattr(_usage, "reasoning", None) is not None:
        _reasoning_detected = True

if getattr(_message, "reasoning_content", None) is not None:
    _reasoning_detected = True
elif getattr(_message, "reasoning", None) is not None:
    _reasoning_detected = True

if not _reasoning_detected:
    print("Response: ", response)
    raise Exception("VALIDATION FAILED: reasoning - no reasoning information in response")
print("VALIDATION: reasoning SUCCESS")

google-vertex/openai/gpt-oss — tool-call (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmph6xuh22i/snippet.py", line 27, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/openai/models/gpt-oss` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/openai/models/gpt-oss` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city name, e.g. London",
                    },
                },
                "required": ["location"],
                "additionalProperties": False,
            },
            "strict": True,
        },
    },
]

response = client.chat.completions.create(
    model="test-v2-vertex/openai-gpt-oss",
    messages=[
        {"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
    ],
    tools=tools,
    tool_choice="auto",
    stream=False,
)

_message = response.choices[0].message
if _message.tool_calls:
    for _tc in _message.tool_calls:
        print(f"Function: {_tc.function.name}")
        print(f"Arguments: {_tc.function.arguments}")
else:
    print(_message.content)

if not _message.tool_calls:
    raise Exception("VALIDATION FAILED: tool-call - no tool calls in response")
print("VALIDATION: tool-call SUCCESS")

google-vertex/openai/gpt-oss — tool-call:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpe9na7jvx/snippet.py", line 27, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/openai/models/gpt-oss` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/openai/models/gpt-oss` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city name, e.g. London",
                    },
                },
                "required": ["location"],
                "additionalProperties": False,
            },
            "strict": True,
        },
    },
]

response = client.chat.completions.create(
    model="test-v2-vertex/openai-gpt-oss",
    messages=[
        {"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
    ],
    tools=tools,
    tool_choice="auto",
    stream=True,
)

_tool_calls_made = False
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)
        if delta.tool_calls:
            _tool_calls_made = True
            for _tc in delta.tool_calls:
                if _tc.function:
                    print(_tc.function.arguments or "", end="", flush=True)

if not _tool_calls_made:
    raise Exception("VALIDATION FAILED: tool-call stream - no tool calls received")
print("\nVALIDATION: tool-call stream SUCCESS")
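
The streaming snippet above only prints argument fragments as they arrive and flags that a tool call was seen. As a rough sketch of how the fragments could be merged into complete calls, the example below groups deltas by their `index` field and concatenates `function.arguments`. The helper `accumulate_tool_calls` and the `_chunk` stand-in objects are hypothetical and built from `SimpleNamespace` so the example runs without a gateway; they only imitate the attribute shape the snippet above reads.

```python
import json
from types import SimpleNamespace as NS


def accumulate_tool_calls(chunks):
    """Merge streamed tool-call deltas into complete calls, keyed by delta index."""
    calls = {}
    for chunk in chunks:
        if not chunk.choices:
            continue
        for tc in chunk.choices[0].delta.tool_calls or []:
            call = calls.setdefault(tc.index, {"id": None, "name": "", "arguments": ""})
            if tc.id:
                call["id"] = tc.id
            if tc.function:
                if tc.function.name:
                    call["name"] = tc.function.name
                # Argument JSON arrives in pieces; append until the stream ends.
                call["arguments"] += tc.function.arguments or ""
    return [calls[i] for i in sorted(calls)]


def _chunk(index, call_id=None, name=None, arguments=None):
    # Stand-in for one streamed chunk; hypothetical test data, not gateway output.
    fn = NS(name=name, arguments=arguments)
    delta = NS(content=None, tool_calls=[NS(index=index, id=call_id, function=fn)])
    return NS(choices=[NS(delta=delta)])


fake_stream = [
    _chunk(0, call_id="call_1", name="get_weather", arguments=""),
    _chunk(0, arguments='{"locat'),
    _chunk(0, arguments='ion": "London"}'),
]
calls = accumulate_tool_calls(fake_stream)
print(calls[0]["name"], json.loads(calls[0]["arguments"]))
```

Parsing the arguments only after the stream ends avoids calling `json.loads` on a partial fragment, which would raise on input such as `{"locat`.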

google-vertex/zai-org/glm-4.7 — tool-call (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmp__j1nz19/snippet.py", line 27, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/zai-org/models/glm-4.7` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/zai-org/models/glm-4.7` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city name, e.g. London",
                    },
                },
                "required": ["location"],
                "additionalProperties": False,
            },
            "strict": True,
        },
    },
]

response = client.chat.completions.create(
    model="test-v2-vertex/zai-org-glm-4.7",
    messages=[
        {"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
    ],
    tools=tools,
    tool_choice="auto",
    stream=False,
)

_message = response.choices[0].message
if _message.tool_calls:
    for _tc in _message.tool_calls:
        print(f"Function: {_tc.function.name}")
        print(f"Arguments: {_tc.function.arguments}")
else:
    print(_message.content)

if not _message.tool_calls:
    raise Exception("VALIDATION FAILED: tool-call - no tool calls in response")
print("VALIDATION: tool-call SUCCESS")

google-vertex/zai-org/glm-4.7 — structured-output:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpq8ga2wh5/snippet.py", line 21, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/zai-org/models/glm-4.7` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/zai-org/models/glm-4.7` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI
import json

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response_schema = json.loads('''{
  "title": "CalendarEvent",
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "date": { "type": "string" },
    "participants": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["name", "date", "participants"],
  "additionalProperties": false
}''')

response = client.chat.completions.create(
    model="test-v2-vertex/zai-org-glm-4.7",
    messages=[
        {"role": "system", "content": "Extract the event information as JSON."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
    ],
    response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
    stream=True,
)

import json as _json

_accumulated = ""
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            _accumulated += delta.content
            print(delta.content, end="", flush=True)

if not _accumulated:
    raise Exception("VALIDATION FAILED: structured-output stream - no content received")

_parsed = _json.loads(_accumulated)

if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
    raise Exception("VALIDATION FAILED: structured-output stream - missing expected fields (name, date, participants)")

if not isinstance(_parsed.get("participants"), list):
    raise Exception("VALIDATION FAILED: structured-output stream - 'participants' is not a list, schema not enforced")

if set(_parsed.keys()) != {"name", "date", "participants"}:
    raise Exception(
        f"VALIDATION FAILED: structured-output stream - unexpected keys present: {set(_parsed.keys())}"
    )

print("\nVALIDATION: structured-output stream SUCCESS")

google-vertex/zai-org/glm-4.7 — tool-call:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmphrl7b6cs/snippet.py", line 27, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/zai-org/models/glm-4.7` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/zai-org/models/glm-4.7` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city name, e.g. London",
                    },
                },
                "required": ["location"],
                "additionalProperties": False,
            },
            "strict": True,
        },
    },
]

response = client.chat.completions.create(
    model="test-v2-vertex/zai-org-glm-4.7",
    messages=[
        {"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
    ],
    tools=tools,
    tool_choice="auto",
    stream=True,
)

_tool_calls_made = False
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)
        if delta.tool_calls:
            _tool_calls_made = True
            for _tc in delta.tool_calls:
                if _tc.function:
                    print(_tc.function.arguments or "", end="", flush=True)

if not _tool_calls_made:
    raise Exception("VALIDATION FAILED: tool-call stream - no tool calls received")
print("\nVALIDATION: tool-call stream SUCCESS")

google-vertex/zai-org/glm-4.7 — structured-output (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmp8_p8kh10/snippet.py", line 21, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/zai-org/models/glm-4.7` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/zai-org/models/glm-4.7` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI
import json

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response_schema = json.loads('''{
  "title": "CalendarEvent",
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "date": { "type": "string" },
    "participants": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["name", "date", "participants"],
  "additionalProperties": false
}''')

response = client.chat.completions.create(
    model="test-v2-vertex/zai-org-glm-4.7",
    messages=[
        {"role": "system", "content": "Extract the event information as JSON."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
    ],
    response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
    stream=False,
)

import json as _json

_content = response.choices[0].message.content
print(_content)

if not _content:
    raise Exception("VALIDATION FAILED: structured-output - response content is empty")

_parsed = _json.loads(_content)

if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
    raise Exception("VALIDATION FAILED: structured-output - missing expected fields (name, date, participants)")

if not isinstance(_parsed.get("participants"), list):
    raise Exception("VALIDATION FAILED: structured-output - 'participants' is not a list, schema not enforced")

if set(_parsed.keys()) != {"name", "date", "participants"}:
    raise Exception(
        f"VALIDATION FAILED: structured-output - unexpected keys present: {set(_parsed.keys())}"
    )

print("VALIDATION: structured-output SUCCESS")

google-vertex/zai-org/glm-4.7 — params:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpvq37v34t/snippet.py", line 5, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/zai-org/models/glm-4.7` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/zai-org/models/glm-4.7` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-vertex/zai-org-glm-4.7",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=256,
    temperature=0.7,
    stream=True,
)

for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)

google-vertex/zai-org/glm-4.7 — params (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpukdw2258/snippet.py", line 5, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/zai-org/models/glm-4.7` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/zai-org/models/glm-4.7` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-vertex/zai-org-glm-4.7",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=256,
    temperature=0.7,
    stream=False,
)

print(response.choices[0].message.content)
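
All six `zai-org/glm-4.7` failures above carry the same 404 body, so the useful detail is the resource name inside it. As a rough sketch (the helper name `parse_publisher_model` and the regex are illustrative assumptions, not part of the test harness), the project, location, publisher and model can be pulled out of the message text like this:

```python
import re
from typing import Dict, Optional

# Matches the resource name quoted in the Vertex 404 messages in this report.
_PUBLISHER_MODEL = re.compile(
    r"projects/(?P<project>[^/`]+)/locations/(?P<location>[^/`]+)"
    r"/publishers/(?P<publisher>[^/`]+)/models/(?P<model>[^/`\s]+)"
)


def parse_publisher_model(message: str) -> Optional[Dict[str, str]]:
    """Return project, location, publisher and model, or None if no resource name is found."""
    match = _PUBLISHER_MODEL.search(message)
    return match.groupdict() if match else None


message = (
    "vertex error: Publisher Model "
    "`projects/248190060486/locations/global/publishers/zai-org/models/glm-4.7` "
    "was not found or your project does not have access to it."
)
info = parse_publisher_model(message)
print(info)
```

Grouping failures by the extracted `project` would make it easier to see which Vertex project each 404 was served from.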

Skipped (25)

google-vertex/anthropic/claude-opus-4@20250514 — skip-check (skipped)

Skip reason:

deprecated or retired model

google-vertex/anthropic/claude-sonnet-4@20250514 — skip-check (skipped)

Skip reason:

deprecated or retired model

google-vertex/deepseek-ai/deepseek-ocr-maas — skip-check (skipped)

Skip reason:

Single-turn model; does not support multi-turn conversations

google-vertex/gemini-2.5-computer-use-preview-10-2025 — skip-check (skipped)

Skip reason:

Requires the Computer Use tool to be enabled

google-vertex/gemini-2.5-flash-image — skip-check (skipped)

Skip reason:

unsupported mode 'image'

google-vertex/gemini-3-pro-image-preview — skip-check (skipped)

Skip reason:

unsupported mode 'image'

google-vertex/gemini-live-2.5-flash-native-audio — skip-check (skipped)

Skip reason:

unsupported mode 'realtime'

google-vertex/google/content-moderation — skip-check (skipped)

Skip reason:

unsupported mode 'moderation'

google-vertex/google/face-detector — skip-check (skipped)

Skip reason:

unsupported mode 'unknown'

google-vertex/google/gemini-2.5-computer-use-preview-10-2025 — skip-check (skipped)

Skip reason:

Requires the Computer Use tool to be enabled

google-vertex/google/language-v1-analyze-entity-sentiment — skip-check (skipped)

Skip reason:

unsupported mode 'unknown'

google-vertex/google/language-v1-analyze-syntax — skip-check (skipped)

Skip reason:

unsupported mode 'unknown'

google-vertex/google/object-detector — skip-check (skipped)

Skip reason:

unsupported mode 'video'

google-vertex/google/people-blur — skip-check (skipped)

Skip reason:

unsupported mode 'video'

google-vertex/google/ppe-detector — skip-check (skipped)

Skip reason:

unsupported mode 'video'

google-vertex/google/pretrained-form-parser — skip-check (skipped)

Skip reason:

unsupported mode 'unknown'

google-vertex/google/tag-recognizer — skip-check (skipped)

Skip reason:

unsupported mode 'image'

google-vertex/google/text-detector — skip-check (skipped)

Skip reason:

unsupported mode 'unknown'

google-vertex/google/text-translation — skip-check (skipped)

Skip reason:

unsupported mode 'unknown'

google-vertex/google/video-text-detection — skip-check (skipped)

Skip reason:

unsupported mode 'video'

google-vertex/imagen-3.0-capability-001 — skip-check (skipped)

Skip reason:

unsupported mode 'image'

google-vertex/imagen-3.0-generate-001 — skip-check (skipped)

Skip reason:

unsupported mode 'image'

google-vertex/imagen-4.0-fast-generate-001 — skip-check (skipped)

Skip reason:

unsupported mode 'image'

google-vertex/mongodb/voyage-3.5-lite — skip-check (skipped)

Skip reason:

Provisioned model

google-vertex/mongodb/voyage-4 — skip-check (skipped)

Skip reason:

Provisioned model
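
The 25 skips above fall into a handful of reasons. A small tally, with the reasons and model names copied from the list above (the dictionary layout itself is just an illustration), might look like:

```python
# Skip reason -> models, transcribed from the "Skipped (25)" list.
skipped = {
    "deprecated or retired model": [
        "anthropic/claude-opus-4@20250514",
        "anthropic/claude-sonnet-4@20250514",
    ],
    "Single-turn model; does not support multi-turn conversations": [
        "deepseek-ai/deepseek-ocr-maas",
    ],
    "Requires the Computer Use tool to be enabled": [
        "gemini-2.5-computer-use-preview-10-2025",
        "google/gemini-2.5-computer-use-preview-10-2025",
    ],
    "unsupported mode 'image'": [
        "gemini-2.5-flash-image",
        "gemini-3-pro-image-preview",
        "google/tag-recognizer",
        "imagen-3.0-capability-001",
        "imagen-3.0-generate-001",
        "imagen-4.0-fast-generate-001",
    ],
    "unsupported mode 'realtime'": ["gemini-live-2.5-flash-native-audio"],
    "unsupported mode 'moderation'": ["google/content-moderation"],
    "unsupported mode 'unknown'": [
        "google/face-detector",
        "google/language-v1-analyze-entity-sentiment",
        "google/language-v1-analyze-syntax",
        "google/pretrained-form-parser",
        "google/text-detector",
        "google/text-translation",
    ],
    "unsupported mode 'video'": [
        "google/object-detector",
        "google/people-blur",
        "google/ppe-detector",
        "google/video-text-detection",
    ],
    "Provisioned model": ["mongodb/voyage-3.5-lite", "mongodb/voyage-4"],
}

counts = {reason: len(models) for reason, models in skipped.items()}
# Print the most common reasons first.
for reason, count in sorted(counts.items(), key=lambda item: -item[1]):
    print(f"{count:2d}  {reason}")
```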

cursor · 2026-05-05T13:53:46Z

+costs:
+    - input_cost_per_token: 5e-7
+      input_cost_per_token_batches: 2.5e-7
+      output_cost_per_token: 0.00001


TTS model uses wrong output cost key name

High Severity

All cost entries in google/gemini-2.5-flash-tts.yaml use output_cost_per_token instead of output_cost_per_audio_token. Every other TTS model in the repository — including the sibling gemini-2.5-flash-tts.yaml, google/gemini-2.5-pro-tts.yaml, and gemini-2.5-pro-tts.yaml — consistently uses output_cost_per_audio_token for audio output pricing. Using the wrong key means downstream consumers won't find the audio token cost under the expected field, likely resulting in incorrect or missing billing calculations.
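
A check for this kind of mismatch could be sketched as follows. It assumes the YAML has already been parsed into a list of cost mappings shaped like the diff hunk above; the function name `entries_missing_audio_cost` and the second sample entry are hypothetical.

```python
from typing import Dict, List


def entries_missing_audio_cost(costs: List[Dict[str, float]]) -> List[int]:
    """Return indices of cost entries that price output per text token
    but have no `output_cost_per_audio_token` key."""
    return [
        index
        for index, entry in enumerate(costs)
        if "output_cost_per_token" in entry
        and "output_cost_per_audio_token" not in entry
    ]


costs = [
    # Shaped like the flagged hunk.
    {
        "input_cost_per_token": 5e-7,
        "input_cost_per_token_batches": 2.5e-7,
        "output_cost_per_token": 0.00001,
    },
    # Hypothetical corrected entry, for contrast.
    {"input_cost_per_token": 5e-7, "output_cost_per_audio_token": 0.00001},
]
flagged = entries_missing_audio_cost(costs)
print(flagged)
```

Running a check like this over every TTS model file would surface the flagged entries without relying on manual review.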

Reviewed by Cursor Bugbot for commit 0061e3d.

github-actions · 2026-05-07T13:25:43Z

/test-models

harshiv-26 · 2026-05-07T13:27:43Z

to review: google/ onwards

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit 1fda97e.

cursor · 2026-05-07T13:29:26Z

-    - https://docs.cloud.google.com/vertex-ai/generative-ai/docs/learn/locations
-status: preview
+    - https://docs.cloud.google.com/text-to-speech/docs/gemini-tts
+status: active


Inconsistent status between two gemini-2.5-pro-tts model files

Medium Severity

This commit changes status from preview to active in gemini-2.5-pro-tts.yaml but leaves google/gemini-2.5-pro-tts.yaml at status: preview. Before this commit both files were preview, so this inconsistency is newly introduced. These represent the same underlying model, and consumers querying by either model path will get contradictory availability information.

Additional Locations (1)

providers/google-vertex/google/gemini-2.5-pro-tts.yaml#L90-L91
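
One way to catch this class of drift is to compare `status` across files that share a basename. The sketch below assumes the statuses have already been read into a path-to-status mapping. The top-level path for `gemini-2.5-pro-tts.yaml` is inferred from the comment above, and the `example-model.yaml` entries are invented for contrast. Grouping by basename is also an assumption: it would wrongly pair models from different publishers that happen to share a file name.

```python
from collections import defaultdict
from typing import Dict


def status_conflicts(statuses: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Group model files by basename and return the groups whose statuses disagree."""
    by_name: Dict[str, Dict[str, str]] = defaultdict(dict)
    for path, status in statuses.items():
        by_name[path.rsplit("/", 1)[-1]][path] = status
    return {
        name: paths
        for name, paths in by_name.items()
        if len(set(paths.values())) > 1
    }


statuses = {
    "providers/google-vertex/gemini-2.5-pro-tts.yaml": "active",
    "providers/google-vertex/google/gemini-2.5-pro-tts.yaml": "preview",
    # Invented entries that agree with each other.
    "providers/google-vertex/example-model.yaml": "active",
    "providers/google-vertex/google/example-model.yaml": "active",
}
conflicts = status_conflicts(statuses)
print(conflicts)
```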

Reviewed by Cursor Bugbot for commit 1fda97e.

harshiv-26 · 2026-05-07T13:30:47Z

Gateway test results

Total: 249
Passed: 188
Failed: 32
Validation failed: 4
Errored: 0
Skipped: 25
Success rate: 83.93%
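
The reported rate is consistent with dividing passed scenarios by the scenarios that actually ran (total minus skipped). The harness does not state this formula, so treat it as an inference from the numbers; the function name is illustrative.

```python
def success_rate(total: int, passed: int, skipped: int) -> float:
    """Percentage of non-skipped scenarios that passed, rounded to two decimals."""
    return round(100 * passed / (total - skipped), 2)


# Counts from this run: 188 passed + 32 failed + 4 validation failed + 25 skipped = 249.
this_run = success_rate(total=249, passed=188, skipped=25)
# Counts from the earlier run in this thread (263 total, 189 passed, 25 skipped).
earlier_run = success_rate(total=263, passed=189, skipped=25)
print(this_run, earlier_run)
```

Both values match the percentages posted in the two result summaries (83.93% and 79.41%).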

Provider	Model	Scenarios
`google-vertex`	`anthropic/claude-haiku-4-5@20251001`	success: tool-call, structured-output, tool-call:stream, params, structured-output:stream, reasoning, params:stream, reasoning:stream
`google-vertex`	`anthropic/claude-opus-4-5`	success: tool-call:stream, structured-output, structured-output:stream, params:stream, params, tool-call, reasoning:stream, reasoning
`google-vertex`	`anthropic/claude-opus-4-5@20251101`	success: parallel-tool-call:stream, tool-call, params:stream, params, tool-call:stream, structured-output:stream, parallel-tool-call, structured-output, reasoning:stream, reasoning
`google-vertex`	`anthropic/claude-opus-4-6`	success: structured-output, tool-call, tool-call:stream, params:stream, structured-output:stream, params, reasoning, reasoning:stream
`google-vertex`	`anthropic/claude-opus-4-6@default`	success: params, tool-call, parallel-tool-call, tool-call:stream, params:stream, structured-output:stream, structured-output, parallel-tool-call:stream, reasoning:stream, reasoning
`google-vertex`	`anthropic/claude-opus-4@20250514`	skipped: skip-check
`google-vertex`	`anthropic/claude-sonnet-4-6`	success: tool-call:stream, parallel-tool-call, tool-call, parallel-tool-call:stream, structured-output:stream, params:stream, params, structured-output, reasoning:stream, reasoning
`google-vertex`	`anthropic/claude-sonnet-4-6@default`	success: params:stream, tool-call:stream, structured-output:stream, structured-output, params, parallel-tool-call:stream, tool-call, parallel-tool-call, reasoning, reasoning:stream
`google-vertex`	`anthropic/claude-sonnet-4@20250514`	skipped: skip-check
`google-vertex`	`deepseek-ai/deepseek-ocr-maas`	skipped: skip-check
`google-vertex`	`gemini-2.5-computer-use-preview-10-2025`	skipped: skip-check
`google-vertex`	`gemini-2.5-flash-image`	skipped: skip-check
`google-vertex`	`gemini-2.5-flash-tts`	success: params
`google-vertex`	`gemini-2.5-pro-tts`	success: params
`google-vertex`	`gemini-3-pro-image-preview`	skipped: skip-check
`google-vertex`	`gemini-3.1-flash-image-preview`	success: params, params:stream, params:google-genai, reasoning:stream, reasoning, reasoning:google-genai, reasoning:stream:google-genai, params:stream:google-genai
`google-vertex`	`gemini-embedding-001`	success: params
`google-vertex`	`gemini-embedding-2-preview`	failure: params
`google-vertex`	`gemini-live-2.5-flash-native-audio`	skipped: skip-check
`google-vertex`	`google/content-moderation`	skipped: skip-check
`google-vertex`	`google/face-detector`	skipped: skip-check
`google-vertex`	`google/gemini-2.5-computer-use-preview-10-2025`	skipped: skip-check
`google-vertex`	`google/gemini-2.5-flash`	success: params, tool-call:stream, json-output:stream, params:stream, structured-output:stream, json-output, tool-call, structured-output, reasoning:stream, reasoning, tool-call:google-genai, params:stream:google-genai, json-output:google-genai, json-output:stream:google-genai, tool-call:stream:google-genai, params:google-genai, structured-output:google-genai, structured-output:stream:google-genai, reasoning:stream:google-genai, reasoning:google-genai
`google-vertex`	`google/gemini-2.5-flash-image`	success: params:stream, params, params:stream:google-genai, params:google-genai
`google-vertex`	`google/gemini-2.5-flash-lite-preview-09-2025`	success: json-output, tool-call:stream, json-output:stream, structured-output:stream, structured-output, params:stream, params, tool-call, reasoning:stream, reasoning, structured-output:stream:google-genai, tool-call:stream:google-genai, json-output:stream:google-genai, structured-output:google-genai, tool-call:google-genai, json-output:google-genai, params:google-genai, params:stream:google-genai, reasoning:stream:google-genai, reasoning:google-genai
`google-vertex`	`google/gemini-3-flash-preview`	success: params:stream, tool-call:stream, json-output, structured-output:stream, json-output:stream, tool-call, params:stream:google-genai, tool-call:stream:google-genai, json-output:stream:google-genai, structured-output:stream:google-genai, json-output:google-genai, params, structured-output:google-genai, params:google-genai, structured-output, reasoning, tool-call:google-genai; validation_failure: reasoning:stream, reasoning:google-genai, reasoning:stream:google-genai
`google-vertex`	`google/gemini-3-pro-image-preview`	success: reasoning:stream, params:google-genai, reasoning, params:stream, params:stream:google-genai, params, reasoning:google-genai, reasoning:stream:google-genai
`google-vertex`	`google/gemini-3.1-flash-lite-preview`	success: tool-call:stream, json-output, structured-output, params, tool-call, json-output:stream, structured-output:stream, params:stream, reasoning:stream, reasoning, tool-call:google-genai, structured-output:google-genai, json-output:google-genai, json-output:stream:google-genai, structured-output:stream:google-genai, params:stream:google-genai, tool-call:stream:google-genai, params:google-genai, reasoning:stream:google-genai, reasoning:google-genai
`google-vertex`	`google/gemini-embedding-001`	success: params
`google-vertex`	`google/gemini-embedding-2-preview`	failure: params
`google-vertex`	`google/gemma4`	failure: tool-call, reasoning:stream, structured-output:stream, params:stream, params, structured-output, reasoning, tool-call:stream
`google-vertex`	`google/language-v1-analyze-entity-sentiment`	skipped: skip-check
`google-vertex`	`google/language-v1-analyze-syntax`	skipped: skip-check
`google-vertex`	`google/multimodalembedding`	success: params
`google-vertex`	`google/object-detector`	skipped: skip-check
`google-vertex`	`google/people-blur`	skipped: skip-check
`google-vertex`	`google/ppe-detector`	skipped: skip-check
`google-vertex`	`google/pretrained-form-parser`	skipped: skip-check
`google-vertex`	`google/tag-recognizer`	skipped: skip-check
`google-vertex`	`google/text-detector`	skipped: skip-check
`google-vertex`	`google/text-translation`	skipped: skip-check
`google-vertex`	`google/translate-llm`	failure: params, params:stream
`google-vertex`	`google/video-text-detection`	skipped: skip-check
`google-vertex`	`imagen-3.0-capability-001`	skipped: skip-check
`google-vertex`	`imagen-3.0-generate-001`	skipped: skip-check
`google-vertex`	`imagen-4.0-fast-generate-001`	skipped: skip-check
`google-vertex`	`minimaxai/minimax-m2`	failure: structured-output:stream, tool-call:stream, params, tool-call, params:stream, structured-output
`google-vertex`	`mistralai/codestral-2`	success: params:stream, params
`google-vertex`	`mongodb/voyage-3.5-lite`	skipped: skip-check
`google-vertex`	`mongodb/voyage-4`	skipped: skip-check
`google-vertex`	`moonshotai/kimi-k2-thinking-maas`	success: params:stream, tool-call, tool-call:stream, params, structured-output:stream, structured-output, reasoning:stream, reasoning
`google-vertex`	`openai/gpt-oss`	failure: structured-output, tool-call:stream, tool-call, structured-output:stream, reasoning, reasoning:stream, params, params:stream
`google-vertex`	`openai/gpt-oss-120b-maas`	success: tool-call:stream, structured-output, tool-call, params, params:stream; validation_failure: structured-output:stream
`google-vertex`	`qwen/qwen3-coder-480b-a35b-instruct-maas`	success: tool-call:stream, tool-call, structured-output, structured-output:stream, params:stream, params
`google-vertex`	`text-multilingual-embedding-002`	success: params
`google-vertex`	`zai-org/glm-4.7`	failure: tool-call, tool-call:stream, params, structured-output:stream, structured-output, params:stream
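
Some Scenarios cells in the table mix several outcomes in one string (for example `success: ... validation_failure: ...`). A sketch of splitting such a cell into per-status lists is below. The status labels are the four that appear in this table; the function name and regex are illustrative assumptions.

```python
import re
from typing import Dict, List

# A status label must not be preceded by a word character or hyphen, so
# "failure:" is not matched inside "validation_failure:".
_STATUS = re.compile(r"(?<![\w-])(success|validation_failure|failure|skipped):\s*")


def parse_scenarios(cell: str) -> Dict[str, List[str]]:
    """Split a Scenarios cell into {status: [scenario, ...]}."""
    matches = list(_STATUS.finditer(cell))
    result: Dict[str, List[str]] = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(cell)
        body = cell[match.end():end].strip(" ;")
        result[match.group(1)] = [s.strip() for s in body.split(",") if s.strip()]
    return result


cell = (
    "success: tool-call:stream, structured-output, tool-call, params, "
    "params:stream validation_failure: structured-output:stream"
)
parsed = parse_scenarios(cell)
print(parsed)
```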

Failures (36)

google-vertex/google/gemini-3-flash-preview — reasoning:stream (validation_failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmp74f3_5u_/snippet.py", line 35, in <module>
    raise Exception("VALIDATION FAILED: reasoning stream - no reasoning information in stream")
Exception: VALIDATION FAILED: reasoning stream - no reasoning information in stream

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-vertex/google-gemini-3-flash-preview",
    messages=[
        {"role": "system", "content": "You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
    ],
    reasoning_effort="medium",
    stream=True,
)

_reasoning_detected = False
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)
        if getattr(delta, "reasoning_content", None) is not None:
            _reasoning_detected = True
        if getattr(delta, "reasoning", None) is not None:
            _reasoning_detected = True

    _usage = getattr(chunk, "usage", None)
    if _usage is not None:
        _details = getattr(_usage, "completion_tokens_details", None)
        if _details and getattr(_details, "reasoning_tokens", 0) > 0:
            _reasoning_detected = True

if not _reasoning_detected:
    raise Exception("VALIDATION FAILED: reasoning stream - no reasoning information in stream")
print("\nVALIDATION: reasoning stream SUCCESS")

google-vertex/google/gemini-3-flash-preview — reasoning:google-genai (validation_failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmp5qu5v_s8/snippet.py", line 67, in <module>
    raise Exception("VALIDATION FAILED: reasoning - no thinking information in GenAI response")
Exception: VALIDATION FAILED: reasoning - no thinking information in GenAI response

Code snippet

from google import genai
from google.genai import types

_endpoint = "https://internal.devtest.truefoundry.tech/api/llm"
_api_key = "***"
_full_model = "test-v2-vertex/google/gemini-3-flash-preview"

_parts = _full_model.split("/")
_provider_account = _parts[0]
_model_id = "/".join(_parts[1:])
if "/" in _model_id:
    _model_id = _model_id.rsplit("/", 1)[-1]

_base_url = f"{_endpoint}/gemini/{_provider_account}/proxy"

client = genai.Client(
    api_key=_api_key,
    http_options=types.HttpOptions(base_url=_base_url),
)

contents = [
    types.Content(role="user", parts=[types.Part.from_text(text="Hi")]),
    types.Content(role="model", parts=[types.Part.from_text(text="Hi, how can I help you")]),
    types.Content(role="user", parts=[types.Part.from_text(text="How to calculate 3^3^3^3? Think step by step and show all reasoning.")]),
]

config = types.GenerateContentConfig(
    system_instruction="You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps.",
    thinking_config=types.ThinkingConfig(
        include_thoughts=True,
        thinking_budget=5000,
    ),
)

response = client.models.generate_content(
    model=_model_id,
    contents=contents,
    config=config,
)

for part in response.candidates[0].content.parts:
    if not part.text:
        continue
    if part.thought:
        print(f"[Thinking] {part.text}")
    else:
        print(part.text)

_parts = response.candidates[0].content.parts
_thought_detected = False

for _part in _parts:
    if not _part.text:
        continue
    if _part.thought:
        _thought_detected = True
        print(f"Thinking: {_part.text[:200]}...")
    else:
        print(_part.text)

_usage = getattr(response, "usage_metadata", None)
if _usage and getattr(_usage, "thoughts_token_count", 0):
    _thought_detected = True

if not _thought_detected:
    print("Response: ", response)
    raise Exception("VALIDATION FAILED: reasoning - no thinking information in GenAI response")
print("VALIDATION: reasoning SUCCESS")

google-vertex/google/gemini-3-flash-preview — reasoning:stream:google-genai (validation_failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmp6e6lnigd/snippet.py", line 70, in <module>
    raise Exception("VALIDATION FAILED: reasoning stream - no thinking information in GenAI stream")
Exception: VALIDATION FAILED: reasoning stream - no thinking information in GenAI stream

Code snippet

from google import genai
from google.genai import types

_endpoint = "https://internal.devtest.truefoundry.tech/api/llm"
_api_key = "***"
_full_model = "test-v2-vertex/google/gemini-3-flash-preview"

_parts = _full_model.split("/")
_provider_account = _parts[0]
_model_id = "/".join(_parts[1:])
if "/" in _model_id:
    _model_id = _model_id.rsplit("/", 1)[-1]

_base_url = f"{_endpoint}/gemini/{_provider_account}/proxy"

client = genai.Client(
    api_key=_api_key,
    http_options=types.HttpOptions(base_url=_base_url),
)

contents = [
    types.Content(role="user", parts=[types.Part.from_text(text="Hi")]),
    types.Content(role="model", parts=[types.Part.from_text(text="Hi, how can I help you")]),
    types.Content(role="user", parts=[types.Part.from_text(text="How to calculate 3^3^3^3? Think step by step and show all reasoning.")]),
]

config = types.GenerateContentConfig(
    system_instruction="You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps.",
    thinking_config=types.ThinkingConfig(
        include_thoughts=True,
        thinking_budget=5000,
    ),
)

_chunks = []
for chunk in client.models.generate_content_stream(
    model=_model_id,
    contents=contents,
    config=config,
):
    _chunks.append(chunk)
    if chunk.candidates and chunk.candidates[0].content and chunk.candidates[0].content.parts:
        for part in chunk.candidates[0].content.parts:
            if not part.text:
                continue
            if part.thought:
                print(f"[Thinking] {part.text}", end="", flush=True)
            else:
                print(part.text, end="", flush=True)

_thought_detected = False
for _chunk in _chunks:
    if not _chunk.candidates or not _chunk.candidates[0].content:
        continue
    for _part in _chunk.candidates[0].content.parts:
        if not _part.text:
            continue
        if _part.thought:
            _thought_detected = True
            print(_part.text, end="", flush=True)
        else:
            print(_part.text, end="", flush=True)

if not _thought_detected:
    _usage = getattr(_chunks[-1], "usage_metadata", None) if _chunks else None
    if _usage and getattr(_usage, "thoughts_token_count", 0):
        _thought_detected = True

if not _thought_detected:
    raise Exception("VALIDATION FAILED: reasoning stream - no thinking information in GenAI stream")
print("\nVALIDATION: reasoning stream SUCCESS")

google-vertex/openai/gpt-oss-120b-maas — structured-output:stream (validation_failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmptctfqb_5/snippet.py", line 49, in <module>
    raise Exception("VALIDATION FAILED: structured-output stream - missing expected fields (name, date, participants)")
Exception: VALIDATION FAILED: structured-output stream - missing expected fields (name, date, participants)

Code snippet

from openai import OpenAI
import json

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response_schema = json.loads('''{
  "title": "CalendarEvent",
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "date": { "type": "string" },
    "participants": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["name", "date", "participants"],
  "additionalProperties": false
}''')

response = client.chat.completions.create(
    model="test-v2-vertex/openai-gpt-oss-120b-maas",
    messages=[
        {"role": "system", "content": "Extract the event information as JSON."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
    ],
    response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
    stream=True,
)

import json as _json

_accumulated = ""
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            _accumulated += delta.content
            print(delta.content, end="", flush=True)

if not _accumulated:
    raise Exception("VALIDATION FAILED: structured-output stream - no content received")

_parsed = _json.loads(_accumulated)

if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
    raise Exception("VALIDATION FAILED: structured-output stream - missing expected fields (name, date, participants)")

if not isinstance(_parsed.get("participants"), list):
    raise Exception("VALIDATION FAILED: structured-output stream - 'participants' is not a list, schema not enforced")

if set(_parsed.keys()) != {"name", "date", "participants"}:
    raise Exception(
        f"VALIDATION FAILED: structured-output stream - unexpected keys present: {set(_parsed.keys())}"
    )

print("\nVALIDATION: structured-output stream SUCCESS")
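
The failure message above only says that some expected field is missing; the log does not show what the model actually returned. A more diagnostic variant of the same checks, offered as a sketch with a hypothetical helper name, would report exactly which keys are missing or unexpected:

```python
import json
from typing import List

EXPECTED_KEYS = {"name", "date", "participants"}


def diagnose_structured_output(raw: str) -> List[str]:
    """Return a list of problems with the payload; an empty list means it passes."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(parsed, dict):
        return [f"top-level value is {type(parsed).__name__}, expected object"]

    problems = []
    keys = set(parsed)
    missing = EXPECTED_KEYS - keys
    extra = keys - EXPECTED_KEYS
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected keys: {sorted(extra)}")
    if "participants" in parsed and not isinstance(parsed["participants"], list):
        problems.append("'participants' is not a list")
    return problems


# Illustrative payloads, not taken from the failing run.
good = '{"name": "Science fair", "date": "Friday", "participants": ["Alice", "Bob"]}'
nested = '{"event": {"name": "Science fair"}}'
print(diagnose_structured_output(good))
print(diagnose_structured_output(nested))
```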

google-vertex/google/translate-llm — params (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpyfslg5u0/snippet.py", line 5, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/translate-llm` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/translate-llm` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-vertex/google-translate-llm",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=256,
    temperature=0.7,
    stream=False,
)

print(response.choices[0].message.content)
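
The failure above is a 404 from Vertex, and the resource path quoted in the message shows which project, location, publisher, and model the gateway requested. A minimal sketch for extracting those fields during triage, assuming the message keeps the `projects/.../locations/.../publishers/.../models/...` shape seen in this report. `parse_publisher_model` is a hypothetical helper, not part of the test harness.

```python
import re
from typing import Optional

# Matches the resource path Vertex quotes in its "Publisher Model ... was not found" message.
_PUBLISHER_MODEL_RE = re.compile(
    r"projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/publishers/(?P<publisher>[^/]+)/models/(?P<model>[^`'\s]+)"
)


def parse_publisher_model(message: str) -> Optional[dict]:
    """Return project/location/publisher/model from an error message, or None if absent."""
    match = _PUBLISHER_MODEL_RE.search(message)
    return match.groupdict() if match else None


# Message text shortened from the traceback above.
sample = (
    "vertex error: Publisher Model "
    "`projects/truefoundry-devtest/locations/global/publishers/google/models/translate-llm` "
    "was not found or your project does not have access to it."
)
fields = parse_publisher_model(sample)
print(fields)
```

Grouping failures by the parsed `project` and `model` values may help separate catalog entries that do not exist in the test project from genuine feature regressions.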

google-vertex/google/translate-llm — params:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpbjbmuyv6/snippet.py", line 5, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/translate-llm` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/translate-llm` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-vertex/google-translate-llm",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=256,
    temperature=0.7,
    stream=True,
)

for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)

google-vertex/minimaxai/minimax-m2 — structured-output:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmp7j_xlwng/snippet.py", line 21, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/minimaxai/models/minimax-m2` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/minimaxai/models/minimax-m2` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI
import json

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response_schema = json.loads('''{
  "title": "CalendarEvent",
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "date": { "type": "string" },
    "participants": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["name", "date", "participants"],
  "additionalProperties": false
}''')

response = client.chat.completions.create(
    model="test-v2-vertex/minimaxai-minimax-m2",
    messages=[
        {"role": "system", "content": "Extract the event information as JSON."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
    ],
    response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
    stream=True,
)

import json as _json

_accumulated = ""
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            _accumulated += delta.content
            print(delta.content, end="", flush=True)

if not _accumulated:
    raise Exception("VALIDATION FAILED: structured-output stream - no content received")

_parsed = _json.loads(_accumulated)

if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
    raise Exception("VALIDATION FAILED: structured-output stream - missing expected fields (name, date, participants)")

if not isinstance(_parsed.get("participants"), list):
    raise Exception("VALIDATION FAILED: structured-output stream - 'participants' is not a list, schema not enforced")

if set(_parsed.keys()) != {"name", "date", "participants"}:
    raise Exception(
        f"VALIDATION FAILED: structured-output stream - unexpected keys present: {set(_parsed.keys())}"
    )

print("\nVALIDATION: structured-output stream SUCCESS")

google-vertex/minimaxai/minimax-m2 — tool-call:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpwyhgjass/snippet.py", line 27, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/minimaxai/models/minimax-m2` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/minimaxai/models/minimax-m2` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city name, e.g. London",
                    },
                },
                "required": ["location"],
                "additionalProperties": False,
            },
            "strict": True,
        },
    },
]

response = client.chat.completions.create(
    model="test-v2-vertex/minimaxai-minimax-m2",
    messages=[
        {"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
    ],
    tools=tools,
    tool_choice="auto",
    stream=True,
)

_tool_calls_made = False
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)
        if delta.tool_calls:
            _tool_calls_made = True
            for _tc in delta.tool_calls:
                if _tc.function:
                    print(_tc.function.arguments or "", end="", flush=True)

if not _tool_calls_made:
    raise Exception("VALIDATION FAILED: tool-call stream - no tool calls received")
print("\nVALIDATION: tool-call stream SUCCESS")
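
The streaming check above only records that some tool-call delta arrived. If the complete call is needed, the fragments can be merged by their `index` field. A rough sketch, assuming each fragment carries `index` and an optional `function` with `name` and `arguments`, as in the OpenAI streaming chunk format. `merge_tool_call_deltas` is a hypothetical helper and the sample fragments are made up for illustration.

```python
from types import SimpleNamespace


def merge_tool_call_deltas(tool_call_deltas):
    """Merge streamed tool-call fragments into complete calls, ordered by index."""
    calls = {}
    for tc in tool_call_deltas:
        fn = getattr(tc, "function", None)
        if fn is None:
            continue
        entry = calls.setdefault(tc.index, {"name": "", "arguments": ""})
        # The function name normally arrives once; argument text arrives in pieces.
        if getattr(fn, "name", None):
            entry["name"] = fn.name
        entry["arguments"] += getattr(fn, "arguments", None) or ""
    return [calls[i] for i in sorted(calls)]


# Made-up fragments standing in for delta.tool_calls entries collected across chunks.
fragments = [
    SimpleNamespace(index=0, function=SimpleNamespace(name="get_weather", arguments="")),
    SimpleNamespace(index=0, function=SimpleNamespace(name=None, arguments='{"location": ')),
    SimpleNamespace(index=0, function=SimpleNamespace(name=None, arguments='"London"}')),
]
merged = merge_tool_call_deltas(fragments)
print(merged)
```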

google-vertex/minimaxai/minimax-m2 — params (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpttt5vemb/snippet.py", line 5, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/minimaxai/models/minimax-m2` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/minimaxai/models/minimax-m2` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-vertex/minimaxai-minimax-m2",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=256,
    temperature=0.7,
    stream=False,
)

print(response.choices[0].message.content)

google-vertex/minimaxai/minimax-m2 — tool-call (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmptzhbgw6m/snippet.py", line 27, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/minimaxai/models/minimax-m2` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/minimaxai/models/minimax-m2` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city name, e.g. London",
                    },
                },
                "required": ["location"],
                "additionalProperties": False,
            },
            "strict": True,
        },
    },
]

response = client.chat.completions.create(
    model="test-v2-vertex/minimaxai-minimax-m2",
    messages=[
        {"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
    ],
    tools=tools,
    tool_choice="auto",
    stream=False,
)

_message = response.choices[0].message
if _message.tool_calls:
    for _tc in _message.tool_calls:
        print(f"Function: {_tc.function.name}")
        print(f"Arguments: {_tc.function.arguments}")
else:
    print(_message.content)

if not _message.tool_calls:
    raise Exception("VALIDATION FAILED: tool-call - no tool calls in response")
print("VALIDATION: tool-call SUCCESS")

google-vertex/minimaxai/minimax-m2 — params:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmp06pg5tlr/snippet.py", line 5, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/minimaxai/models/minimax-m2` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/minimaxai/models/minimax-m2` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-vertex/minimaxai-minimax-m2",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=256,
    temperature=0.7,
    stream=True,
)

for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)

google-vertex/minimaxai/minimax-m2 — structured-output (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpe12z88u7/snippet.py", line 21, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/minimaxai/models/minimax-m2` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/minimaxai/models/minimax-m2` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI
import json

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response_schema = json.loads('''{
  "title": "CalendarEvent",
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "date": { "type": "string" },
    "participants": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["name", "date", "participants"],
  "additionalProperties": false
}''')

response = client.chat.completions.create(
    model="test-v2-vertex/minimaxai-minimax-m2",
    messages=[
        {"role": "system", "content": "Extract the event information as JSON."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
    ],
    response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
    stream=False,
)

import json as _json

_content = response.choices[0].message.content
print(_content)

if not _content:
    raise Exception("VALIDATION FAILED: structured-output - response content is empty")

_parsed = _json.loads(_content)

if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
    raise Exception("VALIDATION FAILED: structured-output - missing expected fields (name, date, participants)")

if not isinstance(_parsed.get("participants"), list):
    raise Exception("VALIDATION FAILED: structured-output - 'participants' is not a list, schema not enforced")

if set(_parsed.keys()) != {"name", "date", "participants"}:
    raise Exception(
        f"VALIDATION FAILED: structured-output - unexpected keys present: {set(_parsed.keys())}"
    )

print("VALIDATION: structured-output SUCCESS")
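
The validation above stops at the first problem it finds. A variant that collects every problem in one pass, and reports malformed JSON as a validation problem rather than an uncaught exception, could look like the sketch below. `check_calendar_event` and `EXPECTED_KEYS` are hypothetical names, and the sample payloads are invented for illustration.

```python
import json

EXPECTED_KEYS = {"name", "date", "participants"}


def check_calendar_event(raw):
    """Return a list of problems; an empty list means raw matches the expected shape."""
    if not raw:
        return ["response content is empty"]
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(parsed, dict):
        return ["top-level value is not an object"]

    problems = []
    keys = set(parsed)
    missing = EXPECTED_KEYS - keys
    extra = keys - EXPECTED_KEYS
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected keys: {sorted(extra)}")
    if "participants" in parsed and not isinstance(parsed["participants"], list):
        problems.append("'participants' is not a list")
    return problems


good = '{"name": "Science fair", "date": "Friday", "participants": ["Alice", "Bob"]}'
bad = '{"name": "x", "date": "y", "participants": "Alice", "extra": 1}'
print(check_calendar_event(good))
print(check_calendar_event(bad))
```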

google-vertex/google/gemma4 — tool-call (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmp_06_accr/snippet.py", line 27, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/gemma4` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/gemma4` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city name, e.g. London",
                    },
                },
                "required": ["location"],
                "additionalProperties": False,
            },
            "strict": True,
        },
    },
]

response = client.chat.completions.create(
    model="test-v2-vertex/google-gemma4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
    ],
    tools=tools,
    tool_choice="auto",
    stream=False,
)

_message = response.choices[0].message
if _message.tool_calls:
    for _tc in _message.tool_calls:
        print(f"Function: {_tc.function.name}")
        print(f"Arguments: {_tc.function.arguments}")
else:
    print(_message.content)

if not _message.tool_calls:
    raise Exception("VALIDATION FAILED: tool-call - no tool calls in response")
print("VALIDATION: tool-call SUCCESS")

google-vertex/google/gemma4 — reasoning:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpdgj2wgj6/snippet.py", line 5, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/gemma4` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/gemma4` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-vertex/google-gemma4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
    ],
    reasoning_effort="medium",
    stream=True,
)

_reasoning_detected = False
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)
        if getattr(delta, "reasoning_content", None) is not None:
            _reasoning_detected = True
        if getattr(delta, "reasoning", None) is not None:
            _reasoning_detected = True

    _usage = getattr(chunk, "usage", None)
    if _usage is not None:
        _details = getattr(_usage, "completion_tokens_details", None)
        # reasoning_tokens may be present but None, so coerce to 0 before comparing
        if _details and (getattr(_details, "reasoning_tokens", None) or 0) > 0:
            _reasoning_detected = True

if not _reasoning_detected:
    raise Exception("VALIDATION FAILED: reasoning stream - no reasoning information in stream")
print("\nVALIDATION: reasoning stream SUCCESS")
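
The three signals checked above (`reasoning_content`, `reasoning`, and a positive `reasoning_tokens` count) can be folded into one predicate that tolerates missing or `None` fields. A minimal sketch, assuming the same attribute names the snippet probes. `has_reasoning` is a hypothetical helper and the sample objects are stand-ins for real stream chunks.

```python
from types import SimpleNamespace


def has_reasoning(delta=None, usage=None):
    """True if a delta or usage object carries any of the reasoning signals."""
    for field in ("reasoning_content", "reasoning"):
        if getattr(delta, field, None) is not None:
            return True
    details = getattr(usage, "completion_tokens_details", None)
    # Treat a missing or None token count as zero.
    tokens = getattr(details, "reasoning_tokens", None) or 0
    return tokens > 0


# Stand-in objects, not real API responses.
with_text = SimpleNamespace(content=None, reasoning_content="step 1")
plain = SimpleNamespace(content="Paris")
usage_none = SimpleNamespace(
    completion_tokens_details=SimpleNamespace(reasoning_tokens=None)
)
usage_some = SimpleNamespace(
    completion_tokens_details=SimpleNamespace(reasoning_tokens=12)
)
print(has_reasoning(delta=with_text), has_reasoning(delta=plain, usage=usage_none))
```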

google-vertex/google/gemma4 — structured-output:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmp8b9a8z6t/snippet.py", line 21, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/gemma4` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/gemma4` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI
import json

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response_schema = json.loads('''{
  "title": "CalendarEvent",
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "date": { "type": "string" },
    "participants": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["name", "date", "participants"],
  "additionalProperties": false
}''')

response = client.chat.completions.create(
    model="test-v2-vertex/google-gemma4",
    messages=[
        {"role": "system", "content": "Extract the event information as JSON."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
    ],
    response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
    stream=True,
)

import json as _json

_accumulated = ""
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            _accumulated += delta.content
            print(delta.content, end="", flush=True)

if not _accumulated:
    raise Exception("VALIDATION FAILED: structured-output stream - no content received")

_parsed = _json.loads(_accumulated)

if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
    raise Exception("VALIDATION FAILED: structured-output stream - missing expected fields (name, date, participants)")

if not isinstance(_parsed.get("participants"), list):
    raise Exception("VALIDATION FAILED: structured-output stream - 'participants' is not a list, schema not enforced")

if set(_parsed.keys()) != {"name", "date", "participants"}:
    raise Exception(
        f"VALIDATION FAILED: structured-output stream - unexpected keys present: {set(_parsed.keys())}"
    )

print("\nVALIDATION: structured-output stream SUCCESS")
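
The three shape checks in the snippet above (required fields present, `participants` is a list, no extra keys) could be collected into one function. This is a sketch under assumptions: `validate_calendar_event` and `EXPECTED_KEYS` are hypothetical names, and the sample payload is made up to match the CalendarEvent schema.

```python
import json

# Keys required by the CalendarEvent schema used in the test.
EXPECTED_KEYS = {"name", "date", "participants"}


def validate_calendar_event(raw: str) -> dict:
    """Parse raw JSON text and enforce the CalendarEvent shape."""
    if not raw:
        raise ValueError("no content received")
    parsed = json.loads(raw)
    if not isinstance(parsed, dict):
        raise ValueError("top-level value is not an object")
    missing = EXPECTED_KEYS - set(parsed)
    if missing:
        raise ValueError(f"missing expected fields: {sorted(missing)}")
    extra = set(parsed) - EXPECTED_KEYS
    if extra:
        raise ValueError(f"unexpected keys present: {sorted(extra)}")
    if not isinstance(parsed["participants"], list):
        raise ValueError("'participants' is not a list")
    return parsed


# Hypothetical model output, for illustration only.
event = validate_calendar_event(
    '{"name": "Science fair", "date": "Friday", "participants": ["Alice", "Bob"]}'
)
print(event["participants"])
```

Raising `ValueError` with a specific message keeps each failure mode distinguishable in a report, which a single generic exception would not.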

google-vertex/google/gemma4 — params:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpci1aqb_3/snippet.py", line 5, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/gemma4` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/gemma4` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-vertex/google-gemma4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=256,
    temperature=0.7,
    stream=True,
)

for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)

google-vertex/google/gemma4 — params (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmp7623f9ft/snippet.py", line 5, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/gemma4` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/gemma4` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-vertex/google-gemma4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=256,
    temperature=0.7,
    stream=False,
)

print(response.choices[0].message.content)

google-vertex/google/gemma4 — structured-output (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpfqw9b1jt/snippet.py", line 21, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/gemma4` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/gemma4` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI
import json

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response_schema = json.loads('''{
  "title": "CalendarEvent",
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "date": { "type": "string" },
    "participants": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["name", "date", "participants"],
  "additionalProperties": false
}''')

response = client.chat.completions.create(
    model="test-v2-vertex/google-gemma4",
    messages=[
        {"role": "system", "content": "Extract the event information as JSON."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
    ],
    response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
    stream=False,
)

import json as _json

_content = response.choices[0].message.content
print(_content)

if not _content:
    raise Exception("VALIDATION FAILED: structured-output - response content is empty")

_parsed = _json.loads(_content)

if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
    raise Exception("VALIDATION FAILED: structured-output - missing expected fields (name, date, participants)")

if not isinstance(_parsed.get("participants"), list):
    raise Exception("VALIDATION FAILED: structured-output - 'participants' is not a list, schema not enforced")

if set(_parsed.keys()) != {"name", "date", "participants"}:
    raise Exception(
        f"VALIDATION FAILED: structured-output - unexpected keys present: {set(_parsed.keys())}"
    )

print("VALIDATION: structured-output SUCCESS")

google-vertex/google/gemma4 — reasoning (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmptfeomh70/snippet.py", line 5, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/gemma4` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/gemma4` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-vertex/google-gemma4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
    ],
    reasoning_effort="medium",
    stream=False,
)

_usage = getattr(response, "usage", None)
_reasoning_detected = False

_choices = getattr(response, "choices", None)
if _choices and len(_choices) > 0:
    _message = getattr(_choices[0], "message", None)
else:
    _message = None

if _message and getattr(_message, "content", None) is not None:
    print(_message.content)

if _usage is not None:
    _output_token_details = getattr(_usage, "completion_tokens_details", None)
    if _output_token_details and (getattr(_output_token_details, "reasoning_tokens", 0) or 0) > 0:
        _reasoning_detected = True
    elif getattr(_usage, "reasoning", None) is not None:
        _reasoning_detected = True

if getattr(_message, "reasoning_content", None) is not None:
    _reasoning_detected = True
elif getattr(_message, "reasoning", None) is not None:
    _reasoning_detected = True

if not _reasoning_detected:
    print("Response: ", response)
    raise Exception("VALIDATION FAILED: reasoning - no reasoning information in response")
print("VALIDATION: reasoning SUCCESS")

google-vertex/google/gemma4 — tool-call:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpbjur9_e1/snippet.py", line 27, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/gemma4` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/gemma4` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city name, e.g. London",
                    },
                },
                "required": ["location"],
                "additionalProperties": False,
            },
            "strict": True,
        },
    },
]

response = client.chat.completions.create(
    model="test-v2-vertex/google-gemma4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
    ],
    tools=tools,
    tool_choice="auto",
    stream=True,
)

_tool_calls_made = False
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)
        if delta.tool_calls:
            _tool_calls_made = True
            for _tc in delta.tool_calls:
                if _tc.function:
                    print(_tc.function.arguments or "", end="", flush=True)

if not _tool_calls_made:
    raise Exception("VALIDATION FAILED: tool-call stream - no tool calls received")
print("\nVALIDATION: tool-call stream SUCCESS")
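
The gemma4 scenarios above all fail with the same 404, and the error message embeds the full publisher model resource name. A small sketch of pulling that name apart for triage follows; `parse_publisher_model` and the regular expression are hypothetical helpers, not part of the gateway or the OpenAI SDK, and the sample message is abbreviated from the log above.

```python
import re

# Matches the resource path quoted in the Vertex "Publisher Model ... was not found" message.
PUBLISHER_MODEL_RE = re.compile(
    r"projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/publishers/(?P<publisher>[^/]+)/models/(?P<model>[^`'\s]+)"
)


def parse_publisher_model(message: str):
    """Return the project, location, publisher and model from an error message, or None."""
    match = PUBLISHER_MODEL_RE.search(message)
    return match.groupdict() if match else None


message = (
    "vertex error: Publisher Model "
    "`projects/truefoundry-devtest/locations/global/publishers/google/models/gemma4` "
    "was not found or your project does not have access to it."
)
print(parse_publisher_model(message))
```

Grouping failures by the parsed `model` and `location` would show at a glance whether a batch of 404s comes from one catalog entry or from several.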

google-vertex/openai/gpt-oss — structured-output (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpqjmm8anq/snippet.py", line 21, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/openai/models/gpt-oss` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/openai/models/gpt-oss` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI
import json

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response_schema = json.loads('''{
  "title": "CalendarEvent",
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "date": { "type": "string" },
    "participants": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["name", "date", "participants"],
  "additionalProperties": false
}''')

response = client.chat.completions.create(
    model="test-v2-vertex/openai-gpt-oss",
    messages=[
        {"role": "system", "content": "Extract the event information as JSON."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
    ],
    response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
    stream=False,
)

import json as _json

_content = response.choices[0].message.content
print(_content)

if not _content:
    raise Exception("VALIDATION FAILED: structured-output - response content is empty")

_parsed = _json.loads(_content)

if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
    raise Exception("VALIDATION FAILED: structured-output - missing expected fields (name, date, participants)")

if not isinstance(_parsed.get("participants"), list):
    raise Exception("VALIDATION FAILED: structured-output - 'participants' is not a list, schema not enforced")

if set(_parsed.keys()) != {"name", "date", "participants"}:
    raise Exception(
        f"VALIDATION FAILED: structured-output - unexpected keys present: {set(_parsed.keys())}"
    )

print("VALIDATION: structured-output SUCCESS")

google-vertex/openai/gpt-oss — tool-call:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmp4exo4rtl/snippet.py", line 27, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/openai/models/gpt-oss` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/openai/models/gpt-oss` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city name, e.g. London",
                    },
                },
                "required": ["location"],
                "additionalProperties": False,
            },
            "strict": True,
        },
    },
]

response = client.chat.completions.create(
    model="test-v2-vertex/openai-gpt-oss",
    messages=[
        {"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
    ],
    tools=tools,
    tool_choice="auto",
    stream=True,
)

_tool_calls_made = False
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)
        if delta.tool_calls:
            _tool_calls_made = True
            for _tc in delta.tool_calls:
                if _tc.function:
                    print(_tc.function.arguments or "", end="", flush=True)

if not _tool_calls_made:
    raise Exception("VALIDATION FAILED: tool-call stream - no tool calls received")
print("\nVALIDATION: tool-call stream SUCCESS")
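
The streaming snippet above only records that some tool-call delta arrived. When the call itself is needed, the argument fragments have to be joined per tool-call `index`. A minimal sketch, assuming the delta shape used by the OpenAI streaming format (`index`, `function.name`, `function.arguments`); `accumulate_tool_calls` and `_chunk` are hypothetical helpers, and the stream is stand-in data.

```python
from types import SimpleNamespace


def accumulate_tool_calls(chunks):
    """Join streamed tool-call fragments into one entry per tool-call index."""
    calls = {}
    for chunk in chunks:
        if not chunk.choices:
            continue
        for tc in chunk.choices[0].delta.tool_calls or []:
            entry = calls.setdefault(tc.index, {"name": "", "arguments": ""})
            if tc.function:
                # The name usually arrives once; arguments arrive in pieces.
                entry["name"] += tc.function.name or ""
                entry["arguments"] += tc.function.arguments or ""
    return [calls[i] for i in sorted(calls)]


def _chunk(index, name=None, arguments=None):
    """Build a stand-in chunk with a single tool-call delta (illustration only)."""
    fn = SimpleNamespace(name=name, arguments=arguments)
    tc = SimpleNamespace(index=index, function=fn)
    delta = SimpleNamespace(content=None, tool_calls=[tc])
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


stream = [
    _chunk(0, name="get_weather", arguments=""),
    _chunk(0, arguments='{"loc'),
    _chunk(0, arguments='ation": "London"}'),
]
calls = accumulate_tool_calls(stream)
print(calls)
```

The joined `arguments` string can then be passed to `json.loads` and checked against the tool's parameter schema.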

google-vertex/openai/gpt-oss — tool-call (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmp2ce1vpnp/snippet.py", line 27, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/openai/models/gpt-oss` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/openai/models/gpt-oss` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city name, e.g. London",
                    },
                },
                "required": ["location"],
                "additionalProperties": False,
            },
            "strict": True,
        },
    },
]

response = client.chat.completions.create(
    model="test-v2-vertex/openai-gpt-oss",
    messages=[
        {"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
    ],
    tools=tools,
    tool_choice="auto",
    stream=False,
)

_message = response.choices[0].message
if _message.tool_calls:
    for _tc in _message.tool_calls:
        print(f"Function: {_tc.function.name}")
        print(f"Arguments: {_tc.function.arguments}")
else:
    print(_message.content)

if not _message.tool_calls:
    raise Exception("VALIDATION FAILED: tool-call - no tool calls in response")
print("VALIDATION: tool-call SUCCESS")

google-vertex/openai/gpt-oss — structured-output:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmptg8wqw4d/snippet.py", line 21, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/openai/models/gpt-oss` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/openai/models/gpt-oss` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI
import json

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response_schema = json.loads('''{
  "title": "CalendarEvent",
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "date": { "type": "string" },
    "participants": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["name", "date", "participants"],
  "additionalProperties": false
}''')

response = client.chat.completions.create(
    model="test-v2-vertex/openai-gpt-oss",
    messages=[
        {"role": "system", "content": "Extract the event information as JSON."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
    ],
    response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
    stream=True,
)

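# Accumulate the streamed content so it can be parsed as one JSON document.
_accumulated = ""
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            _accumulated += delta.content
            print(delta.content, end="", flush=True)

if not _accumulated:
    raise Exception("VALIDATION FAILED: structured-output stream - no content received")

# Report malformed JSON as a validation failure instead of a bare JSONDecodeError.
try:
    _parsed = json.loads(_accumulated)
except json.JSONDecodeError as exc:
    raise Exception(f"VALIDATION FAILED: structured-output stream - content is not valid JSON: {exc}") from exc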

if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
    raise Exception("VALIDATION FAILED: structured-output stream - missing expected fields (name, date, participants)")

if not isinstance(_parsed.get("participants"), list):
    raise Exception("VALIDATION FAILED: structured-output stream - 'participants' is not a list, schema not enforced")

if set(_parsed.keys()) != {"name", "date", "participants"}:
    raise Exception(
        f"VALIDATION FAILED: structured-output stream - unexpected keys present: {set(_parsed.keys())}"
    )

print("\nVALIDATION: structured-output stream SUCCESS")

google-vertex/openai/gpt-oss — reasoning (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmp5xe1qoqq/snippet.py", line 5, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/openai/models/gpt-oss` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/openai/models/gpt-oss` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-vertex/openai-gpt-oss",
    messages=[
        {"role": "system", "content": "You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
    ],
    reasoning_effort="medium",
    stream=False,
)

_usage = getattr(response, "usage", None)
_reasoning_detected = False

_choices = getattr(response, "choices", None)
if _choices and len(_choices) > 0:
    _message = getattr(_choices[0], "message", None)
else:
    _message = None

if _message and getattr(_message, "content", None) is not None:
    print(_message.content)

if _usage is not None:
    _output_token_details = getattr(_usage, "completion_tokens_details", None)
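    # `or 0` guards against reasoning_tokens being present but None, which would make `> 0` raise TypeError.
    if _output_token_details and (getattr(_output_token_details, "reasoning_tokens", 0) or 0) > 0: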
        _reasoning_detected = True
    elif getattr(_usage, "reasoning", None) is not None:
        _reasoning_detected = True

if getattr(_message, "reasoning_content", None) is not None:
    _reasoning_detected = True
elif getattr(_message, "reasoning", None) is not None:
    _reasoning_detected = True

if not _reasoning_detected:
    print("Response: ", response)
    raise Exception("VALIDATION FAILED: reasoning - no reasoning information in response")
print("VALIDATION: reasoning SUCCESS")
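
The reasoning checks in the snippet above probe several attributes in turn. As a minimal sketch, they could be folded into a single helper that also tolerates `reasoning_tokens` being `None`. The name `reasoning_detected` and the `SimpleNamespace` stand-ins are assumptions made for illustration; they are not part of the test harness or the SDK.

```python
from types import SimpleNamespace


def reasoning_detected(usage, message):
    """Return True if the response carries any reasoning signal.

    Hypothetical helper: it probes the same attribute names as the snippet
    (completion_tokens_details.reasoning_tokens, usage.reasoning,
    message.reasoning_content, message.reasoning).
    """
    details = getattr(usage, "completion_tokens_details", None)
    # `or 0` covers the case where the attribute exists but is None.
    if (getattr(details, "reasoning_tokens", None) or 0) > 0:
        return True
    if getattr(usage, "reasoning", None) is not None:
        return True
    return any(
        getattr(message, name, None) is not None
        for name in ("reasoning_content", "reasoning")
    )


# Stand-in objects for illustration only; real responses come from the SDK.
usage = SimpleNamespace(
    completion_tokens_details=SimpleNamespace(reasoning_tokens=None)
)
message = SimpleNamespace(content="27", reasoning_content=None)
print(reasoning_detected(usage, message))  # prints False
```

Because every lookup goes through `getattr` with a default, the helper also accepts `None` for either argument, which matches the case where the response has no choices.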

google-vertex/openai/gpt-oss — reasoning:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpe9gxrl_4/snippet.py", line 5, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/openai/models/gpt-oss` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/openai/models/gpt-oss` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-vertex/openai-gpt-oss",
    messages=[
        {"role": "system", "content": "You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
    ],
    reasoning_effort="medium",
    stream=True,
)

_reasoning_detected = False
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)
        if getattr(delta, "reasoning_content", None) is not None:
            _reasoning_detected = True
        if getattr(delta, "reasoning", None) is not None:
            _reasoning_detected = True

    _usage = getattr(chunk, "usage", None)
    if _usage is not None:
        _details = getattr(_usage, "completion_tokens_details", None)
        # `or 0` guards against reasoning_tokens being present but None.
        if _details and (getattr(_details, "reasoning_tokens", 0) or 0) > 0:
            _reasoning_detected = True

if not _reasoning_detected:
    raise Exception("VALIDATION FAILED: reasoning stream - no reasoning information in stream")
print("\nVALIDATION: reasoning stream SUCCESS")

google-vertex/openai/gpt-oss — params (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmp4ul067nf/snippet.py", line 5, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/openai/models/gpt-oss` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/openai/models/gpt-oss` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-vertex/openai-gpt-oss",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=256,
    temperature=0.7,
    stream=False,
)

print(response.choices[0].message.content)

google-vertex/openai/gpt-oss — params:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmp7puakpa8/snippet.py", line 5, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/openai/models/gpt-oss` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/openai/models/gpt-oss` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-vertex/openai-gpt-oss",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=256,
    temperature=0.7,
    stream=True,
)

for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)

google-vertex/gemini-embedding-2-preview — params (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmp3w9m0s80/snippet.py", line 5, in <module>
    response = client.embeddings.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/embeddings.py", line 132, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/gemini-embedding-2-preview` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/gemini-embedding-2-preview` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.embeddings.create(
    model="test-v2-vertex/gemini-embedding-2-preview",
    input="What is the capital of France?",
    encoding_format="float",
)

output = [embed.embedding for embed in response.data]
print(output)
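
Unlike the chat scenarios, the embeddings snippet above prints the vectors without a validation step. The following is a minimal sketch of what such a check might look like; `validate_embeddings` and its `expected_count` parameter are hypothetical, and the failure message format simply copies the other scenarios in this report.

```python
import math


def validate_embeddings(vectors, expected_count=1):
    """Raise if the embedding output has the wrong count, an empty vector,
    or a non-numeric / non-finite value. Returns the first vector's length.

    Hypothetical check, not part of the test harness.
    """
    if len(vectors) != expected_count:
        raise Exception(
            f"VALIDATION FAILED: embeddings - expected {expected_count} vectors, got {len(vectors)}"
        )
    for vec in vectors:
        if not vec:
            raise Exception("VALIDATION FAILED: embeddings - empty vector")
        if not all(isinstance(x, (int, float)) and math.isfinite(x) for x in vec):
            raise Exception("VALIDATION FAILED: embeddings - non-finite or non-numeric value")
    return len(vectors[0])


# Illustrative input; in the snippet above this would be `output`.
print(validate_embeddings([[0.1, -0.2, 0.3]]))  # prints 3
```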

google-vertex/google/gemini-embedding-2-preview — params (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmp7oiq99rr/snippet.py", line 5, in <module>
    response = client.embeddings.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/embeddings.py", line 132, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/gemini-embedding-2-preview` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/truefoundry-devtest/locations/global/publishers/google/models/gemini-embedding-2-preview` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.embeddings.create(
    model="test-v2-vertex/google-gemini-embedding-2-preview",
    input="What is the capital of France?",
    encoding_format="float",
)

output = [embed.embedding for embed in response.data]
print(output)
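
The 404 messages in this report quote the full Vertex resource name, and the project segment differs between runs: `248190060486` for the `openai/gpt-oss` and `zai-org/glm-4.7` failures, `truefoundry-devtest` for the `gemini-embedding-2-preview` failures. Whether that difference matters is not established here. As a rough sketch, the fields can be pulled out of a message for side-by-side comparison; `parse_publisher_model` is a hypothetical helper and the pattern is inferred only from the messages shown in this report.

```python
import re

# Pattern for the resource name quoted in the Vertex 404 message.
# Inferred from the messages in this report, not from a formal grammar.
_RESOURCE_RE = re.compile(
    r"projects/(?P<project>[^/`]+)"
    r"/locations/(?P<location>[^/`]+)"
    r"/publishers/(?P<publisher>[^/`]+)"
    r"/models/(?P<model>[^/`\s]+)"
)


def parse_publisher_model(message):
    """Return project/location/publisher/model as a dict, or None if absent."""
    match = _RESOURCE_RE.search(message)
    return match.groupdict() if match else None


msg = (
    "vertex error: Publisher Model "
    "`projects/truefoundry-devtest/locations/global/publishers/google/models/"
    "gemini-embedding-2-preview` was not found or your project does not have access to it."
)
print(parse_publisher_model(msg))
```

Grouping the failures by the parsed `project` and `publisher` values is one way to check whether the 404s cluster by credentials or by model.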

google-vertex/zai-org/glm-4.7 — tool-call (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpazz7w7sg/snippet.py", line 27, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/zai-org/models/glm-4.7` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/zai-org/models/glm-4.7` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city name, e.g. London",
                    },
                },
                "required": ["location"],
                "additionalProperties": False,
            },
            "strict": True,
        },
    },
]

response = client.chat.completions.create(
    model="test-v2-vertex/zai-org-glm-4.7",
    messages=[
        {"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
    ],
    tools=tools,
    tool_choice="auto",
    stream=False,
)

_message = response.choices[0].message
if _message.tool_calls:
    for _tc in _message.tool_calls:
        print(f"Function: {_tc.function.name}")
        print(f"Arguments: {_tc.function.arguments}")
else:
    print(_message.content)

if not _message.tool_calls or len(_message.tool_calls) == 0:
    raise Exception("VALIDATION FAILED: tool-call - no tool calls in response")
print("VALIDATION: tool-call SUCCESS")

google-vertex/zai-org/glm-4.7 — tool-call:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpc8_01hgf/snippet.py", line 27, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/zai-org/models/glm-4.7` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/zai-org/models/glm-4.7` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city name, e.g. London",
                    },
                },
                "required": ["location"],
                "additionalProperties": False,
            },
            "strict": True,
        },
    },
]

response = client.chat.completions.create(
    model="test-v2-vertex/zai-org-glm-4.7",
    messages=[
        {"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
    ],
    tools=tools,
    tool_choice="auto",
    stream=True,
)

_tool_calls_made = False
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)
        if delta.tool_calls:
            _tool_calls_made = True
            for _tc in delta.tool_calls:
                if _tc.function:
                    print(_tc.function.arguments or "", end="", flush=True)

if not _tool_calls_made:
    raise Exception("VALIDATION FAILED: tool-call stream - no tool calls received")
print("\nVALIDATION: tool-call stream SUCCESS")
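
The streaming loop above only records that some tool-call delta arrived and prints the argument fragments as they come. In the OpenAI streaming format each tool-call delta carries an `index` plus partial `name` and `arguments` strings, so a stricter check could reassemble the fragments first. The sketch below assumes that shape; `collect_tool_calls` and the `_chunk` stand-in are hypothetical and exist only for illustration.

```python
from types import SimpleNamespace


def collect_tool_calls(chunks):
    """Merge streamed tool-call deltas into {index: {"name", "arguments"}}.

    Hypothetical helper; assumes each delta entry has `index` and an optional
    `function` with partial `name` / `arguments` strings.
    """
    calls = {}
    for chunk in chunks:
        if not chunk.choices:
            continue
        for tc in chunk.choices[0].delta.tool_calls or []:
            entry = calls.setdefault(tc.index, {"name": "", "arguments": ""})
            if tc.function:
                entry["name"] += tc.function.name or ""
                entry["arguments"] += tc.function.arguments or ""
    return calls


def _chunk(index, name=None, arguments=None):
    # Stand-in for a streamed chunk, for illustration only.
    fn = SimpleNamespace(name=name, arguments=arguments)
    tc = SimpleNamespace(index=index, function=fn)
    delta = SimpleNamespace(content=None, tool_calls=[tc])
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


stream = [
    _chunk(0, name="get_weather", arguments='{"loc'),
    _chunk(0, arguments='ation": "London"}'),
]
calls = collect_tool_calls(stream)
print(calls[0]["name"], calls[0]["arguments"])
```

With the fragments merged, the validation could go on to parse `arguments` as JSON and confirm that `location` is present, rather than passing on the first delta alone.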

google-vertex/zai-org/glm-4.7 — params (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmp0xa_9vpp/snippet.py", line 5, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/zai-org/models/glm-4.7` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/zai-org/models/glm-4.7` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-vertex/zai-org-glm-4.7",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=256,
    temperature=0.7,
    stream=False,
)

print(response.choices[0].message.content)

google-vertex/zai-org/glm-4.7 — structured-output:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpzk19gcnz/snippet.py", line 21, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/zai-org/models/glm-4.7` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/zai-org/models/glm-4.7` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI
import json

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response_schema = json.loads('''{
  "title": "CalendarEvent",
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "date": { "type": "string" },
    "participants": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["name", "date", "participants"],
  "additionalProperties": false
}''')

response = client.chat.completions.create(
    model="test-v2-vertex/zai-org-glm-4.7",
    messages=[
        {"role": "system", "content": "Extract the event information as JSON."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
    ],
    response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
    stream=True,
)

import json as _json

_accumulated = ""
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            _accumulated += delta.content
            print(delta.content, end="", flush=True)

if not _accumulated:
    raise Exception("VALIDATION FAILED: structured-output stream - no content received")

_parsed = _json.loads(_accumulated)

if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
    raise Exception("VALIDATION FAILED: structured-output stream - missing expected fields (name, date, participants)")

if not isinstance(_parsed.get("participants"), list):
    raise Exception("VALIDATION FAILED: structured-output stream - 'participants' is not a list, schema not enforced")

if set(_parsed.keys()) != {"name", "date", "participants"}:
    raise Exception(
        f"VALIDATION FAILED: structured-output stream - unexpected keys present: {set(_parsed.keys())}"
    )

print("\nVALIDATION: structured-output stream SUCCESS")
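
The three checks in the snippet above (required fields present, `participants` is a list, no extra keys) could be gathered into one helper that works for both the streamed and non-streamed variants. This is a minimal sketch and not part of the test harness; the name `validate_calendar_event` and the use of `ValueError` are assumptions made for illustration.

```python
import json

# Keys listed under "required" in the CalendarEvent schema used by the test.
REQUIRED_KEYS = {"name", "date", "participants"}


def validate_calendar_event(raw: str) -> dict:
    """Parse raw JSON text and apply the same checks as the test snippet.

    Hypothetical helper: raises ValueError on the first failed check and
    returns the parsed object otherwise.
    """
    if not raw:
        raise ValueError("no content received")
    parsed = json.loads(raw)
    if not isinstance(parsed, dict):
        raise ValueError("top-level JSON value is not an object")
    missing = REQUIRED_KEYS - set(parsed)
    if missing:
        raise ValueError(f"missing expected fields: {sorted(missing)}")
    if not isinstance(parsed["participants"], list):
        raise ValueError("'participants' is not a list")
    extra = set(parsed) - REQUIRED_KEYS
    if extra:
        raise ValueError(f"unexpected keys present: {sorted(extra)}")
    return parsed
```

For the streamed case the accumulated text would be passed in once the loop finishes; for the non-streamed case, `response.choices[0].message.content`.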

google-vertex/zai-org/glm-4.7 — structured-output (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpr5dgxp60/snippet.py", line 21, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/zai-org/models/glm-4.7` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/zai-org/models/glm-4.7` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI
import json

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response_schema = json.loads('''{
  "title": "CalendarEvent",
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "date": { "type": "string" },
    "participants": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["name", "date", "participants"],
  "additionalProperties": false
}''')

response = client.chat.completions.create(
    model="test-v2-vertex/zai-org-glm-4.7",
    messages=[
        {"role": "system", "content": "Extract the event information as JSON."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
    ],
    response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
    stream=False,
)

import json as _json

_content = response.choices[0].message.content
print(_content)

if not _content:
    raise Exception("VALIDATION FAILED: structured-output - response content is empty")

_parsed = _json.loads(_content)

if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
    raise Exception("VALIDATION FAILED: structured-output - missing expected fields (name, date, participants)")

if not isinstance(_parsed.get("participants"), list):
    raise Exception("VALIDATION FAILED: structured-output - 'participants' is not a list, schema not enforced")

if set(_parsed.keys()) != {"name", "date", "participants"}:
    raise Exception(
        f"VALIDATION FAILED: structured-output - unexpected keys present: {set(_parsed.keys())}"
    )

print("VALIDATION: structured-output SUCCESS")

google-vertex/zai-org/glm-4.7 — params:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmp_0iiwwm5/snippet.py", line 5, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'status': 'failure', 'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/zai-org/models/glm-4.7` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'error': {'message': 'vertex error: Publisher Model `projects/248190060486/locations/global/publishers/zai-org/models/glm-4.7` was not found or your project does not have access to it. Please ensure you are using a valid model version. For more information, see: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions', 'type': 'APIError', 'code': '404'}, 'error_origin_level': 'api_error', 'provider': 'google-vertex'}

Code snippet

from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-vertex/zai-org-glm-4.7",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hi, how can I help you"},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=256,
    temperature=0.7,
    stream=True,
)

for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            print(delta.content, end="", flush=True)
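
All three `glm-4.7` failures above carry the same 404 body, so a small helper that pulls the provider, status code, and Vertex resource path out of that body could make such reports easier to group. The sketch below assumes the body has the shape printed in the tracebacks (`message`, `error.message`, `error.code`, `error_origin_level`, `provider`); the function name `summarize_gateway_error` is hypothetical.

```python
import re

# Matches the Vertex resource path quoted in backticks inside the error message.
_PUBLISHER_MODEL = re.compile(
    r"`projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/publishers/(?P<publisher>[^/]+)/models/(?P<model>[^`]+)`"
)


def summarize_gateway_error(body: dict) -> dict:
    """Reduce a gateway error body to the fields useful for triage.

    Hypothetical helper; the body layout is taken from the log output above.
    """
    error = body.get("error") or {}
    message = error.get("message") or body.get("message", "")
    summary = {
        "provider": body.get("provider"),
        "code": error.get("code"),
        "origin": body.get("error_origin_level"),
    }
    match = _PUBLISHER_MODEL.search(message)
    if match:
        # Adds project, location, publisher, and model when a path is present.
        summary.update(match.groupdict())
    return summary
```

Grouping failures by the resulting `publisher`, `model`, and `code` values would show at a glance that every scenario for this model fails for the same reason: the publisher model is not found or the project has no access to it.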

Skipped (25)

google-vertex/anthropic/claude-opus-4@20250514 — skip-check (skipped)

Skip reason:

deprecated or retired model

google-vertex/anthropic/claude-sonnet-4@20250514 — skip-check (skipped)

Skip reason:

deprecated or retired model

google-vertex/deepseek-ai/deepseek-ocr-maas — skip-check (skipped)

Skip reason:

Single-turn model; does not support multi-turn conversations

google-vertex/gemini-2.5-computer-use-preview-10-2025 — skip-check (skipped)

Skip reason:

Requires the Computer Use tool to be enabled

google-vertex/gemini-2.5-flash-image — skip-check (skipped)

Skip reason:

unsupported mode 'image'

google-vertex/gemini-3-pro-image-preview — skip-check (skipped)

Skip reason:

unsupported mode 'image'

google-vertex/gemini-live-2.5-flash-native-audio — skip-check (skipped)

Skip reason:

unsupported mode 'realtime'

google-vertex/google/content-moderation — skip-check (skipped)

Skip reason:

unsupported mode 'moderation'

google-vertex/google/face-detector — skip-check (skipped)

Skip reason:

unsupported mode 'unknown'

google-vertex/google/gemini-2.5-computer-use-preview-10-2025 — skip-check (skipped)

Skip reason:

Requires the Computer Use tool to be enabled

google-vertex/google/language-v1-analyze-entity-sentiment — skip-check (skipped)

Skip reason:

unsupported mode 'unknown'

google-vertex/google/language-v1-analyze-syntax — skip-check (skipped)

Skip reason:

unsupported mode 'unknown'

google-vertex/google/object-detector — skip-check (skipped)

Skip reason:

unsupported mode 'video'

google-vertex/google/people-blur — skip-check (skipped)

Skip reason:

unsupported mode 'video'

google-vertex/google/ppe-detector — skip-check (skipped)

Skip reason:

unsupported mode 'video'

google-vertex/google/pretrained-form-parser — skip-check (skipped)

Skip reason:

unsupported mode 'unknown'

google-vertex/google/tag-recognizer — skip-check (skipped)

Skip reason:

unsupported mode 'image'

google-vertex/google/text-detector — skip-check (skipped)

Skip reason:

unsupported mode 'unknown'

google-vertex/google/text-translation — skip-check (skipped)

Skip reason:

unsupported mode 'unknown'

google-vertex/google/video-text-detection — skip-check (skipped)

Skip reason:

unsupported mode 'video'

google-vertex/imagen-3.0-capability-001 — skip-check (skipped)

Skip reason:

unsupported mode 'image'

google-vertex/imagen-3.0-generate-001 — skip-check (skipped)

Skip reason:

unsupported mode 'image'

google-vertex/imagen-4.0-fast-generate-001 — skip-check (skipped)

Skip reason:

unsupported mode 'image'

google-vertex/mongodb/voyage-3.5-lite — skip-check (skipped)

Skip reason:

Provisioned model

google-vertex/mongodb/voyage-4 — skip-check (skipped)

Skip reason:

Provisioned model
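
The 25 skips above fall into a handful of reasons. As a rough aid for reading the list, the sketch below tallies them with `collections.Counter`; the `SKIPPED` pairs are copied from the entries above (with the `google-vertex/` prefix dropped), while the variable and function names are illustrative only.

```python
from collections import Counter

IMAGE = "unsupported mode 'image'"
VIDEO = "unsupported mode 'video'"
UNKNOWN = "unsupported mode 'unknown'"
COMPUTER_USE = "Requires the Computer Use tool to be enabled"
DEPRECATED = "deprecated or retired model"
PROVISIONED = "Provisioned model"

# (model, skip reason) pairs transcribed from the skipped list above.
SKIPPED = [
    ("anthropic/claude-opus-4@20250514", DEPRECATED),
    ("anthropic/claude-sonnet-4@20250514", DEPRECATED),
    ("deepseek-ai/deepseek-ocr-maas",
     "Single-turn model; does not support multi-turn conversations"),
    ("gemini-2.5-computer-use-preview-10-2025", COMPUTER_USE),
    ("gemini-2.5-flash-image", IMAGE),
    ("gemini-3-pro-image-preview", IMAGE),
    ("gemini-live-2.5-flash-native-audio", "unsupported mode 'realtime'"),
    ("google/content-moderation", "unsupported mode 'moderation'"),
    ("google/face-detector", UNKNOWN),
    ("google/gemini-2.5-computer-use-preview-10-2025", COMPUTER_USE),
    ("google/language-v1-analyze-entity-sentiment", UNKNOWN),
    ("google/language-v1-analyze-syntax", UNKNOWN),
    ("google/object-detector", VIDEO),
    ("google/people-blur", VIDEO),
    ("google/ppe-detector", VIDEO),
    ("google/pretrained-form-parser", UNKNOWN),
    ("google/tag-recognizer", IMAGE),
    ("google/text-detector", UNKNOWN),
    ("google/text-translation", UNKNOWN),
    ("google/video-text-detection", VIDEO),
    ("imagen-3.0-capability-001", IMAGE),
    ("imagen-3.0-generate-001", IMAGE),
    ("imagen-4.0-fast-generate-001", IMAGE),
    ("mongodb/voyage-3.5-lite", PROVISIONED),
    ("mongodb/voyage-4", PROVISIONED),
]


def tally_skip_reasons(pairs):
    """Count how many models were skipped for each reason."""
    return Counter(reason for _, reason in pairs)


if __name__ == "__main__":
    for reason, count in tally_skip_reasons(SKIPPED).most_common():
        print(f"{count:2d}  {reason}")
```

Most skips come from unsupported modes rather than from lifecycle status: only two entries are skipped as deprecated or retired, and two as provisioned.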

feat(google-vertex): update model YAMLs [bot]

0061e3d

cursor Bot reviewed May 5, 2026

revert regions and provisioning

1fda97e

cursor Bot reviewed May 7, 2026

harshiv-26 wants to merge 2 commits into main from bot/update-google-vertex-20260505-135015

cursor Bot May 5, 2026

TTS model uses wrong output cost key name

cursor Bot May 7, 2026

Inconsistent status between two gemini-2.5-pro-tts model files
