
chore(pricing): Update vertex-ai pricing #637

Open
siddharthsambharia-portkey wants to merge 2 commits into main from pricing-update/vertex-ai-24229816500


@siddharthsambharia-portkey
Collaborator

🔄 Pricing Update: vertex-ai

📊 Summary (complete_diff mode)

| Change Type | Count |
|---|---|
| ➕ Models added | 3 |
| 🔄 Models updated (merged) | 13 |

➕ New Models

  • gemini-2.5-pro-tts
  • gemini-2.5-flash-tts
  • veo-3.1-lite-generate-001

🔄 Updated Models

  • gemini-2.5-pro
  • gemini-2.5-computer-use-preview-10-2025
  • gemini-2.0-flash-001
  • gemini-3.1-pro-preview
  • gemini-3.1-flash-lite-preview
  • gemini-3.1-flash-image-preview
  • gemini-3-pro-image-preview
  • gemini-3-flash-preview
  • veo-3.0-fast-generate-001
  • veo-3.1-fast-generate-001
  • gemini-embedding-001
  • textembedding-gecko-multilingual@001
  • multimodalembedding@001
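For reviewers, the added/updated split above can be reproduced with a sketch like the one below. It assumes (this PR does not confirm it) that `complete_diff` mode compares every scraped model against the existing pricing file, counts IDs missing from the file as added, and merges scraped fields over stored ones so that any changed value counts as an update. The `classify_changes` helper and the flat `{field: price}` shape are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical shape: model ID -> {pricing field: USD value}.
PricingTable = Dict[str, Dict[str, float]]


def classify_changes(
    existing: PricingTable, scraped: PricingTable
) -> Tuple[List[str], List[str], List[str]]:
    """Split scraped model IDs into (added, updated, unchanged)."""
    added: List[str] = []
    updated: List[str] = []
    unchanged: List[str] = []
    for model_id, new_fields in sorted(scraped.items()):
        old_fields = existing.get(model_id)
        if old_fields is None:
            added.append(model_id)  # not in the pricing file yet
            continue
        # Merge: scraped fields override stored ones; fields the page lacks are kept.
        merged = {**old_fields, **new_fields}
        (updated if merged != old_fields else unchanged).append(model_id)
    return added, updated, unchanged


if __name__ == "__main__":
    # Placeholder IDs and prices, not values from this PR.
    existing = {"model-a": {"input": 1.0, "output": 2.0}, "model-b": {"input": 0.5}}
    scraped = {"model-a": {"input": 1.5}, "model-b": {"input": 0.5}, "model-c": {"input": 0.1}}
    print(classify_changes(existing, scraped))
```

Merging rather than replacing means a field the pricing page omits (for example a cache row) is never silently dropped, which matches the "updated (merged)" label above.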

Model-to-Pricing-Page Mapping

Unless noted otherwise, `$X/$Y` pairs are input/output prices in USD per 1M tokens.

| Model ID | Publisher / Section | Source | Notes |
|---|---|---|---|
| gemini-2.5-pro | Google – Gemini 2.5 | API | Standard (≤200K) pricing; long-context (>200K) $2.50/$15 noted |
| gemini-2.5-flash | Google – Gemini 2.5 | API | Includes audio input $1.00/1M |
| gemini-2.5-flash-image | Google – Gemini 2.5 | API | Image-output variant of Gemini 2.5 Flash; image_token $30/1M |
| gemini-2.5-flash-lite | Google – Gemini 2.5 | API | Lightest 2.5 model |
| gemini-2.5-computer-use-preview-10-2025 | Google – Gemini 2.5 | API | Maps to Gemini 2.5 Pro pricing; no cache row found |
| gemini-2.5-flash-preview-09-2025 | Google – Gemini 2.5 | API | Preview alias; same pricing as gemini-2.5-flash |
| gemini-2.5-flash-lite-preview-09-2025 | Google – Gemini 2.5 | API | Preview alias; same pricing as gemini-2.5-flash-lite |
| gemini-2.0-flash-001 | Google – Gemini 2.0 | API | Includes audio input $1.00/1M; cache_read $0.0375/1M |
| gemini-2.0-flash-lite-001 | Google – Gemini 2.0 | API | No cache row on page |
| gemini-3.1-pro-preview | Google – Gemini 3.1 | API | Standard (≤200K) pricing; long-context (>200K) $4/$18 noted |
| gemini-3.1-flash-lite-preview | Google – Gemini 3.1 | API | Includes audio input $0.50/1M |
| gemini-3.1-flash-image-preview | Google – Gemini 3.1 | API | image_token $60/1M |
| gemini-3-pro-preview | Google – Gemini 3 | API | Standard (≤200K) pricing; long-context (>200K) $4/$18 noted |
| gemini-3-pro-image-preview | Google – Gemini 3 | API | image_token $120/1M |
| gemini-3-flash-preview | Google – Gemini 3 | API | Includes audio input $1.00/1M |
| gemini-2.5-pro-tts | Google – Gemini | API – price not found | No TTS pricing row on page; added with price 0 |
| gemini-2.5-flash-tts | Google – Gemini | API – price not found | No TTS pricing row on page; added with price 0 |
| imagen-4.0-generate-001 | Google – Imagen 4 | API | $0.04/image; row matched via lookup_variant `imagen-4.0-generate` |
| imagen-4.0-fast-generate-001 | Google – Imagen 4 | API | $0.02/image; row matched via lookup_variant `imagen-4.0-fast-generate` |
| imagen-4.0-ultra-generate-001 | Google – Imagen 4 | API | $0.06/image; row matched via lookup_variant `imagen-4.0-ultra-generate` |
| imagen-3.0-generate-002 | Google – Imagen 3 | API | $0.04/image; row matched via lookup_variant `imagen-3.0-generate` |
| imagen-3.0-capability-001 | Google – Imagen 3 | API | Capability model; uses the imagen-3.0-generate price of $0.04/image |
| imagen-3.0-capability-002 | Google – Imagen 3 | API | Capability model; uses the imagen-3.0-generate price of $0.04/image |
| veo-2.0-generate-001 | Google – Veo 2 | API | $0.50/sec; matched via `veo-2` |
| veo-3.0-generate-001 | Google – Veo 3 | API | $0.20/sec (720p/1080p); matched via `veo-3` |
| veo-3.0-fast-generate-001 | Google – Veo 3 Fast | API | $0.08/sec (720p); matched via `veo-3-fast` |
| veo-3.1-generate-001 | Google – Veo 3.1 | API | $0.20/sec (720p/1080p); matched via `veo-3.1` |
| veo-3.1-fast-generate-001 | Google – Veo 3.1 Fast | API | $0.08/sec (720p); matched via `veo-3.1-fast` |
| veo-3.1-lite-generate-001 | Google – Veo 3.1 Lite | API | $0.03/sec (720p); matched via `veo-3.1-lite` |
| gemini-embedding-001 | Google – Embedding | API | $0.00015/1K tokens |
| text-embedding-005 | Google – Embedding | API | $0.000025/1K chars (per_thousand_tokens) |
| text-multilingual-embedding-002 | Google – Embedding | API | $0.000025/1K chars (per_thousand_tokens) |
| textembedding-gecko@003 | Google – Embedding | API | $0.000025/1K chars; legacy model |
| textembedding-gecko-multilingual@001 | Google – Embedding | API | $0.000025/1K chars; legacy multilingual model |
| text-embedding-large-exp-03-07 | Google – Embedding | API | No dedicated row; uses text-embedding pricing of $0.000025/1K chars |
| gemini-embedding-2-preview | Google – Embedding | API | Gemini Embedding 2 (Unified Multimodal); $0.20/1M tokens |
| multimodalembedding@001 | Google – Embedding | API | Multimodal embedding; $0.0002/1K chars (text input) |
| claude-opus-4-6 | Anthropic – Claude | API | `@default` suffix stripped; $5/$25; cache_write (5m) $6.25 |
| claude-sonnet-4-6 | Anthropic – Claude | API | `@default` suffix stripped; $3/$15; cache_write (5m) $3.75 |
| claude-opus-4-5@20251101 | Anthropic – Claude | API | Pinned date version; $5/$25 |
| claude-sonnet-4-5@20250929 | Anthropic – Claude | API | Pinned date version; $3/$15; long-context (>200K) $6/$22.50 noted |
| claude-haiku-4-5@20251001 | Anthropic – Claude | API | Pinned date version; $1/$5 |
| claude-opus-4-1@20250805 | Anthropic – Claude | API | Pinned date version; $15/$75 |
| claude-opus-4@20250514 | Anthropic – Claude | API | Pinned date version; $15/$75 |
| claude-sonnet-4@20250514 | Anthropic – Claude | API | Pinned date version; $3/$15 |
| gpt-oss-120b-maas | OpenAI | API | Matched "gpt-oss-120b" on page; $0.09/$0.36 |
| llama-3.3-70b-instruct-maas | Meta – Llama | API | Matched "Llama 3.3 70B"; $0.72/$0.72 |
| llama-4-maverick-17b-128e-instruct-maas | Meta – Llama 4 | API | Matched "Llama 4 Maverick"; $0.35/$1.15 |
| mistral-small-2503 | Mistral AI | API | Matched "Mistral Small 3.1 (25.03)"; $0.10/$0.30 |
| mistral-medium-3 | Mistral AI | API | Matched "Mistral Medium 3"; $0.40/$2.00 |
| codestral-2 | Mistral AI | API | Matched "Codestral 2"; $0.30/$0.90 |
| deepseek-r1-0528-maas | DeepSeek | API | Matched "DeepSeek-R1 (0528)"; $1.35/$5.40 |
| deepseek-v3.1-maas | DeepSeek | API | Matched "DeepSeek-V3.1"; $0.60/$1.70; cache_read $0.06 |
| deepseek-v3.2-maas | DeepSeek | API | Matched "DeepSeek-V3.2"; $0.56/$1.68; cache_read $0.056 |
| qwen3-235b-a22b-instruct-2507-maas | Qwen | API | Matched "Qwen3 235B A22B (2507)"; $0.22/$0.88 |
| qwen3-coder-480b-a35b-instruct-maas | Qwen | API | Matched "Qwen3 Coder 480B"; $0.22/$1.80; cache_read $0.022 |
| qwen3-next-80b-a3b-instruct-maas | Qwen | API | Matched "Qwen3-Next 80B"; $0.15/$1.20 |
| qwen3-next-80b-a3b-thinking-maas | Qwen | API | Matched "Qwen3-Next 80B" (thinking variant); $0.15/$1.20 |
| kimi-k2-thinking-maas | Moonshot AI – Kimi | API | Matched "Kimi-K2-Thinking"; $0.60/$2.50; cache_read $0.06 |
| minimax-m2-maas | MiniMax | API | Matched "MiniMax-M2"; $0.30/$1.20; cache_read $0.03 |
| glm-4.7-maas | ZAI.org – GLM | API | Matched "GLM-4.7"; $0.60/$2.20 |
| glm-5-maas | ZAI.org – GLM | API | Matched "GLM-5"; $1.00/$3.20; cache_read $0.10 |
| jamba-large-1.6 | AI21 | API – price not found | Self-deploy only (has_deploy: true, no -maas); no MaaS pricing row |
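To sanity-check the rows above, a reviewer can recompute request costs from the listed rates. This is a minimal sketch, not the Pricing Agent's code: `TokenPricing`, `token_cost`, and the other helpers are hypothetical; `$X/$Y` pairs are read as input/output USD per 1M tokens; the long-context rate is assumed to apply to the whole request once input exceeds 200K tokens; and the 1.25x cache-write multiplier is inferred only from the two Claude rows ($6.25 vs $5, $3.75 vs $3). The figures used are the claude-sonnet-4-5@20250929, claude-opus-4-6, veo-3.1-lite, imagen-4.0-fast, and gemini-embedding-001 rows.

```python
from dataclasses import dataclass
from typing import Optional

PER_MILLION = 1_000_000
LONG_CONTEXT_THRESHOLD = 200_000  # the ">200K" split noted in the table
CACHE_WRITE_5M_MULTIPLIER = 1.25  # assumed from $6.25/$5 and $3.75/$3 in the Claude rows


@dataclass
class TokenPricing:
    input_per_m: float                         # USD per 1M input tokens
    output_per_m: float                        # USD per 1M output tokens
    long_input_per_m: Optional[float] = None   # USD per 1M input tokens when prompt > 200K
    long_output_per_m: Optional[float] = None  # USD per 1M output tokens when prompt > 200K


def token_cost(p: TokenPricing, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request; long-context rates (if defined) apply to the whole request."""
    long_ctx = (
        input_tokens > LONG_CONTEXT_THRESHOLD
        and p.long_input_per_m is not None
        and p.long_output_per_m is not None
    )
    in_rate = p.long_input_per_m if long_ctx else p.input_per_m
    out_rate = p.long_output_per_m if long_ctx else p.output_per_m
    return (input_tokens * in_rate + output_tokens * out_rate) / PER_MILLION


def cache_write_5m_per_m(p: TokenPricing) -> float:
    """Assumed 5-minute cache-write rate: 1.25x the base input rate."""
    return p.input_per_m * CACHE_WRITE_5M_MULTIPLIER


def video_cost(usd_per_second: float, seconds: float) -> float:
    """Veo rows are priced per generated second."""
    return usd_per_second * seconds


def image_cost(usd_per_image: float, images: int) -> float:
    """Imagen rows are priced per generated image."""
    return usd_per_image * images


def per_1k_to_per_1m(usd_per_1k: float) -> float:
    """Convert a per-1K-unit rate (tokens or chars) to a per-1M rate."""
    return usd_per_1k * 1_000


sonnet_4_5 = TokenPricing(3.0, 15.0, long_input_per_m=6.0, long_output_per_m=22.5)
opus_4_6 = TokenPricing(5.0, 25.0)

if __name__ == "__main__":
    print(token_cost(sonnet_4_5, 100_000, 10_000))  # standard tier
    print(token_cost(sonnet_4_5, 300_000, 10_000))  # long-context tier
    print(cache_write_5m_per_m(opus_4_6))
    print(video_cost(0.03, 8))                      # veo-3.1-lite, 8-second clip
    print(image_cost(0.02, 4))                      # imagen-4.0-fast, 4 images
    print(per_1k_to_per_1m(0.00015))                # gemini-embedding-001
```

Keeping the long-context rates optional lets the same dataclass cover rows that list no >200K tier.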

Excluded Models

| Model ID | Publisher | Reason |
|---|---|---|
| `gemini-*-live-*` | Google | Gemini Live streaming; separate product |
| `lyria-*` | Google | Music generation; no inference endpoint |
| `model-optimizer-*` | Google | Dynamic routing meta-endpoint |
| imagegeneration | Google | Legacy; superseded by imagen-3.0+ |
| `virtual-try-on-*` | Google | Retail product model |
| `gemma*`, `paligemma*`, `codegemma*` | Google | Excluded per google.md (Gemma/non-generative) |
| `chirp*` | Google | Audio transcription, not generative inference |
| gemini-2.5-pro-tts / gemini-2.5-flash-tts | Google | Not actually excluded; included with price 0 (no pricing row found) |
| clip-vit-base-patch32, openclip | OpenAI | Non-generative (feature extraction/classification) |
| whisper-large | OpenAI | Audio transcription |
| gpt-oss | OpenAI | Self-deploy (has_deploy: true, no -maas) |
| faster-r-cnn, retinanet, mask-r-cnn, segment-anything, sam3 | Meta | Non-generative CV models |
| roberta-large, xlm-roberta-large | Meta | Non-generative NLP (self-deploy) |
| llama-guard, prompt-guard | Meta | Guard/safety models |
| llama2, llama3, llama3_1, llama3-2, llama3-3, llama4, codellama-7b-hf, llama-2-quantized, imagebind, nllb | Meta | Self-deploy (has_deploy: true, no -maas) |
| mistral, mixtral | Mistral AI | Self-deploy (mistral-ai publisher, has_deploy: true) |
| codestral-2501-self-deploy | Mistral AI | Self-deploy (indicated in the model name) |
| mistral-ocr-2505 | Mistral AI | OCR model |
| ministral-3, mistral-large-3 | Mistral AI | Self-deploy (has_deploy: true, no -maas) |
| deepseek-r1, deepseek-v3, deepseek-v3-1, deepseek-v3-2 | DeepSeek | Self-deploy (has_deploy: true, no -maas) |
| deepseek-ocr, deepseek-ocr-2, deepseek-ocr-maas | DeepSeek | OCR models |
| kimi-k2, kimi-k2-5 | Moonshot AI | Self-deploy (has_deploy: true, no -maas) |
| minimax-m2 | MiniMax | Self-deploy (has_deploy: true, no -maas) |
| glm-4.7, glm-5, glm-4.5 | ZAI.org | Self-deploy (has_deploy: true, no -maas) |
| glm-ocr | ZAI.org | OCR model |
| glm-image | ZAI.org | Explicit policy exclusion (image generation excluded from Vertex AI pricing) |
| qwen-image | Qwen | Explicit policy exclusion (image generation excluded from Vertex AI pricing) |
| Remaining Qwen self-deploy models | Qwen | Self-deploy (has_deploy: true, no -maas) |
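The exclusion reasons above, together with the ID handling noted in the mapping table (`@default` stripped, `lookup_variant` matches for the Imagen rows), can be approximated with a few string rules. This is a hypothetical sketch inferred from the table text, not the Pricing Agent's actual logic: `normalize_model_id`, `lookup_variant`, `is_self_deploy`, and `exclusion_reason` are illustrative names, and the trailing three-digit version pattern is an assumption that only covers the Imagen rows (the Veo rows match shorter keys such as `veo-3.1-lite`).

```python
import re
from typing import Optional

# Assumed: pricing-page rows omit a trailing "-001"/"-002" style version.
VERSION_SUFFIX = re.compile(r"-\d{3}$")


def normalize_model_id(model_id: str) -> str:
    """Drop an '@default' alias suffix (as in the claude-*-4-6 rows); keep pinned dates."""
    return model_id.removesuffix("@default")  # Python 3.9+


def lookup_variant(model_id: str) -> str:
    """Key used to find a pricing-page row, e.g. imagen-4.0-generate-001 -> imagen-4.0-generate."""
    return VERSION_SUFFIX.sub("", model_id)


def is_self_deploy(model_id: str, has_deploy: bool) -> bool:
    """Deployable models without a managed '-maas' endpoint are treated as self-deploy only."""
    if "self-deploy" in model_id:
        return True
    return has_deploy and not model_id.endswith("-maas")


def exclusion_reason(model_id: str, has_deploy: bool) -> Optional[str]:
    """Return a reason for excluded models, or None if the model should be priced."""
    # OCR is checked first so that e.g. deepseek-ocr-maas is excluded despite its -maas suffix.
    if "ocr" in model_id.split("-"):
        return "OCR model"
    if is_self_deploy(model_id, has_deploy):
        return "Self-deploy"
    return None


if __name__ == "__main__":
    for mid, deployable in [("glm-5", True), ("glm-5-maas", True), ("deepseek-ocr-maas", True)]:
        print(mid, exclusion_reason(mid, deployable))
```

Checking the OCR rule before the `-maas` rule mirrors the table, where deepseek-ocr-maas is excluded even though it has a managed endpoint.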

Web Search Pricing by Generation

  • Gemini 2.0 + 2.5: web_search $35/1000 → 3.5¢/search; enterprise_web_search $45/1000 → 4.5¢/search
  • Gemini 3.x: web_search $14/1000 → 1.4¢/search; enterprise_web_search $14/1000 → 1.4¢/search
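The per-search figures above follow from dividing the per-1,000 rate by 1,000 and converting dollars to cents. A small sketch using `Decimal` to avoid binary rounding; the `GROUNDING_RATES` mapping simply restates the two bullets, and its keys are illustrative rather than field names taken from the pricing file.

```python
from decimal import Decimal

# USD per 1,000 grounded searches, restated from the bullets above.
GROUNDING_RATES = {
    "gemini-2.0/2.5": {"web_search": Decimal("35"), "enterprise_web_search": Decimal("45")},
    "gemini-3.x": {"web_search": Decimal("14"), "enterprise_web_search": Decimal("14")},
}


def cents_per_search(usd_per_1000: Decimal) -> Decimal:
    """$X per 1,000 searches -> X/10 cents per search."""
    return usd_per_1000 / 1000 * 100


def search_cost_usd(usd_per_1000: Decimal, searches: int) -> Decimal:
    """Total USD for a given number of grounded searches."""
    return usd_per_1000 * searches / 1000


if __name__ == "__main__":
    for generation, tools in GROUNDING_RATES.items():
        for tool, rate in tools.items():
            print(f"{generation} {tool}: {cents_per_search(rate)}¢/search")
```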

Generated by Pricing Agent on 2026-04-10
