chore(pricing): Update vertex-ai pricing #625

Open
siddharthsambharia-portkey wants to merge 2 commits into main from pricing-update/vertex-ai-24206118106

@siddharthsambharia-portkey
Collaborator

🔄 Pricing Update: vertex-ai

📊 Summary (complete_diff mode)

| Change Type | Count |
| --- | --- |
| ➕ Models added | 4 |
| 🔄 Models updated (merged) | 10 |

➕ New Models

  • gemini-2.5-pro-tts
  • gemini-2.5-flash-tts
  • veo-3.1-lite-generate-001
  • translate-llm

🔄 Updated Models

  • gemini-3.1-pro-preview
  • gemini-3.1-flash-image-preview
  • gemini-3.1-flash-lite-preview
  • gemini-3-pro-image-preview
  • gemini-3-flash-preview
  • gemini-2.5-pro
  • textembedding-gecko@001
  • textembedding-gecko-multilingual@001
  • multimodalembedding@001
  • veo-3.1-fast-generate-001

Model → Pricing Page Mapping

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| gemini-3.1-pro-preview | Google – Gemini 3.1 Pro | API | Standard $2/$12/1M, batch $1/$6, cache_read $0.20, web_search $14/1000 |
| gemini-3.1-flash-image-preview | Google – Gemini 3.1 Flash Image | API | Standard $0.50/$3/1M, batch $0.25/$1.50, image_token $60/1M |
| gemini-3.1-flash-lite-preview | Google – Gemini 3.1 Flash Lite | API | Standard $0.25/$1.50/1M, batch $0.13/$0.75, cache_read $0.03 |
| gemini-3-pro-preview | Google – Gemini 3 Pro | API | Standard $2/$12/1M, batch $1/$6, cache_read $0.20 |
| gemini-3-pro-image-preview | Google – Gemini 3 Pro Image | API | Standard $2/$12/1M, batch $1/$6, image_token $120/1M |
| gemini-3-flash-preview | Google – Gemini 3 Flash | API | Standard $0.50/$3/1M, batch $0.25/$1.50, cache_read $0.05 |
| gemini-2.5-pro | Google – Gemini 2.5 Pro | API | Standard $1.25/$10/1M (≤200K), batch $0.625/$5, cache_read $0.13 |
| gemini-2.5-flash | Google – Gemini 2.5 Flash | API | Standard $0.30/$2.50/1M, batch $0.15/$1.25, cache_read $0.03 |
| gemini-2.5-flash-image | Google – Gemini 2.5 Flash Image | API | Standard $0.30/$2.50/1M, batch $0.15/$1.25, image_token $30/1M |
| gemini-2.5-flash-lite | Google – Gemini 2.5 Flash Lite | API | Standard $0.10/$0.40/1M, batch $0.05/$0.20, cache_read $0.01 |
| gemini-2.5-computer-use-preview-10-2025 | Google – Gemini 2.5 Pro (Computer Use) | API | Same pricing as Gemini 2.5 Pro, no batch |
| gemini-2.5-pro-tts | Google – Gemini (TTS) | API – price not found | No dedicated TTS row on pricing page; added with price 0 |
| gemini-2.5-flash-tts | Google – Gemini (TTS) | API – price not found | No dedicated TTS row on pricing page; added with price 0 |
| gemini-2.5-flash-preview-09-2025 | Google – Gemini 2.5 Flash | API | Preview alias for gemini-2.5-flash; same pricing |
| gemini-2.5-flash-lite-preview-09-2025 | Google – Gemini 2.5 Flash Lite | API | Preview alias for gemini-2.5-flash-lite; same pricing |
| gemini-2.0-flash-001 | Google – Gemini 2.0 Flash | API | Standard $0.15/$0.60/1M, batch $0.075/$0.30 |
| gemini-2.0-flash-lite-001 | Google – Gemini 2.0 Flash Lite | API | Standard $0.075/$0.30/1M, batch $0.0375/$0.15 |
| gemini-embedding-001 | Google – Embedding | API | $0.00015/1K tokens (online) |
| text-embedding-005 | Google – Embedding | API | $0.000025/1K chars (online) |
| text-multilingual-embedding-002 | Google – Embedding | API | $0.000025/1K chars; same as text-embedding-005 |
| textembedding-gecko@001 | Google – Embedding (legacy) | API | Legacy; same pricing as text-embedding-005 |
| textembedding-gecko@003 | Google – Embedding (legacy) | API | Legacy; same pricing as text-embedding-005 |
| textembedding-gecko-multilingual@001 | Google – Embedding (legacy) | API | Legacy multilingual; same pricing as text-embedding-005 |
| text-embedding-large-exp-03-07 | Google – Embedding (experimental) | API | Experimental; no dedicated row; uses text-embedding-005 pricing |
| multimodalembedding@001 | Google – Multimodal Embedding | API | Text $0.0002/1K chars; image $0.0001/img; video plus $0.0020/s, standard $0.0010/s, essential $0.0005/s |
| gemini-embedding-2-preview | Google – Gemini Embedding 2 (multimodal, preview) | API | Text $0.20/1M tokens; image $0.00012/img; video $0.00079/img; audio $0.00016/s |
| imagen-3.0-generate-002 | Google – Imagen 3 Generate | API | $0.04/image |
| imagen-4.0-generate-001 | Google – Imagen 4.0 Generate | API | $0.04/image |
| imagen-4.0-fast-generate-001 | Google – Imagen 4.0 Fast Generate | API | $0.02/image |
| imagen-4.0-ultra-generate-001 | Google – Imagen 4.0 Ultra Generate | API | $0.06/image |
| imagen-3.0-capability-001 | Google – Imagen 3 Capability | API | Maps to imagen-3.0-generate-002 pricing; $0.04/image |
| imagen-3.0-capability-002 | Google – Imagen 3 Capability | API | Maps to imagen-3.0-generate-002 pricing; $0.04/image |
| veo-2.0-generate-001 | Google – Veo 2.0 | API | $0.50/s, default 8s, 1 sample |
| veo-3.0-generate-001 | Google – Veo 3.0 | API | $0.20/s, default 8s, 1 sample |
| veo-3.0-fast-generate-001 | Google – Veo 3.0 Fast | API | $0.10/s, default 8s, 1 sample |
| veo-3.1-generate-001 | Google – Veo 3.1 | API | $0.20/s, default 8s, 1 sample |
| veo-3.1-fast-generate-001 | Google – Veo 3.1 Fast | API | $0.10/s, default 8s, 1 sample |
| veo-3.1-lite-generate-001 | Google – Veo 3.1 Lite | API | $0.03/s, default 8s, 1 sample |
| translate-llm | Google – Translation | API – price not found | Translation model; no row on generative AI pricing page; added with price 0 |
| claude-opus-4-6 | Anthropic – Claude Opus 4.6 | API | $5/$25/1M; cache_write_5m $6.25, cache_read $0.50; @default stripped |
| claude-sonnet-4-6 | Anthropic – Claude Sonnet 4.6 | API | $3/$15/1M; cache_write_5m $3.75, cache_read $0.30; @default stripped |
| claude-opus-4-5@20251101 | Anthropic – Claude Opus 4.5 | API | $5/$25/1M; cache_write_5m $6.25, cache_read $0.50 |
| claude-sonnet-4-5@20250929 | Anthropic – Claude Sonnet 4.5 | API | $3/$15/1M (≤200K); cache_write_5m $3.75, cache_read $0.30 |
| claude-haiku-4-5@20251001 | Anthropic – Claude Haiku 4.5 | API | $1/$5/1M; cache_write_5m $1.25, cache_read $0.10 |
| claude-opus-4-1@20250805 | Anthropic – Claude Opus 4.1 | API | $15/$75/1M; cache_write_5m $18.75, cache_read $1.50 |
| claude-opus-4@20250514 | Anthropic – Claude Opus 4 | API | $15/$75/1M; cache_write_5m $18.75, cache_read $1.50 |
| claude-sonnet-4@20250514 | Anthropic – Claude Sonnet 4 | API | $3/$15/1M; cache_write_5m $3.75, cache_read $0.30 |
| llama-3.3-70b-instruct-maas | Meta – Llama 3.3 70B | API | $0.72/$0.72/1M |
| llama-4-maverick-17b-128e-instruct-maas | Meta – Llama 4 Maverick | API | $0.35/$1.15/1M |
| mistral-small-2503 | Mistral – Mistral Small 3.1 | API | $0.10/$0.30/1M |
| mistral-medium-3 | Mistral – Mistral Medium 3 | API | $0.40/$2.00/1M |
| codestral-2 | Mistral – Codestral 2 | API | $0.30/$0.90/1M |
| deepseek-r1-0528-maas | DeepSeek – DeepSeek-R1 0528 | API | $1.35/$5.40/1M |
| deepseek-v3.1-maas | DeepSeek – DeepSeek-V3.1 | API | $0.60/$1.70/1M; cache_read $0.06 |
| deepseek-v3.2-maas | DeepSeek – DeepSeek-V3.2 | API | $0.56/$1.68/1M; cache_read $0.056 |
| qwen3-235b-a22b-instruct-2507-maas | Qwen – Qwen3-235B-A22B-Instruct-2507 | API | $0.22/$0.88/1M |
| qwen3-coder-480b-a35b-instruct-maas | Qwen – Qwen3-Coder-480B | API | $0.22/$1.80/1M; cache_read $0.022 |
| qwen3-next-80b-a3b-instruct-maas | Qwen – Qwen3-Next-80B-Instruct | API | $0.15/$1.20/1M |
| qwen3-next-80b-a3b-thinking-maas | Qwen – Qwen3-Next-80B-Thinking | API | $0.15/$1.20/1M |
| kimi-k2-thinking-maas | Kimi/Moonshot – Kimi-K2-Thinking | API | $0.60/$2.50/1M; cache_read $0.06 |
| minimax-m2-maas | MiniMax – MiniMax-M2 | API | $0.30/$1.20/1M; cache_read $0.03 |
| glm-4.7-maas | ZAI.org/GLM – GLM-4.7 | API | $0.60/$2.20/1M |
| glm-5-maas | ZAI.org/GLM – GLM-5 | API | $1.00/$3.20/1M; cache_read $0.10; free until Feb 19, 2026 per page note |
| gpt-oss-120b-maas | OpenAI – gpt-oss-120b | API | $0.09/$0.36/1M |
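
To make the shorthand in the Notes column concrete, here is a minimal sketch of how a `$input/$output/1M` rate and a per-second video rate could translate into a request cost. The helper names (`token_cost`, `video_cost`) and the token counts are hypothetical and are not taken from the pricing JSON or the Pricing Agent; only the rates come from the table above.

```python
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)


def token_cost(input_tokens, output_tokens, input_rate, output_rate):
    """USD cost for rates quoted in dollars per 1M tokens (hypothetical helper)."""
    total = (Decimal(input_tokens) * Decimal(input_rate)
             + Decimal(output_tokens) * Decimal(output_rate))
    return total / PER_MILLION


def video_cost(rate_per_second, seconds=8, samples=1):
    """USD cost for per-second video rates.

    Defaults mirror the table's "default 8s, 1 sample" note.
    """
    return Decimal(rate_per_second) * seconds * samples


# gemini-3.1-pro-preview, standard tier: $2 input / $12 output per 1M tokens.
# The 100,000 / 10,000 token counts are made up for illustration.
standard = token_cost(100_000, 10_000, "2", "12")

# Same request at the batch rate: $1 / $6 per 1M tokens.
batch = token_cost(100_000, 10_000, "1", "6")

# veo-3.1-fast-generate-001: $0.10/s at the default 8 s, 1 sample.
veo_fast = video_cost("0.10")

print(standard, batch, veo_fast)
```

Rates are passed as strings and handled with `Decimal` so that small values such as `0.0375` do not pick up binary floating-point error; whether the real pricing JSON stores per-token or per-million values is not stated in this PR.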

Excluded Models (not added to pricing JSON)

| Model | Publisher | Reason |
| --- | --- | --- |
| gemini-live-2.5-flash-native-audio | Google | `*-live-*` pattern: Gemini Live streaming, separate product |
| lyria-002, lyria-3-pro-preview, lyria-3-clip-preview | Google | `lyria-*`: music generation, no inference endpoint |
| virtual-try-on-001 | Google | Product-specific retail model |
| imagegeneration | Google | Legacy, superseded by imagen-3.0+ |
| shieldgemma2 | Google | Guard model (`*guard*` pattern) |
| `chirp-*` models | Google | Audio transcription |
| weathernext, weather-next-v2 | Google | Weather forecasting, not generative AI |
| `imageclassification-*`, occupancy-analytics, bert-base, vehicle-detector | Google | Non-generative CV/NLP |
| pretrained-ocr, text-detector, pretrained-form-parser | Google | OCR models |
| faster-r-cnn, retinanet, mask-r-cnn | Google | Computer vision, non-generative |
| t5gemma, `earth-ai-*` | Google | Self-deploy (has_deploy: true, no -maas) |
| jamba-large-1.6 | AI21 | Self-deploy (has_deploy: true, no -maas) |
| codestral-2501-self-deploy, ministral-3, mistral-large-3 | Mistral | Self-deploy (no -maas) |
| mistral-ocr-2505 | Mistral | OCR model |
| deepseek-ocr-maas, deepseek-ocr, deepseek-ocr-2 | DeepSeek | OCR models |
| llama-guard, prompt-guard | Meta | Guard/safety models |
| codellama-7b-hf, llama2, llama-2-quantized, llama3, llama3_1, llama3-2, llama3-3, llama4 | Meta | Self-deploy (no -maas) |
| sam3 | Meta | Image segmentation, non-generative |
| nllb, imagebind | Meta | Self-deploy |
| kimi-k2-5, kimi-k2 | Kimi | Self-deploy (no -maas) |
| minimax-m2 | MiniMax | Self-deploy (no -maas) |
| glm-image | ZAI.org | Explicitly excluded per global rules |
| glm-ocr, glm-4.7, glm-5, glm-4.5 | ZAI.org | OCR / self-deploy (no -maas) |
| qwen-image | Qwen | Explicitly excluded per global rules |
| qwq, qwen3, qwen3-embedding, qwen3-5, qwen2, qwen3-coder-next, qwen3-coder, qwen3-next, qwen3-vl | Qwen | Self-deploy (no -maas) |
| gpt-oss | OpenAI | Self-deploy (no -maas) |
| clip-vit-base-patch32, openclip | OpenAI | Non-generative embedding/vision |
| whisper-large | OpenAI | Audio transcription, not generative inference |
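
The pattern-based reasons above (`*-live-*`, `lyria-*`, `*guard*`, and "has_deploy: true, no -maas") can be sketched as a small filter. This is only an illustration under stated assumptions: the Pricing Agent's actual rule set is not shown in this PR, and `EXCLUDED_PATTERNS` and `exclusion_reason` are hypothetical names. It covers the literal glob and suffix rules only, not the category-based reasons such as OCR or audio transcription.

```python
from fnmatch import fnmatchcase

# Glob patterns quoted in the Reason column; illustrative, not exhaustive.
EXCLUDED_PATTERNS = ["*-live-*", "lyria-*", "*guard*"]


def exclusion_reason(model_id, has_deploy=False):
    """Return why a model would be skipped, or None if it passes (hypothetical helper)."""
    for pattern in EXCLUDED_PATTERNS:
        # fnmatchcase keeps matching case-sensitive on every platform.
        if fnmatchcase(model_id, pattern):
            return f"matches {pattern}"
    # Self-deploy rule as worded in the table: deployable and lacking
    # the managed "-maas" suffix.
    if has_deploy and not model_id.endswith("-maas"):
        return "self-deploy (no -maas)"
    return None


for model_id, has_deploy in [
    ("gemini-live-2.5-flash-native-audio", False),
    ("lyria-002", False),
    ("llama-guard", True),
    ("glm-5", True),
    ("glm-5-maas", False),
]:
    print(model_id, "->", exclusion_reason(model_id, has_deploy))
```

Under these assumptions `glm-5` is skipped as self-deploy while `glm-5-maas` passes through, which matches how the two IDs are split between the mapping table and the excluded list.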

Data Sources


Generated by Pricing Agent on 2026-04-09
