chore(pricing): Update vertex-ai pricing by siddharthsambharia-portkey · Pull Request #637 · Portkey-AI/models

siddharthsambharia-portkey · 2026-04-10T06:39:08Z

🔄 Pricing Update: vertex-ai

📊 Summary (complete_diff mode)

Change Type	Count
➕ Models added	3
🔄 Models updated (merged)	13

➕ New Models

gemini-2.5-pro-tts
gemini-2.5-flash-tts
veo-3.1-lite-generate-001

🔄 Updated Models

gemini-2.5-pro
gemini-2.5-computer-use-preview-10-2025
gemini-2.0-flash-001
gemini-3.1-pro-preview
gemini-3.1-flash-lite-preview
gemini-3.1-flash-image-preview
gemini-3-pro-image-preview
gemini-3-flash-preview
veo-3.0-fast-generate-001
veo-3.1-fast-generate-001
gemini-embedding-001
textembedding-gecko-multilingual@001
multimodalembedding@001

Model-to-Pricing-Page Mapping

Model ID	Publisher / Section	Source	Notes
`gemini-2.5-pro`	Google – Gemini 2.5	API	Standard ≤200K pricing; long-context (>200K) $2.50/$15 noted
`gemini-2.5-flash`	Google – Gemini 2.5	API	Includes audio input $1.00/1M
`gemini-2.5-flash-image`	Google – Gemini 2.5	API	Gemini image-output variant; image_token $30/1M
`gemini-2.5-flash-lite`	Google – Gemini 2.5	API	Lightest 2.5 model
`gemini-2.5-computer-use-preview-10-2025`	Google – Gemini 2.5	API	Maps to Gemini 2.5 Pro pricing; no cache row found
`gemini-2.5-flash-preview-09-2025`	Google – Gemini 2.5	API	Preview alias; same pricing as gemini-2.5-flash
`gemini-2.5-flash-lite-preview-09-2025`	Google – Gemini 2.5	API	Preview alias; same pricing as gemini-2.5-flash-lite
`gemini-2.0-flash-001`	Google – Gemini 2.0	API	Includes audio input $1.00/1M; cache_read $0.0375/1M
`gemini-2.0-flash-lite-001`	Google – Gemini 2.0	API	No cache row on page
`gemini-3.1-pro-preview`	Google – Gemini 3.1	API	Standard ≤200K pricing; long-context (>200K) $4/$18 noted
`gemini-3.1-flash-lite-preview`	Google – Gemini 3.1	API	Includes audio input $0.50/1M
`gemini-3.1-flash-image-preview`	Google – Gemini 3.1	API	image_token $60/1M
`gemini-3-pro-preview`	Google – Gemini 3	API	Standard ≤200K pricing; long-context (>200K) $4/$18 noted
`gemini-3-pro-image-preview`	Google – Gemini 3	API	image_token $120/1M
`gemini-3-flash-preview`	Google – Gemini 3	API	Includes audio input $1.00/1M
`gemini-2.5-pro-tts`	Google – Gemini	API – price not found	No TTS pricing row on page; added with price 0
`gemini-2.5-flash-tts`	Google – Gemini	API – price not found	No TTS pricing row on page; added with price 0
`imagen-4.0-generate-001`	Google – Imagen 4	API	$0.04/image; row matched via lookup_variant `imagen-4.0-generate`
`imagen-4.0-fast-generate-001`	Google – Imagen 4	API	$0.02/image; row matched via lookup_variant `imagen-4.0-fast-generate`
`imagen-4.0-ultra-generate-001`	Google – Imagen 4	API	$0.06/image; row matched via lookup_variant `imagen-4.0-ultra-generate`
`imagen-3.0-generate-002`	Google – Imagen 3	API	$0.04/image; row matched via lookup_variant `imagen-3.0-generate`
`imagen-3.0-capability-001`	Google – Imagen 3	API	Capability model; uses imagen-3.0-generate price $0.04/image
`imagen-3.0-capability-002`	Google – Imagen 3	API	Capability model; uses imagen-3.0-generate price $0.04/image
`veo-2.0-generate-001`	Google – Veo 2	API	$0.50/sec; matched via `veo-2`
`veo-3.0-generate-001`	Google – Veo 3	API	$0.20/sec (720p/1080p); matched via `veo-3`
`veo-3.0-fast-generate-001`	Google – Veo 3 Fast	API	$0.08/sec (720p); matched via `veo-3-fast`
`veo-3.1-generate-001`	Google – Veo 3.1	API	$0.20/sec (720p/1080p); matched via `veo-3.1`
`veo-3.1-fast-generate-001`	Google – Veo 3.1 Fast	API	$0.08/sec (720p); matched via `veo-3.1-fast`
`veo-3.1-lite-generate-001`	Google – Veo 3.1 Lite	API	$0.03/sec (720p); matched via `veo-3.1-lite`
`gemini-embedding-001`	Google – Embedding	API	$0.00015/1K tokens
`text-embedding-005`	Google – Embedding	API	$0.000025/1K chars (per_thousand_tokens)
`text-multilingual-embedding-002`	Google – Embedding	API	$0.000025/1K chars (per_thousand_tokens)
`textembedding-gecko@003`	Google – Embedding	API	$0.000025/1K chars; legacy model
`textembedding-gecko-multilingual@001`	Google – Embedding	API	$0.000025/1K chars; legacy multilingual
`text-embedding-large-exp-03-07`	Google – Embedding	API	No dedicated row; using text-embedding pricing $0.000025/1K chars
`gemini-embedding-2-preview`	Google – Embedding	API	Gemini Embedding 2 (Unified Multimodal); $0.20/1M tokens
`multimodalembedding@001`	Google – Embedding	API	Multimodal embedding; $0.0002/1K chars text
`claude-opus-4-6`	Anthropic – Claude	API	@default stripped; input/output $5/$25; cache_write (5m) $6.25
`claude-sonnet-4-6`	Anthropic – Claude	API	@default stripped; input/output $3/$15; cache_write (5m) $3.75
`claude-opus-4-5@20251101`	Anthropic – Claude	API	Pinned date version; input/output $5/$25
`claude-sonnet-4-5@20250929`	Anthropic – Claude	API	Pinned date version; input/output $3/$15; long-context (>200K) $6/$22.50 noted
`claude-haiku-4-5@20251001`	Anthropic – Claude	API	Pinned date version; input/output $1/$5
`claude-opus-4-1@20250805`	Anthropic – Claude	API	Pinned date version; input/output $15/$75
`claude-opus-4@20250514`	Anthropic – Claude	API	Pinned date version; input/output $15/$75
`claude-sonnet-4@20250514`	Anthropic – Claude	API	Pinned date version; input/output $3/$15
`gpt-oss-120b-maas`	OpenAI	API	Matched "gpt-oss-120b" on page; $0.09/$0.36
`llama-3.3-70b-instruct-maas`	Meta – Llama	API	Matched "Llama 3.3 70B"; $0.72/$0.72
`llama-4-maverick-17b-128e-instruct-maas`	Meta – Llama 4	API	Matched "Llama 4 Maverick"; $0.35/$1.15
`mistral-small-2503`	Mistral AI	API	Matched "Mistral Small 3.1 (25.03)"; $0.10/$0.30
`mistral-medium-3`	Mistral AI	API	Matched "Mistral Medium 3"; $0.40/$2.00
`codestral-2`	Mistral AI	API	Matched "Codestral 2"; $0.30/$0.90
`deepseek-r1-0528-maas`	DeepSeek	API	Matched "DeepSeek-R1 (0528)"; $1.35/$5.40
`deepseek-v3.1-maas`	DeepSeek	API	Matched "DeepSeek-V3.1"; $0.60/$1.70; cache_read $0.06
`deepseek-v3.2-maas`	DeepSeek	API	Matched "DeepSeek-V3.2"; $0.56/$1.68; cache_read $0.056
`qwen3-235b-a22b-instruct-2507-maas`	Qwen	API	Matched "Qwen3 235B A22B (2507)"; $0.22/$0.88
`qwen3-coder-480b-a35b-instruct-maas`	Qwen	API	Matched "Qwen3 Coder 480B"; $0.22/$1.80; cache_read $0.022
`qwen3-next-80b-a3b-instruct-maas`	Qwen	API	Matched "Qwen3-Next 80B"; $0.15/$1.20
`qwen3-next-80b-a3b-thinking-maas`	Qwen	API	Matched "Qwen3-Next 80B" thinking variant; $0.15/$1.20
`kimi-k2-thinking-maas`	Moonshot AI – Kimi	API	Matched "Kimi-K2-Thinking"; $0.60/$2.50; cache_read $0.06
`minimax-m2-maas`	MiniMax	API	Matched "MiniMax-M2"; $0.30/$1.20; cache_read $0.03
`glm-4.7-maas`	ZAI.org – GLM	API	Matched "GLM-4.7"; $0.60/$2.20
`glm-5-maas`	ZAI.org – GLM	API	Matched "GLM-5"; $1.00/$3.20; cache_read $0.10
`jamba-large-1.6`	AI21	API – price not found	Self-deploy only (has_deploy: true, no -maas); no MaaS pricing row
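
The Notes column mentions two normalizations applied before a model ID is matched against the pricing page: an `@default` suffix is stripped, and a trailing numeric version such as `-001` is dropped (the `lookup_variant` entries). A minimal sketch of that idea is below. The function body, the regular expression, and the choice to leave pinned `@YYYYMMDD` versions untouched are assumptions for illustration, not the Pricing Agent's actual implementation, and the sketch does not reproduce the shorter Veo keys such as `veo-3.1-fast`.

```python
import re

# Hypothetical helper: the real Pricing Agent logic is not shown in this PR.
_TRAILING_VERSION = re.compile(r"-\d{3}$")


def lookup_variant(model_id: str) -> str:
    """Return a simplified key for matching a model ID to a pricing row."""
    # Drop the "@default" alias; pinned versions like "@20251101" are kept.
    if model_id.endswith("@default"):
        model_id = model_id[: -len("@default")]
    # Drop a trailing three-digit version such as "-001" or "-002".
    return _TRAILING_VERSION.sub("", model_id)
```

Under these assumptions, `lookup_variant("imagen-4.0-generate-001")` yields `imagen-4.0-generate`, while a four-digit suffix such as the one in `mistral-small-2503` is left alone.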

Excluded Models

Model ID	Publisher	Reason
`gemini-*-live-*`	Google	Gemini Live streaming — separate product
`lyria-*`	Google	Music generation — no inference endpoint
`model-optimizer-*`	Google	Dynamic routing meta-endpoint
`imagegeneration`	Google	Legacy, superseded by imagen-3.0+
`virtual-try-on-*`	Google	Retail product model
`gemma`, `paligemma`, `codegemma*`	Google	Excluded per google.md (Gemma/non-generative)
`chirp*`	Google	Audio transcription, not generative inference
`gemini-2.5-pro-tts` / `gemini-2.5-flash-tts`	Google	Not excluded: included with price 0 (no pricing row found)
`clip-vit-base-patch32`, `openclip`	OpenAI	Non-generative (feature extraction/classification)
`whisper-large`	OpenAI	Audio transcription
`gpt-oss`	OpenAI	Self-deploy (has_deploy: true, no -maas)
`faster-r-cnn`, `retinanet`, `mask-r-cnn`, `segment-anything`, `sam3`	Meta	Non-generative CV models
`roberta-large`, `xlm-roberta-large`	Meta	Non-generative NLP (self-deploy)
`llama-guard`, `prompt-guard`	Meta	Guard/safety models
`llama2`, `llama3`, `llama3_1`, `llama3-2`, `llama3-3`, `llama4`, `codellama-7b-hf`, `llama-2-quantized`, `imagebind`, `nllb`	Meta	Self-deploy (has_deploy: true, no -maas)
`mistral`, `mixtral`	Mistral AI	Self-deploy (mistral-ai publisher, has_deploy: true)
`codestral-2501-self-deploy`	Mistral AI	Self-deploy in name
`mistral-ocr-2505`	Mistral AI	OCR model
`ministral-3`, `mistral-large-3`	Mistral AI	Self-deploy (has_deploy: true, no -maas)
`deepseek-r1`, `deepseek-v3`, `deepseek-v3-1`, `deepseek-v3-2`	DeepSeek	Self-deploy (has_deploy: true, no -maas)
`deepseek-ocr`, `deepseek-ocr-2`, `deepseek-ocr-maas`	DeepSeek	OCR models
`kimi-k2`, `kimi-k2-5`	Moonshot AI	Self-deploy (has_deploy: true, no -maas)
`minimax-m2`	MiniMax	Self-deploy (has_deploy: true, no -maas)
`glm-4.7`, `glm-5`, `glm-4.5`	ZAI.org	Self-deploy (has_deploy: true, no -maas)
`glm-ocr`	ZAI.org	OCR model
`glm-image`	ZAI.org	Explicit policy exclusion (image gen excluded from Vertex AI pricing)
`qwen-image`	Qwen	Explicit policy exclusion (image gen excluded from Vertex AI pricing)
Remaining Qwen self-deploy models	Qwen	Self-deploy (has_deploy: true, no -maas)
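
The two most frequent reasons in the table above (OCR models, and self-deploy entries flagged `has_deploy: true` without a `-maas` suffix) can be sketched as a small filter. This is a hedged illustration only: the function name `exclusion_reason` and the way `has_deploy` is passed in are assumptions, and the other rows (policy exclusions, guard models, non-generative models) are not modeled.

```python
from typing import Optional


def exclusion_reason(model_id: str, has_deploy: bool) -> Optional[str]:
    """Return a reason string if the model would be excluded, else None.

    Hypothetical sketch covering only the OCR and self-deploy rules.
    """
    # OCR models are excluded even when they carry a -maas suffix.
    if "ocr" in model_id.split("-"):
        return "OCR model"
    # Self-deploy entries: deployable, and not offered with a -maas suffix.
    if has_deploy and not model_id.endswith("-maas"):
        return "Self-deploy (has_deploy: true, no -maas)"
    return None
```

With this sketch, `minimax-m2` with `has_deploy=True` is reported as self-deploy, while `minimax-m2-maas` passes through and keeps its row in the mapping table.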

Web Search Pricing by Generation

Gemini 2.0 + 2.5: web_search $35/1000 → 3.5¢/search; enterprise_web_search $45/1000 → 4.5¢/search
Gemini 3.x: web_search $14/1000 → 1.4¢/search; enterprise_web_search $14/1000 → 1.4¢/search
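
The per-search figures above follow from dividing the per-1,000 price by 1,000 and converting dollars to cents. A small sketch of that arithmetic, using only the rates quoted in this PR (the function and dictionary names are illustrative, not part of the repository):

```python
def cents_per_search(usd_per_1000: float) -> float:
    """Convert a price in USD per 1,000 searches to cents per search."""
    return usd_per_1000 * 100 / 1000


# Rates quoted in this PR, in USD per 1,000 searches.
WEB_SEARCH_USD_PER_1000 = {
    "gemini-2.0/2.5": {"web_search": 35, "enterprise_web_search": 45},
    "gemini-3.x": {"web_search": 14, "enterprise_web_search": 14},
}

# Per-search cost in cents for every generation and search type.
PER_SEARCH_CENTS = {
    generation: {name: cents_per_search(usd) for name, usd in rates.items()}
    for generation, rates in WEB_SEARCH_USD_PER_1000.items()
}
```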

Generated by Pricing Agent on 2026-04-10

siddharthsambharia-portkey added 2 commits April 10, 2026 12:09

chore(pricing): Update vertex-ai pricing

c74cf9e

chore(general): Add 1 new vertex-ai model configs

6e1b2f9

siddharthsambharia-portkey wants to merge 2 commits into main from pricing-update/vertex-ai-24229816500