Skip to content

ReadyPixels/AI_Models_Matrix

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

91 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

AI Models Matrix Logo

Awesome AI Models Matrix 🧠

Awesome License: CC BY-NC 4.0 Last Updated

Research-based list of AI models, development tools, and automation resources. Use it to compare releases, pricing, benchmarks, and deployment options from official sources.

Document Version: 2.1 Last Updated: 2026-04-02 04:58 UTC Repository: https://github.com/ReadyPixels/AI_Models_Matrix

Contents

Models 🧠

Comprehensive documentation of Large Language Models (LLMs), Small Language Models (SLMs), and specialized AI models available today.

Frontier Models πŸš€

State-of-the-art proprietary AI models with cutting-edge capabilities from leading AI labs.

Model Company Context GPQA Diamond Arena Elo SWE-bench Verified AIME 2025 Pricing Verified
Claude Opus 4.6 Anthropic 1M 91.3% 1500 (Text) / 1549 (Code) 80.8% 99.8% $5 / $25 2026-04-02
Claude Sonnet 4.6 Anthropic 1M 89.9% ~1438 (Text) / 1523 (Code) 79.6% ~95% $3 / $15 2026-04-02
GPT-5.3-Codex OpenAI 400K 91.5% β€” β€” β€” $1.75 / $14.00 2026-04-02
Gemini 3.1 Pro Google 1M 94.3% 1494 (Text) / 1455 (Code) 80.6% 100% $2 / $12 2026-04-02
Gemini 3 Deep Think Google 1M+ ~97% β€” ~58% β€” Ultra subscription 2026-04-02
GLM-5 Zhipu AI 200K 82.0% ~1451 (Text) / 1445 (Code) 77.8% 92.7% $1.00 / $3.20 2026-04-02
GLM-5.1 Zhipu AI 200K β€” β€” ~80.4% (est.) β€” $1.00 / $3.20 2026-04-02
MiniMax-M2.5 MiniMax 200K 85.2% β€” 80.2% 86.3% $0.30 / $1.20 2026-04-02
Kimi K2.5 Moonshot AI 256K 87.6% β€” 76.8% 96.1% $0.60 / $3.00 2026-04-02
DeepSeek-V4 [⚠️ Unverified] DeepSeek 1M+ β€” β€” β€” β€” $0.30 / $0.50 2026-04-02
DeepSeek-V3.2 DeepSeek 164K 87.1% β€” 67.8% 89.3% $0.28 / $0.42 2026-04-02
Qwen3.5-Max Alibaba 128K 89.3% β€” 76.4% 91.3% Pay-per-token 2026-04-02
Gemini 3 Pro Google 1M+ 91.9% 1486 (Text) / 1438 (Code) 76.2% 98–100% Tiered pricing 2026-04-02
Gemini 3 Flash Google 10M 90.4% 1474 (Text) 78.0% β€” $0.30 / $2.50 2026-04-02
Gemini 3.1 Flash-Lite [⚠️ Unverified] Google 1M β€” β€” β€” β€” $0.25 / $1.50 2026-04-02
GPT-5.4 OpenAI 1M 92.0% 1484 (Text) / 1457 (Code) ~80% 88% $2.50 / $15.00 2026-04-02
GPT-5.4 mini OpenAI 400K 87.5% β€” β€” β€” $0.75 / $4.50 2026-04-02
GPT-5.4 nano OpenAI 400K β€” β€” β€” β€” $0.20 / $1.25 2026-04-02
Step-3.5-Flash StepFun 256K 83.1% β€” 74.4% 97.3% Pay-per-token 2026-04-02
Mistral Large 3 [⚠️ Unverified] Mistral AI 128K β€” β€” β€” β€” Varies 2026-04-02
Claude Sonnet 4.5 Anthropic 200K 83.4% β€” 77.2% 87% $3 / $15 2026-04-02
Llama 4 Scout [⚠️ Unverified] Meta 10M 57.2% β€” β€” β€” Free (self-host) 2026-04-02
Llama 4 Maverick [⚠️ Unverified] Meta 128K 69.8% β€” β€” β€” Free (self-host) 2026-04-02
Grok 4 xAI 128K ~91.5% ~1493 (Text) β€” 100% $3 / $15 2026-04-02
Grok 4 Fast [⚠️ Unverified] xAI 128K β€” β€” β€” β€” $0.20 / $1.50 2026-04-02

Top Models by Category

Category #1 #2 #3
Coding Claude Opus 4.6 GPT-5.3-Codex Claude Sonnet 4.5
Reasoning Gemini 3 Deep Think Qwen3-Max-Thinking o3
Open Source DeepSeek-V4 Qwen3.5-Max Llama 4
Cost Efficiency DeepSeek-V3.1 Grok 4 Fast GLM-4.7-FlashX
Context Window Gemini 3 Flash (10M) Llama 4 Scout (10M) Claude Opus 4.6 (1M)

Model Specifications πŸ“‹

Detailed technical specifications, pricing, and capabilities for all frontier models. Data as of April 2026.

Output Token Limits

Maximum output tokens per single API request.

Model Max Output Context Window Notes
Claude Opus 4.6 128K (300K via beta) 1M Extended output via output-128k-2025-02-19 beta header
Claude Sonnet 4.6 64K 1M β€”
Claude Sonnet 4.5 64K 200K β€”
GPT-5.4 128K 1.05M β€”
GPT-5.4 mini 128K 400K β€”
GPT-5.4 nano 128K 400K β€”
GPT-5.3-Codex 128K 400K β€”
Gemini 3.1 Pro 64K 1M β€”
Gemini 3 Pro 64K 2M β€”
Gemini 3 Flash 64K 1M β€”
Gemini 3.1 Flash-Lite 64K 1M β€”
DeepSeek-V4 16K 1M β€”
DeepSeek-V3.2 8K / 64K (reasoner) 128K Reasoner mode unlocks 64K output
Qwen3.5-Max 65K 1M β€”
GLM-5 128K 200K β€”
GLM-5.1 131K 200K β€”
MiniMax-M2.5 131K 1M β€”
Kimi K2.5 β€” 256K Not publicly specified
Step-3.5-Flash 66K 256K β€”
Grok 4 β€” 256K Not publicly specified
Grok 4 Fast 30K 2M β€”
Mistral Large 3 β€” 262K Not publicly specified
Llama 4 Scout 16K 10M β€”
Llama 4 Maverick 16K 1M β€”

Cached & Batch Pricing

Discounted pricing tiers for high-volume usage. All prices in USD per million tokens.

Model Standard Input Cached Input Batch Discount Notes
Claude Opus 4.6 $5.00 $0.50 (hit) / $6.25 (5m write) 50% off Batch: $2.50 in / $12.50 out
Claude Sonnet 4.6 $3.00 $0.30 (hit) / $3.75 (5m write) 50% off Batch: $1.50 in / $7.50 out
Claude Sonnet 4.5 $3.00 $0.30 (hit) / $3.75 (5m write) 50% off Batch: $1.50 in / $7.50 out
GPT-5.4 $2.50 $0.25 50% off Data residency +10%
GPT-5.4 mini $0.75 $0.075 50% off β€”
GPT-5.4 nano $0.20 $0.02 50% off β€”
GPT-5.3-Codex $1.75 $0.175 50% off β€”
Gemini 3.1 Pro $2.00 $0.20–$0.40 + $4.50/hr storage 50% off Tiered by input length
Gemini 3 Flash $0.50 $0.05 + $1.00/hr storage 50% off β€”
Gemini 3.1 Flash-Lite $0.25 Supported Supported Exact rate not published
DeepSeek-V4 $0.30 $0.03 (90% off) Off-peak 50% off 11PM–7AM Beijing time
DeepSeek-V3.2 $0.28 $0.028 β€” No formal batch API
Qwen3.5-Max $0.40 Available 50% off β€”
GLM-5 / GLM-5.1 $1.00 $0.20 β€” β€”
Grok 4 $3.00 $0.75 β€” β€”
Grok 4 Fast $0.20 $0.05 β€” β€”
Mistral Large 3 $0.50 $0.05 Available β€”
Step-3.5-Flash $0.10 β€” β€” β€”

Speed & Latency

Output throughput and time-to-first-token from Artificial Analysis and provider benchmarks.

Model Output Speed (tok/s) TTFT Notes
Gemini 3.1 Flash-Lite ~363 β€” Fastest frontier model
Step-3.5-Flash 85–350 β€” Variable by provider; peak ~350 tok/s
Gemini 3 Flash ~193 ~4.16s β€”
MiniMax-M2.5 Lightning ~100 β€” Faster tier
GPT-5.3-Codex ~86 ~77.86s High TTFT due to extended reasoning
Grok 4 ~56 ~8.96s β€”
MiniMax-M2.5 Standard ~50 β€” β€”

Most frontier models (Claude Opus/Sonnet 4.6, GPT-5.4, Gemini 3.1 Pro, etc.) have not yet been benchmarked on Artificial Analysis as of April 2026.

Training Data Cutoffs

Knowledge cutoff dates β€” the point after which a model has no training data.

Model Training Cutoff Notes
Claude Sonnet 4.6 Jan 2026 Most recent cutoff among frontier models
Claude Opus 4.6 Aug 2025 Reliable knowledge: May 2025
GPT-5.4 / mini / nano Aug 31, 2025 β€”
GPT-5.3-Codex Aug 31, 2025 β€”
Grok 4 Fast Jul 2025 β€”
DeepSeek-V4 May 2025 β€”
Gemini 3.1 Flash-Lite Jan 2026 β€”
Gemini 3.1 Pro / 3 Pro / 3 Flash Jan 2025 β€”
Grok 4 ~Nov–Dec 2024 Approximate
DeepSeek-V3.2 Jul 2024 β€”
Llama 4 Scout / Maverick Aug 2024 β€”
DeepSeek-R1 ~Oct 2023 Based on base model

Models not listed (Qwen, GLM, MiniMax, Kimi, Step, Mistral): training cutoff not publicly disclosed.

Multilingual Support

Model Languages Details
Qwen3.5-Max 201 Largest language coverage
Llama 4 Scout 200 Pre-training languages
Qwen3-Max-Thinking 119 Qwen3 series
Gemini 3 Flash 100 91.8% MMMLU score across 100 languages
Gemini 3.1 Pro / 3 Pro 100+ β€”
Gemini 3.1 Flash-Lite 100+ 88.9% MMMLU
Llama 4 Maverick 12 Output languages
Claude (all) Many English-optimized; broad multilingual
GPT-5.4 (all) Many Broad multilingual coverage
DeepSeek (all) Many Chinese + English focused
Grok (all) Many β€”
GLM-5 / GLM-5.1 Many 28.5T token training data

Structured Output & Function Calling

All frontier models support structured JSON output and function/tool calling except where noted.

Capability Supported Models Not Supported
Structured Output (JSON mode) All models listed in Frontier table Gemini 3 Deep Think (no API)
Function Calling / Tool Use All models listed in Frontier table Gemini 3 Deep Think (no API)

Gemini 3 Deep Think is available only via Gemini's in-app Think mode β€” no API access for structured output or function calling.

Regional Availability

Provider API Availability Cloud Partners Notes
Anthropic Global AWS Bedrock, GCP Vertex AI US-only inference at 1.1x via inference_geo
OpenAI Global Azure OpenAI Data residency endpoints +10% (post-3/5/26)
Google Global Google AI Studio, Vertex AI Some regional restrictions per Google terms
DeepSeek Global Azure (R1 only, select regions) China-based servers
Alibaba (Qwen) Global Alibaba Cloud Model Studio China-based; globally accessible
Zhipu AI (GLM) Global Z.AI API MIT license enables self-hosting anywhere
MiniMax Global MiniMax API β€”
Moonshot AI (Kimi) Global platform.kimi.ai MIT open-weight
xAI (Grok) US-focused Oracle OCI (East/Midwest/West) Limited non-US availability
Mistral Global Azure AI Foundry, AWS, GCP β€”
Meta (Llama) Global (self-host) All major cloud providers Llama 4 Community License
StepFun Global HuggingFace Apache 2.0 open-source

Open-Source Models πŸ†“

Self-hostable models with permissive licenses or open weights for privacy, cost control, and customization.

Model Company Params Context License
DeepSeek-V4 DeepSeek 671B 1M+ MIT
Qwen3.5-Max Alibaba 1T+ 128K Apache 2.0
Qwen3-Max-Thinking Alibaba 1T+ 128K Apache 2.0
Mistral Large 3 Mistral AI 675B (MoE) 128K Apache 2.0
Llama 4 Scout Meta 109B 10M Community
Llama 4 Maverick Meta 400B 128K Community
GPT-OSS-120B OpenAI 117B 128K Apache 2.0
GPT-OSS-20B OpenAI 21B 128K Apache 2.0
Qwen3-Coder Alibaba 480B 128K Apache 2.0
GLM-4.7 Zhipu AI 400B+ MoE 128K Open Weight
Phi-4 Microsoft 14B 128K MIT
Granite 4.0 IBM 8B-3B 128K Apache 2.0
DeepSeek-Coder-V2 DeepSeek 236B 128K MIT
GLM-5.1 Zhipu AI 744B (40B active MoE) 200K MIT
Step-3.5-Flash StepFun 196B (11B active MoE) 256K Open Weight
Yi-Coder 01.AI 9B/1.5B 128K Apache 2.0

Deployment Options

Local Inference Tools:

  • Ollama - Easy local deployment
  • LM Studio - User-friendly GUI
  • llama.cpp - Efficient CPU inference
  • vLLM - High-throughput serving
  • SGLang - Structured generation

Cloud Deployment:

  • Hugging Face Inference - Managed deployment
  • AWS SageMaker - Full control
  • Google Cloud Vertex - Integrated
  • RunPod - GPU rental

Coding Models πŸ’»

Specialized AI models optimized for software development tasks.

SWE-bench Verified Leaderboard

Rank Model Company SWE-bench Verified
πŸ₯‡ #1 Claude Opus 4.6 Anthropic 80.8%
πŸ₯ˆ #2 Gemini 3.1 Pro Google 80.6%
πŸ₯‰ #3 MiniMax-M2.5 MiniMax 80.2%
#4 GPT-5.4 OpenAI ~80%
#5 GPT-5.2 OpenAI 80.0%
#6 Claude Sonnet 4.6 Anthropic 79.6%
#7 Gemini 3 Flash Google 78.0%
#8 GLM-5 Zhipu AI 77.8%
#9 Claude Sonnet 4.5 Anthropic 77.2%
#10 Kimi K2.5 Moonshot AI 76.8%

Commercial Coding Models

Model Developer Pricing Best For
Claude Opus 4.6 Anthropic $5 / $25 per 1M Agentic coding, complex tasks
GPT-5.3-Codex OpenAI TBD Agentic coding, 7+ hour autonomy
Claude Haiku 4.5 Anthropic $1 / $5 per 1M Low-latency coding, sub-agents, computer use
GLM-5-Code Zhipu AI $1.20 / $5.00 per 1M Code generation, refactoring
MiniMax-M2.5 MiniMax $0.30 / $1.20 per 1M Code generation, refactoring
Claude Sonnet 4.5 Anthropic $3 / $15 per 1M Code review, refactoring
Codestral Mistral AI $0.30 / $0.90 Real-time completion
Grok 4 Fast xAI $0.20 / $1.50 Most used (50% share)

Open-Source Coding Models

Model Developer License Hardware
GPT-OSS-120B OpenAI Apache 2.0 80-160 GB VRAM
Qwen3-Coder Alibaba Apache 2.0 160-320 GB VRAM
DeepSeek-Coder-V2 DeepSeek MIT 48-80 GB VRAM
GLM-4.6 Zhipu AI Open Weight 80-160 GB VRAM
Phi-4 Microsoft MIT 24-48 GB VRAM

Reasoning Models 🧠

Models optimized for step-by-step reasoning, mathematical problem-solving, and complex logical inference.

AIME 2025 Leaderboard

Rank Model AIME 2025 ARC-AGI-2 Notes
πŸ₯‡ #1 Gemini 3.1 Pro 100% 77.1% Highest combined reasoning
πŸ₯ˆ #2 GPT-5.2 100% 52.9% No tools needed
πŸ₯‰ #3 Grok 4 100% β€” First-principles reasoning
#4 Claude Opus 4.6 99.8% 68.8% Near-perfect AIME
#5 Gemini 3 Pro 98–100% 31.1–45.1% With code execution
#6 Step-3.5-Flash 97.3% β€” Best efficiency ratio
#7 Kimi K2.5 96.1% β€” Strong multimodal reasoning
#8 Claude Sonnet 4.6 ~95% 58.3% Near-Opus performance
#9 GLM-5 92.7% β€” Thinking mode
#10 DeepSeek-V3.2 89.3% β€” Budget reasoning

Reasoning Model Details

Model Type Context Pricing
Gemini 3 Deep Think Reasoning 1M+ Ultra subscription
Qwen3-Max-Thinking Reasoning/Coding 128K $1.20 / $6.00
o3 / o1-Pro Reasoning 128K $2-150 / $8-600
Gemini 3 Pro General/Multimodal 1M+ $2 / $12
DeepSeek-R1 Reasoning 128K $0.50 / $2.15
Claude Sonnet 4.5 Hybrid 200K $3 / $15

Use Cases

  • Mathematical Problem Solving: Qwen3-Max-Thinking, GPT-5 Pro, Gemini 3 Pro
  • Scientific Analysis: Claude Opus 4.6, GPT-5, Gemini 3 Pro
  • Strategic Planning: o3/o1-Pro, Claude Sonnet 4.5, DeepSeek-R1
  • Code Debugging: Claude Sonnet 4.5, GPT-5.3-Codex, DeepSeek-V3.1

Multimodal Models 🎨

Models capable of processing and generating multiple types of content: text, images, audio, and video.

Leading Multimodal Models

Model Developer Context Key Features
GPT-5.4 OpenAI 1M Unified multimodal, audio
Gemini 3 Pro Google 1M+ Native multimodal, video
Claude Sonnet 4.5 Anthropic 200K Document understanding
Llama 4 Maverick Meta 128K Open multimodal

Vision Capabilities

Model MMMU / MMMU-Pro MathVista DocVQA
Gemini 3.1 Pro 95% (MMMU-Pro) β€” β€”
GPT-5.4 94% (MMMU-Pro) β€” β€”
Gemini 3 Pro 81% (MMMU-Pro) β€” β€”
Gemini 3 Flash 80% (MMMU-Pro) β€” β€”
Claude Sonnet 4.5 77.8% (MMMU) β€” β€”
Llama 4 Maverick 73.4% (MMMU) β€” β€”

Audio & Video

Model Speech-to-Text Text-to-Speech Video Input
Gemini 3 Pro βœ… βœ… βœ…
GPT-5 βœ… βœ… ⚠️
Whisper v3 βœ… ❌ βœ…

Image Generation

Model Developer License Best For
Flux.1 Black Forest Labs Apache 2.0 High-fidelity art
Stable Diffusion 3.5 Stability AI Community License Fine-tuning
GLM-Image Zhipu AI (Z.ai) API Fast image generation
CogView-4 Zhipu AI (Z.ai) API Creative image generation

Hardware Requirements πŸ–₯️

Comprehensive hardware specifications for self-hosting AI models.

Quick Reference by Model Size

Model Params Q4 Size Min VRAM Rec VRAM Min RAM
Phi-4 14B 8 GB 24 GB 48 GB 32 GB
GPT-OSS-20B 21B 12 GB 24 GB 48 GB 32 GB
Llama 4 Scout 109B 66 GB 48 GB 80 GB 96 GB
GPT-OSS-120B 117B 70 GB 80 GB 160 GB 128 GB
DeepSeek-Coder-V2 236B 143 GB 48 GB 80 GB 192 GB
Llama 4 Maverick 400B 242 GB 160 GB 320 GB 320 GB
DeepSeek-V4 671B 404 GB 80 GB 320 GB 512 GB
Qwen3-Max-Thinking 1T+ 600+ GB 160 GB 640 GB 768 GB

By Hardware Tier

Consumer/Entry Level (24-48 GB VRAM):

  • Phi-4, GPT-OSS-20B, Yi-Coder, Qwen2.5-Coder
  • Recommended GPUs: RTX 3090 (24GB), RTX 4090 (24GB)

Professional (80-160 GB VRAM):

  • Llama 4 Scout, GPT-OSS-120B, DeepSeek-Coder-V2
  • Recommended GPUs: A100 80GB, 2x A100 40GB

Enterprise (320+ GB VRAM):

  • Llama 4 Maverick, GLM-4.7, DeepSeek-V4, Qwen3-Max-Thinking
  • Recommended GPUs: 4x A100 80GB, 8x A100 80GB

Quantization Explained

Level Bits Size vs FP16 Quality Use Case
FP16/BF16 16 100% Best Training
Q8_0 8 ~50% Excellent High-quality inference
Q4_K_M 4 ~25% Good Recommended for deployment
Q3_K_M 3 ~19% Fair Limited resources

Comprehensive Benchmark Reference πŸ“ˆ

Detailed benchmark scores across all major evaluations. Scores are percentages (%) unless noted. Arena Elo scores are integers. β€” = not publicly reported. Data as of April 2026.

Full Benchmark Table

Model GPQA Diamond MMLU-Pro Arena Elo (Text) HLE SWE-bench Verified SWE-bench Pro LiveCodeBench AIME 2025 ARC-AGI-2 MMMU-Pro IFEval FrontierMath
Claude Opus 4.6 91.3% β€” 1500 40.0–53.0% 80.8% β€” β€” 99.8% 68.8% β€” β€” β€”
Claude Sonnet 4.6 89.9% β€” ~1438 33.2–49.0% 79.6% β€” β€” ~95% 58.3% β€” β€” β€”
Claude Sonnet 4.5 83.4% 88.0% β€” β€” 77.2% β€” β€” 87–100% β€” β€” β€” β€”
GPT-5.4 92.0% 94% 1484 36.6–41.6% ~80% 57.7% 84–88% 88% 73.3% 94% β€” 50% (Pro)
GPT-5.4 mini 87.5% β€” β€” β€” β€” 54.4% β€” β€” β€” β€” β€” β€”
GPT-5.3-Codex 91.5% β€” β€” β€” β€” 56.8% 85% β€” β€” β€” β€” β€”
GPT-5.2 92.4% β€” 1479 35.2% 80.0% 55.6% β€” 100% 52.9% β€” 95.6% ~40.3%
Gemini 3.1 Pro 94.3% 92% 1494 44.4–51.4% 80.6% 54.2–72% 71% 100% 77.1% 95% 95% β€”
Gemini 3 Pro 91.9–93.8% 83% 1486 37.5% 76.2% 43.3% 49% 98–100% 31.1–45.1% 81% 88% 38%
Gemini 3 Flash 90.4% 72% 1474 33.7% 78.0% 44% β€” β€” β€” 80% 85% β€”
Gemini 3 Deep Think ~97% 81% β€” 48.4% ~58% 63% 58% β€” 84.6% β€” β€” β€”
DeepSeek-V3.2 87.1% 85.0% β€” 25.1% 67.8% β€” β€” 89.3% β€” β€” β€” β€”
DeepSeek-R1 71.5% 84.0% β€” 8.5% 49.2% β€” 63.5% 70.0% β€” β€” β€” β€”
Qwen3.5-Max 89.3% β€” β€” β€” 76.4% β€” β€” 91.3% β€” 79% β€” β€”
Qwen3-Max-Thinking 86.1% β€” β€” 26.2% β€” β€” β€” β€” β€” β€” β€” β€”
GLM-5 82.0% β€” ~1451 10.4% 77.8% β€” β€” 92.7% β€” β€” β€” β€”
GLM-5.1 β€” β€” β€” β€” ~80.4% (est.) β€” β€” β€” β€” β€” β€” β€”
Kimi K2.5 87.6% 87.1% β€” 31.5–50.2% 76.8% β€” 85.0% 96.1% β€” 78.5% β€” β€”
MiniMax-M2.5 85.2% β€” β€” β€” 80.2% 55.4% β€” 86.3% β€” β€” β€” β€”
Step-3.5-Flash 83.1% β€” β€” β€” 74.4% β€” 86.4% 97.3% β€” β€” β€” β€”
Grok 4 ~91.5% 91.5% ~1493 50.7% β€” β€” β€” 100% β€” β€” β€” β€”
Llama 4 Maverick 69.8% 80.5% β€” β€” β€” β€” 43.4% β€” β€” β€” β€” β€”
Llama 4 Scout 57.2% 74.3% β€” β€” β€” β€” 32.8% β€” β€” β€” β€” β€”

FrontierMath Scores

FrontierMath is a benchmark of 350 original, exceptionally challenging mathematics problems created by expert mathematicians (Epoch AI). Problems span number theory, analysis, algebraic geometry, and category theory. Tier 4 problems can take research mathematicians multiple days.

Model Tiers 1–3 Tier 4 Source
GPT-5.4 Pro 50% ~36–38% Epoch AI
GPT-5.2 Pro ~40.3% 31% Epoch AI
Gemini 3 Pro 38% 19% Epoch AI
GPT-5.1 Thinking ~25% β€” llm-stats

Benchmark Glossary

Benchmark Description Source
GPQA Diamond Graduate-level science questions (PhD difficulty) Google Research
MMLU-Pro Extended multi-task language understanding (harder than MMLU) TIGER-Lab
Arena Elo Crowdsourced human preference ranking lmarena.ai
HLE Humanity's Last Exam β€” expert-level questions Scale AI
SWE-bench Verified Real GitHub issue resolution (human-verified subset) SWE-bench
SWE-bench Pro More challenging subset of SWE-bench SWE-bench
LiveCodeBench Live competitive programming problems (not in training data) LiveCodeBench
AIME 2025 American Invitational Mathematics Examination MAA
ARC-AGI-2 Abstract reasoning challenge (fluid intelligence) ARC Prize
MMMU / MMMU-Pro Multi-discipline multimodal understanding MMMU
IFEval Instruction-following evaluation Google Research
FrontierMath Expert-level research mathematics (Epoch AI) Epoch AI

Development Tools πŸ› οΈ

AI-powered tools for software development, from IDEs and CLI tools to API providers and IDE extensions.

IDEs πŸ’»

Integrated Development Environments with built-in AI capabilities.

Agentic IDEs

IDE Platform Version Release Date Pricing Key Features GitHub
Firebase Studio Web - - Free (3 workspaces, up to 30 with Google Developer Program) Cloud-based, Gemini, MCP πŸ”—
Lingma IDE (ι€šδΉ‰η΅η ) Windows, macOS - - Free (download) Built-in agent, MCP tool use, terminal command execution ❌
Tonkotsu Windows, macOS - - Free (during early access) Team of agents, workflow πŸ”—
OpenCode Windows, macOS, Linux - - Free (OSS) Terminal, desktop, IDE extension, multi-provider πŸ”—
Codex app Windows - 2026-03-04 00:00 UTC Included with Codex plans Multiple agents, isolated worktrees, reviewable diffs, CLI and IDE interop πŸ”—
Visual Studio Windows, macOS 17.14.12+, 18.1.0+ 2026-01-06 00:00 UTC Free / $250/yr Gemini 3 Flash integration, faster performance, zero-migration upgrades, real-time profiler agent ❌
IntelliJ IDEA Windows, macOS, Linux 2025.3.2 2026-01 Free / $149/yr Java 24 support, Kotlin K2 mode, performance and memory improvements ❌

Native AI Editors

Editor Platform Version Release Date Pricing Key Features GitHub
Zed macOS, Windows, Linux 0.226.3 2026-03-03 00:00 UTC Free (OSS) + Copilot $10/mo Fast, collaboration, Gemini and Claude, Zeta AI, agent thread history, edit prediction providers, self-hosted OpenAI-compatible servers πŸ”—
Dyad Windows, macOS, Linux - - Free (OSS) Local generation, BYO keys πŸ”—
Memex macOS, Windows - - Freemium (Free + $10/mo) Agentic, browser↔desktop πŸ”—

VS Code Forks

IDE Platform Version Release Date Pricing Autonomous MCP GitHub
Cursor Windows, macOS, Linux 0.46+ 2026-02-12 00:00 UTC Freemium (Free + Pro $19/mo or $39/mo) βœ… ❌ ❌
Windsurf Windows, macOS, Linux 1.9552+ 2026-02-12 00:00 UTC Freemium (Free + Pro) βœ… βœ… ❌
Trae macOS, Windows - - Free ❌ ❌ πŸ”—
PearAI Windows, macOS, Linux - - Free (OSS) βœ… ❌ πŸ”—
Void Windows, macOS, Linux - - Free (OSS) βœ… βœ… πŸ”—
Kiro Windows, macOS, Linux - - Free (Preview) βœ… βœ… πŸ”—

Web-Based IDEs

IDE Platform Version Release Date Pricing Self-Hostable Best For GitHub
Replit 3 Web - - Free Starter, Core $20/mo, Pro $100/mo ❌ Learning/Prototyping ❌
Bolt.new Web - - Free, Pro $20-25/mo, Teams $30/user/mo ❌ Quick apps ❌
Bolt.diy Self-hosted - - Free (MIT), bring your own API βœ… Self-hosted πŸ”—
Lovable Web - - Free (5 credits/day), Pro $25/mo, Business $50/mo ❌ UI/Full-stack ❌
v0 Web - - Free ($5 credits/mo), Premium $20/mo, Teams $30/user ❌ React components ❌
Gitpod Web - - Free + Paid ❌ Cloud dev environments ❌
Rork Web - - Free & Paid (credits) ❌ Mobile apps (iOS/Android) ❌
Google Antigravity Web - - Google AI Pro / Ultra Agent-first development with Gemini-powered coding ❌
Jules Web - 2025-05-20 00:00 UTC Free beta, higher limits on Google AI Pro / Ultra Async repo agent, reviewable diffs, GitHub integration ❌

CLI Tools πŸ–₯️

Command-line AI tools for autonomous coding and terminal enhancement.

Autonomous Coding Agents

Tool Platform Pricing Key Features GitHub
Aider Windows, macOS, Linux Free Gold standard, Architect mode, thinking tokens πŸ”—
Claude Code 2.1+ macOS, Linux, Windows Free + API Fast mode for Opus 4.6, simple mode file editing, Unicode fix πŸ”—
Codex CLI Windows, macOS, Linux Included Sandbox, approval modes πŸ”—
Goose Windows, macOS, Linux Free (Apache-2.0) MCP, extensible, desktop app, 25+ providers πŸ”—
GPT-Pilot Windows, macOS, Linux Free Full dev team simulation πŸ”—
OpenHands Windows, macOS, Linux Free Cloud agents, MCP πŸ”—
Mentat Windows, macOS, Linux Free Multi-file coordination πŸ”—

Assisted CLI Tools

Tool Developer Pricing Best For
Gemini CLI Google Free Google ecosystem
Cursor CLI Cursor Free tier Terminal + IDE bridge
Qwen Code Alibaba Free Qwen optimization
Qodo CLI Qodo Free tier Testing and review

Terminal Enhancers

Tool Platform Pricing Key Features
Warp Terminal macOS, Linux, Windows Free AI Agents, workflow sharing
Fig macOS, Linux Free Autocomplete, AI suggestions

IDE Add-ons 🧩

Extensions and plugins that add AI capabilities to existing IDEs.

Universal (Cross-Platform)

Add-on Platform Pricing Context Best For GitHub
GitHub Copilot VS Code, JetBrains, Vim Free / $10/mo / $39/mo Large General coding ❌
Supermaven VS Code, JetBrains, Neovim Free / $10/mo 1M Large codebases ❌
Codeium VS Code, JetBrains, Vim Free / $15/mo / $60/mo Medium Free alternative ❌
Continue VS Code, JetBrains Free (OSS) Custom Self-hosted πŸ”—
Cody VS Code, JetBrains, Web Free (discontinued) / Enterprise Starter $19/mo / Enterprise $59/mo Enterprise Code search πŸ”—
Tabnine VS Code, JetBrains, VS, Eclipse Free / $39/mo Local Privacy ❌

VS Code Specific

Add-on Pricing Autonomous MCP Best For GitHub
Codex Free (with ChatGPT Plus $20/mo or Pro $200/mo) βœ… βœ… OpenAI's official coding agent πŸ”—
Cline Free βœ… βœ… Full agent πŸ”—
GitHub Copilot (Agent Mode) $0 / $10 / $39/mo ⚠️ ❌ Guided agent workflows ❌
RooCode Free/Pro ⚠️ ❌ Complex tasks πŸ”—
Keploy OSS/Enterprise ❌ ❌ Testing ❌

JetBrains Specific

Add-on Pricing Claude Agent Best For
JetBrains AI Assistant $10/mo (Pro), $249/yr (Ultimate) βœ… Deep IDE integration
JetBrains Claude Agent Included in subscription βœ… Native agent

API Providers πŸ”Œ

Services for accessing AI models via API.

Model Labs (Direct)

Provider Models Pricing
OpenAI GPT-5.4, GPT-5.4 mini, GPT-5.4 nano, o3, Codex Pay-per-token
Anthropic Claude Opus 4.6, Sonnet 4.6, Haiku 4.5 Pay-per-token
Google AI Studio Gemini 3.1 Pro, Gemini 3.1 Flash-Lite, Gemini 3 Flash Free / Pay
Z.ai (Zhipu AI) GLM-5, GLM-5-Code, GLM-4.7 Pay-per-token
MiniMax MiniMax-M2.5/M2.1/M2 Pay-per-token
Cohere Command, Embed, Rerank Pay-per-token
AI21 Labs Jamba Pay-per-token
Perplexity Sonar / Sonar Pro / Sonar Reasoning Pro Pay-per-token + request fees
Moonshot AI Kimi (kimi-k2.5, kimi-k2-thinking) Pay-per-token
ByteDance (Volcengine) Doubao Pay-per-token
Tencent (Hunyuan) Hunyuan Pay-per-token
Baidu (ERNIE) ERNIE Pay-per-token
DeepSeek DeepSeek-V4/R1 Pay-per-token
Mistral AI Mistral Large 3 Pay-per-token
xAI Grok-4 Pay-per-token

Unified APIs & Aggregators

Provider Models Key Features
OpenRouter 200+ Crypto/fiat, rankings
Hugging Face Thousands Serverless inference

Inference Clouds

Provider Specialization Speed
Together AI Llama/Qwen/Mistral Fast
Fireworks AI FireAttention Low-latency
Groq LPU >500 T/s
Cerebras Wafer-Scale >2000 T/s

GPU Clouds

Provider Type Best For
RunPod GPU Rental Flexibility
Replicate Model-as-a-Service Quick deployment
Vultr Global Cloud Hourly
Hyperbolic Decentralized Crypto/Fiat

Automation πŸ€–

AI-powered tools for automating browser and desktop tasks.

Browser Automation 🌐

Tools and frameworks for AI-powered browser automation.

Standalone AI Browsers

Browser Pricing Open Source Local AI Best For GitHub
Sigma AI Browser Freemium (Free + Pro $9.99/mo) ❌ βœ… Offline AI, agentic browsing ❌
ChatGPT Atlas Free (with ChatGPT subscription) ❌ ❌ OpenAI integration, macOS πŸ”—
Genspark Freemium (Free + Plus $19.99/mo + Pro $249/mo) ❌ ❌ AI Super Agent, research ❌
BrowserOS Free βœ… βœ… Privacy-focused πŸ”—
Brave Leo Freemium (Free + Premium) ❌ ⚠️ (Experimental) Privacy-focused AI ❌
Fellou Freemium (Free for 4 tasks, $19/mo Pro) ❌ ❌ True agentic browser πŸ”—
Perplexity Comet Free (with Pro $20/mo) or $5/mo Standalone ❌ ❌ Research ❌
Dia Freemium (Free limited, $20/mo Pro, Dia+ $50/mo) ❌ ❌ Arc replacement ❌
Opera Neon $19.90/mo ❌ ❌ Agentic browsing ❌
Opera One (Aria) Free ❌ ❌ Built-in AI assistant ❌
Edge Copilot Free (Copilot Pro $20/mo) ❌ ❌ Enterprise AI browser ❌
BrowserGPT Freemium (Free + Premium) ❌ ❌ Mobile-first AI browser (iOS/Android) ❌
Arc Max Free ❌ ❌ AI-enhanced browsing, macOS ❌
AnythingLLM Free (OSS) βœ… βœ… All-in-one desktop AI, document chat, local models πŸ”—

Browser Extensions

Extension Pricing Free Multi-Agent Best For GitHub
Harpa AI Free βœ… ❌ Automation recipes πŸ”—
MultiOn Free/Paid ⚠️ βœ… Complex tasks πŸ”—
NanoBrowser Free βœ… βœ… Local control πŸ”—
Neobrowser Free (OSS) βœ… ❌ Local LLMs via Ollama, privacy-first, Chrome/Edge ❌

Developer Libraries

Library Language Best For GitHub
Browser-use Python Agentic automation πŸ”—
Stagehand TypeScript Web apps πŸ”—
LaVague Python NL to code πŸ”—
Skyvern Python CV-based automation πŸ”—
Firecrawl Python / CLI LLM-powered crawling & scraping with prompt chaining πŸ”—

Cloud Automation

Service Platform Pricing Best For GitHub
ChatGPT agent ChatGPT Plus / Pro / Team Guided browser tasks, research, forms, and spreadsheets ❌
Project Mariner Google AI Ultra Included with Google AI Ultra Multi-step browser tasks, shopping, and reservations ❌
Skyvern Cloud Cloud API Paid Resilient automation πŸ”—
Browserbase Cloud API Paid Stealth mode, session recording ❌

Agent Platforms 🧩

Platforms and runtimes for running or connecting AI agents.

Project Type Pricing Self-Hostable Best For Official
OpenClaw Personal AI assistant Free (OSS) βœ… Always-on assistant across chat channels πŸ”—
NanoClaw Lightweight agent framework Free (OSS) βœ… Containerized agents for WhatsApp, Telegram, Slack, Discord πŸ”—
CrewAI Multi-agent orchestration Free (OSS) / Enterprise βœ… Team-based AI agent workflows πŸ”—
AutoGen Multi-agent framework Free (OSS) βœ… Conversational agent collaboration πŸ”—
LangGraph Agent framework Free (OSS) / LangSmith paid βœ… Stateful, cyclic agent workflows πŸ”—
Dify LLM app platform Free (OSS) / Cloud plans βœ… Visual workflow builder, RAG, agents πŸ”—
n8n Workflow automation Free (OSS) / Cloud from $20/mo βœ… No-code automation with AI agent nodes πŸ”—
Flowise LLM orchestration Free (OSS) βœ… Drag-and-drop LLM flow builder πŸ”—
Lindy AI agent builder Freemium (Free + Pro $49/mo) ❌ No-code AI agents for business tasks πŸ”—
Relevance AI Agent platform Freemium (Free + Paid plans) ❌ Build and deploy AI agents, no-code πŸ”—
Moltbook Agent social network Free ❌ Discovering and pairing with AI agents πŸ”—
ZeroClaw Privacy-first agent runtime (Rust) Free (OSS) βœ… Deploy anywhere, swap any LLM, zero external API calls πŸ”—
NullClaw Sandboxed agent runtime (Zig) Free (OSS) βœ… Ultra-fast, minimal footprint, sandboxed agent tasks πŸ”—
Moltis Rust-native single-binary agent Free (OSS) βœ… Sandboxed, auditable, voice + memory + MCP tools built-in πŸ”—
Hermes Agent Adaptive agent framework (Nous Research) Free (OSS) βœ… Memory management, skills, UI dashboard, grows with you πŸ”—
PicoClaw Ultra-lightweight agent (Go) Free (OSS) βœ… Tiny, fast, embedded/IoT deployments, single-binary πŸ”—
AutoGPT Autonomous agent Free (OSS) βœ… Self-prompting GPT agent with memory, pioneer project πŸ”—
BabyAGI Task-driven agent Free (OSS) βœ… Autonomous task creation and prioritization πŸ”—
Suna Generalist agent Free (OSS) βœ… Versatile open-source agent for complex tasks (Kortix) πŸ”—
OWL Multi-agent framework Free (OSS) βœ… Distributed task automation (Camel-AI) πŸ”—
CogAgent Vision GUI model Free (Research) βœ… High-performance vision-based GUI understanding (Tsinghua/Zhipu) πŸ”—
HyperAgent Code agent Free (OSS) βœ… GitHub issue resolution, repository-level code generation πŸ”—

Cloud Agent Services

Managed cloud services for building and deploying AI agents at scale.

Service Provider Pricing Best For
Google Vertex AI Agent Builder Google Cloud Pay-per-use Enterprise agents grounded in Google Search and data stores
Amazon Bedrock Agents AWS Pay-per-use Serverless agents with knowledge bases and guardrails
Azure AI Agent Service Microsoft Azure Pay-per-use Enterprise agents with Azure AI Search and OpenAI integration

Desktop Automation πŸ–₯️

AI agents and tools for automating desktop tasks and OS-level interactions.

AI Agents (Computer Use)

Local Computer Use Agents

Agents that run directly on your machine and interact with the OS, screen, keyboard, and mouse.

Agent Windows macOS Linux Vision Best For GitHub
Agent S βœ… βœ… βœ… βœ… Research/SOTA, GUI grounding πŸ”—
Simular Agent S2 βœ… βœ… βœ… βœ… Latest SOTA, improved grounding πŸ”—
Open Interpreter βœ… βœ… βœ… ⚠️ Natural language computer control, 63K+ stars πŸ”—
Open-Interface βœ… βœ… βœ… βœ… General-purpose desktop automation πŸ”—
UFO βœ… ❌ ❌ βœ… Windows-specific app automation πŸ”—
Bytebot ❌ ❌ βœ… βœ… Self-hosted (Docker), headless πŸ”—
Microsoft Fara-7B βœ… βœ… βœ… βœ… Open-weight vision grounding model πŸ”—
UI-TARS βœ… βœ… βœ… βœ… Autonomous GUI execution, vision-language-action model (ByteDance) πŸ”—
c/ua βœ… βœ… βœ… βœ… Isolated VM environments, open-source CU infrastructure πŸ”—
Windows-Use βœ… ❌ ❌ βœ… Windows OS-specific agent automation πŸ”—
OpenCUA βœ… βœ… βœ… βœ… Open foundations for computer-use agents πŸ”—
Devin βœ… βœ… βœ… βœ… Full-stack software engineering agent (Cognition Labs) ❌
Ace βœ… βœ… βœ… βœ… 20x human speed on UI tasks (General Agents) ❌
Cloud / API Computer Use Agents

Agents accessed via API or cloud service β€” OS-independent, but require internet connectivity.

Agent Interface Vision Best For GitHub
Anthropic Computer Use API βœ… Beta capability, Claude-powered desktop control ❌
OpenAI Operator API βœ… Guided browser and desktop computer use ❌
Amazon Nova Act API βœ… AWS browser automation SDK ❌
Manus AI Cloud βœ… General-purpose cloud agent ❌
Adept AI (ACT-1) API βœ… Pioneer in digital actions, self-correcting behavior ❌
AskUI Vision Agent API βœ… Cross-platform vision automation without VMs ❌
Highlight AI Desktop + Cloud βœ… Privacy-first desktop Q&A and automation ❌
AI Operating Systems

AI-native operating systems and platforms that embed LLMs as core system components.

OS / Platform Type Hardware Local/Cloud Best For GitHub
AIOS Open Source (MIT) Any Both Kernel-level LLM agent OS, agent scheduling & memory management πŸ”—
Ghost OS Open Source Any Local Autonomous agent workflows πŸ”—
computer_use_ootb Open Source Any Local/API Out-of-the-box GUI automation (Claude 3.5 CU + local models) πŸ”—
Rabbit OS (R1) Commercial ($199 device) R1 Device Cloud Consumer AI assistant, LAM-based app automation ❌
Apple Intelligence Commercial (OS-level) Apple Silicon (M1+) On-device / Private Cloud Privacy-first, system-wide writing, Siri, image generation ❌
Windows Copilot+ Commercial (OS-level) NPU (40+ TOPS) Hybrid Recall, Cocreator, live captions, enterprise productivity ❌

RPA & Visual Frameworks

Tool Platform Best For
Ui.Vision RPA Windows, macOS, Linux Visual automation
OmniParser V2 Cross-platform Screen parsing

Scripting Libraries

Tool Platform Key Features GitHub
PyAutoGUI Cross-platform Simple API, fail-safe πŸ”—
Nut.js Cross-platform Visual search, image matching ❌
OpenAdapt Windows, macOS Learning from demonstration πŸ”—

Research Projects (Computer Use)

Notable academic and industry research advancing the field of computer-use agents.

Project Developer Focus Year Paper
Gato Google DeepMind Multi-modal, multi-task, multi-embodiment agent 2022 DeepMind
PaLM-E Google DeepMind Embodied multimodal language model 2023 arXiv
RT-2 Google DeepMind Vision-language-action model for robotics 2023 arXiv
HuggingGPT (Jarvis) Microsoft Orchestrates specialists for multi-modal tasks 2023 arXiv
SIMA Google DeepMind Generalist AI agent for 3D virtual environments 2024 DeepMind
Magma Microsoft Research Vision-language-action foundation model 2025 arXiv
WebAgent Google DeepMind Autonomous web browsing and form-filling 2024 arXiv
WebVoyager Hongliang He et al. Autonomous web browsing (59.1% on 15-website benchmark) 2024 arXiv

AI Infrastructure πŸ—οΈ

Tools, frameworks, and specialized models for building production AI systems β€” from embeddings and video generation to safety, evaluation, and model routing.

Embedding & Reranking Models 🧲

Specialized models for converting text (or images) into dense vector representations and for reranking retrieval results. Essential infrastructure for RAG pipelines and semantic search. Prices as of April 2026.

Embedding Models

Model Developer Dimensions Max Tokens Pricing Best For GitHub
text-embedding-3-small OpenAI 1,536 8,191 $0.02/1M tokens Cost-effective English embeddings β€”
text-embedding-3-large OpenAI 3,072 8,191 $0.13/1M tokens Highest-quality English retrieval β€”
Embed v4 Cohere 1,536 128K $0.12/1M (text), $0.47/1M (image) Multimodal text + image RAG β€”
voyage-3-large Voyage AI 256–2,048 (flex) 32K ~$0.18/1M tokens Highest-quality retrieval, long context β€”
jina-embeddings-v3 Jina AI 32–1,024 (flex) 8,192 API pay-per-use Multilingual, task-adaptive (LoRA heads) πŸ”—
BGE-M3 BAAI 1,024 8,192 Free (open-source) Multi-functional: dense + sparse + ColBERT πŸ”—
Nomic Embed v2 (MoE) Nomic AI 256–768 (flex) 512 Free (open-source) Multilingual, MoE efficiency (305M active) πŸ”—
text-embedding-005 Google (Vertex AI) 768 2,048 $0.10/1M tokens GCP-native semantic search β€”

Reranking Models

Model Developer Max Tokens Pricing Best For GitHub
Rerank 4.0 Pro Cohere 32K $1.00/1K queries High-accuracy domain-specific reranking β€”
Rerank 4.0 Fast Cohere 32K $0.50/1K queries Low-latency production reranking β€”
rerank-2.5 Voyage AI 32K API pay-per-use Instruction-following, multilingual β€”
BGE Reranker v2-m3 BAAI 8,192 Free (open-source) Open-source cross-encoder reranking πŸ”—
Jina Reranker v2 Jina AI 8,192 API pay-per-use Multilingual, long-context reranking β€”

Video Generation Models 🎬

Text-to-video and image-to-video generation models for creating short clips from prompts. The field is moving rapidly β€” resolutions, durations, and pricing change frequently. Specs as of April 2026.

Model Developer Resolution Duration Pricing Open Source Best For GitHub
Sora 2 OpenAI Up to 1080p Up to 20s (Pro) $20–$200/mo via ChatGPT No Cinematic quality, long clips β€”
Veo 3 Google DeepMind 720p–1080p Up to 8s (extendable) ~$0.20–$0.40/s No Native audio + video, realistic physics β€”
Runway Gen-4 / Gen-4.5 Runway Up to 4K Up to 16s $12–$76/mo No Professional creative workflows β€”
Kling 2.0 Kuaishou 1080p Up to 10s Free / $5.99–$66/mo No Budget production, fast turnaround β€”
Pika 2.0 Pika Labs 1080p Up to 5s Free / $8–$58/mo No Social media, creative effects β€”
MiniMax Video-01 MiniMax 720p Up to 6s ~$0.40/video No Strong text-motion responsiveness β€”
HunyuanVideo Tencent 720p–2K Up to 16s Free (self-host; ~60GB VRAM) Yes (Apache 2.0) High per-frame fidelity, long clips πŸ”—
Wan 2.2 (14B) Alibaba 480p–1080p Up to 10s ~$0.10–$0.30/clip (API) Yes (Apache 2.0) Motion quality, VBench #1 benchmark πŸ”—
Mochi 1 Genmo 480p Up to 5.4s @ 30fps Free (open-source) Yes (Apache 2.0) High-quality open text-to-video πŸ”—
LTX Video Lightricks 720p Variable Free (open-source) Yes Fast generation, ComfyUI-native πŸ”—
CogVideoX Zhipu AI / Tsinghua 720p ~6s Free (open-source) Yes (Apache 2.0) Image-to-video quality, LoRA fine-tuning πŸ”—

Speech & TTS Models πŸ”Š

Text-to-speech (TTS) and speech-to-text (STT / ASR) models for voice generation, transcription, and real-time audio. Prices as of April 2026.

Text-to-Speech (TTS)

Model Developer Languages Real-time Open Source Pricing Best For GitHub
ElevenLabs Turbo v2.5 ElevenLabs 29+ Yes No Free – $1,320/mo Best quality (4.8 MOS), instant voice cloning β€”
OpenAI TTS / TTS HD OpenAI 57 Yes No $15 / $30 per 1M chars Enterprise, seamless GPT integration β€”
Sesame CSM Sesame AI Labs English Yes Yes Free Conversational, emotionally expressive (4.7 MOS) πŸ”—
Kokoro-82M Hexgrad Multilingual Yes Yes (Apache 2.0) Free Tiny (82M params), CPU-runnable, near-commercial quality πŸ”—
Fish Audio S1 Fish Audio Multilingual Yes Yes Free / $0.016/1K chars (API) Voice cloning, multilingual fluency πŸ”—
Parler-TTS HuggingFace English No Yes (Apache 2.0) Free Style-controllable via text descriptions πŸ”—
XTTS v2 Coqui AI 17 Yes Yes (MPL 2.0) Free Best open-source multilingual, 6s voice cloning πŸ”—
Bark Suno AI 13+ No Yes (MIT) Free Expressive, non-verbal sounds, long-form audio πŸ”—

Speech-to-Text (STT / ASR)

Model Developer Languages Real-time Open Source Pricing Best For GitHub
Whisper large-v3 OpenAI 100+ No Yes (MIT) $0.006/min (API) Open-source multilingual baseline πŸ”—
GPT-4o Transcribe OpenAI 50+ Yes No $0.006/min High-accuracy managed STT β€”
Deepgram Nova-3 Deepgram 36+ Yes No $0.0043/min Ultra-low latency, production STT β€”
AssemblyAI Universal-2 AssemblyAI Multilingual Yes No $0.0025/min Accurate, feature-rich transcription β€”

AI Safety & Guardrails πŸ›‘οΈ

Tools and frameworks for detecting unsafe content, preventing prompt injection, validating outputs, and enforcing policy compliance in LLM-powered applications. As of April 2026.

Tool Developer Type Open Source Pricing Best For GitHub
Llama Guard 3 Meta Safety classifier (8B LLM) Yes (Meta license) Free / ~$0.02/1M tokens (API) Input/output safety classification, 8 languages πŸ”—
NeMo Guardrails NVIDIA Programmable guardrail toolkit (Colang DSL) Yes (Apache 2.0) Free Dialog safety, policy enforcement, LangChain-native πŸ”—
Guardrails AI Guardrails AI Python validator framework Yes Free (OSS) Output validation, PII detection, hallucination guards πŸ”—
Amazon Bedrock Guardrails AWS Managed safety layer No Pay-per-use (AWS) AWS-native, zero-ops compliance and content filtering β€”
ShieldGemma 2 Google Safety classifier (open weights) Yes (open weights) Free Text safety (2B/9B/27B), image safety (4B) β€”
Rebuff Protect AI Prompt injection detector Yes Free Self-hardening anti-injection using vector memory πŸ”—
Lakera Guard Lakera Managed LLM security API No Free tier + Enterprise Runtime LLM security, <50ms latency, PII + injection β€”

RAG Frameworks πŸ—‚οΈ

Frameworks and libraries for building Retrieval-Augmented Generation (RAG) pipelines β€” connecting LLMs to external knowledge sources. As of April 2026.

Framework Developer Language Key Features Open Source GitHub
LlamaIndex LlamaIndex Python 160+ data connectors, hybrid search, multi-agent support Yes (MIT) πŸ”—
LangChain LangChain AI Python / JS Chains, agents, memory, 50K+ integrations, LangGraph Yes (MIT) πŸ”—
RAGFlow InfiniFlow Python Visual workflow builder, deep document parsing (PDF/tables) Yes (Apache 2.0) πŸ”—
Haystack deepset Python Modular pipelines, enterprise-grade, built-in monitoring Yes (Apache 2.0) πŸ”—
Verba Weaviate Python No-code UI, Weaviate-native vector search Yes πŸ”—
Mem0 Mem0 AI Python / JS Persistent memory layer, graph memory, session recall Yes (Apache 2.0) πŸ”—
txtai NeuML Python All-in-one semantic search + workflow automation Yes (Apache 2.0) πŸ”—
R2R SciPhi Python Lightweight, low-latency, REST API, production-first Yes (MIT) πŸ”—

Fine-tuning Platforms βš™οΈ

Tools and platforms for adapting pre-trained LLMs to specific tasks or domains via supervised fine-tuning, RLHF, LoRA/QLoRA, and related methods. Prices as of April 2026.

Platform Type Supported Models Pricing Best For GitHub
Unsloth OSS library Llama, Mistral, Gemma, Qwen, Phi, + more Free 2–5Γ— faster training, 80% VRAM reduction via custom kernels πŸ”—
Axolotl OSS framework Most Hugging Face models Free Config-as-code (YAML), reproducibility, multi-GPU training πŸ”—
OpenAI Fine-tuning Managed API GPT-4o, GPT-4o-mini, GPT-3.5 Turbo GPT-4o-mini: $0.30/1M training tokens Managed, no infra, direct production deployment β€”
Google Vertex AI Managed cloud Gemini 2.5 Pro/Flash, Gemma 3 Gemini 2.5 Pro: $25/1M training tokens GCP-native, Gemini model access β€”
Predibase / LoRAX Cloud + OSS server Llama, Mistral, 50+ HF models Free tier + per-GPU pricing Multi-adapter serving: many LoRA adapters on one GPU πŸ”—
PEFT Hugging Face All Hugging Face models Free LoRA, QLoRA, prefix tuning, prompt tuning β€” full HF ecosystem πŸ”—
LLaMA-Factory Community 100+ models Free Web UI, low-code interface, beginner-friendly fine-tuning πŸ”—
torchtune PyTorch Llama, Gemma, Mistral, Phi Free PyTorch-native, composable training recipes πŸ”—

Evaluation & Observability πŸ“Š

Tools for tracing LLM calls, evaluating output quality, debugging RAG pipelines, and monitoring production AI systems. Prices as of April 2026.

Tool Developer Type Open Source Pricing Best For GitHub
LangSmith LangChain AI Tracing + evaluation platform No (enterprise self-host) Free (5K traces/mo), paid plans LangChain apps, chain + agent debugging β€”
Braintrust Braintrust Data Eval-first platform Partial (AI proxy OSS) Free (1M spans), enterprise CI/CD evals, dataset management, LLM-as-judge β€”
Helicone Helicone Proxy-based observability Yes Free tier, usage-based Cost tracking, request caching, drop-in API proxy πŸ”—
Arize Phoenix Arize AI OSS tracing + evaluation Yes Free (OSS); Arize Cloud paid RAG debugging, LLM-as-judge, local dev πŸ”—
Langfuse Langfuse Tracing + evaluation Yes (MIT) Free / self-host; cloud paid Open-source, 19K+ GitHub stars, OpenTelemetry πŸ”—
Ragas Ragas RAG evaluation framework Yes Free RAG-specific metrics: faithfulness, recall, precision πŸ”—
DeepEval Confident AI LLM evaluation framework Yes Free (OSS); cloud paid 14+ built-in metrics, pytest-style eval runner πŸ”—

MCP Ecosystem πŸ”Œ

The Model Context Protocol (MCP) is an open standard by Anthropic for connecting LLMs to external tools and data sources via a unified JSON-RPC 2.0 interface. It supports STDIO and Streamable HTTP transports. The official MCP registry at mcp.so lists 2,000+ servers.

MCP Clients: Claude Desktop, Claude Code, Cursor, Windsurf, VS Code (Copilot), Continue.dev, Zed, LibreChat, and more.

Popular MCP Servers

Tool / Server Developer Category Open Source Best For GitHub
MCP Filesystem Anthropic / Community File I/O Yes (MIT) Read/write local files from any MCP client πŸ”—
MCP GitHub GitHub / Anthropic Code & DevOps Yes Repo management, issues, PRs, code search πŸ”—
MCP Slack Community Messaging Yes Slack workspace read/write interaction πŸ”—
MCP PostgreSQL Community Database Yes Read-only SQL queries against Postgres πŸ”—
MCP Google Drive Community Storage Yes Drive file access and search πŸ”—
MCP Docker Community DevOps Yes Container management and inspection πŸ”—
MCP Brave Search Brave Search Yes Web + local search via Brave API πŸ”—
MCP AWS AWS Labs Cloud Yes (Apache 2.0) AWS service integration πŸ”—
MCP Notion Community Productivity Yes Notion page and database access πŸ”—
FastMCP Community Framework Yes Python framework for building MCP servers fast πŸ”—
Context7 Upstash Dev Tools Yes Up-to-date library docs for AI coding assistants πŸ”—

Model Routers & Load Balancers πŸ”€

Tools for routing LLM requests across multiple providers, models, and deployments β€” optimizing for cost, latency, quality, or reliability. Prices as of April 2026.

Tool Developer Key Features Open Source Pricing GitHub
LiteLLM BerriAI 100+ provider support, proxy server, load balancing, fallbacks, spend tracking Yes (MIT) Free (OSS) / $99/mo cloud πŸ”—
Portkey Portkey 250+ LLMs, AI gateway, guardrails, observability, virtual keys Yes (Apache 2.0) Free tier / $49/mo+ πŸ”—
OpenRouter OpenRouter 200+ model catalog, unified API, pay-per-use credit system No ~5% markup on provider cost β€”
RouteLLM LMSys Open-source router (strong vs. weak model) using classifier or matrix factorization Yes Free πŸ”—
Not Diamond Not Diamond Pre-trained + custom task-specific routers, cost/quality tradeoff No Free tier + enterprise β€”
Unify AI Unify Quality / cost / latency-aware routing across 100+ model deployments No Usage-based β€”
Semantic Router Aurelio AI Embedding-based semantic intent routing for agents and pipelines Yes Free πŸ”—

Small Language Models (SLMs) πŸ“±

Compact models designed for on-device inference, edge deployment, low-latency APIs, and resource-constrained environments. Generally defined as models under ~15B parameters. Specs as of April 2026.

Model Developer Params Context License Best For
Phi-4 Microsoft 14B 16K MIT Reasoning, math, code β€” STEM benchmark leader at class size
Phi-4-mini Microsoft 3.8B 128K MIT On-device STEM reasoning with long context
Phi-4-multimodal Microsoft 5.6B 128K MIT Vision + audio + text multimodal, edge deployment
Gemma 3 27B Google 27B 128K Apache 2.0 Top open model, multilingual (140+ languages)
Gemma 3 4B Google 4B 128K Apache 2.0 CPU inference, 140+ languages, mobile-friendly
Gemma 3 1B Google 1B 32K Apache 2.0 On-device, embedded, ultra-lightweight
SmolLM3 Hugging Face 3B 128K Apache 2.0 Efficient, tool use, multilingual, reasoning
Qwen2.5 3B Alibaba 3B 128K Apache 2.0 Asian and multilingual tasks, coding
Qwen2.5 7B Alibaba 7B 128K Apache 2.0 Strong multilingual baseline, function calling
Llama 3.2 3B Meta 3B 128K Llama 3.2 license General-purpose, on-device, Meta ecosystem
Llama 3.2 1B Meta 1B 128K Llama 3.2 license Lightweight edge inference, distillation target
Granite 3.3 8B IBM 8B 128K Apache 2.0 Enterprise tasks, tool use, business-domain
MiniCPM 3.0 ModelBest / Tsinghua 4B 32K Apache 2.0 Compact yet capable, mobile and edge
Danube 3 500M H2O.ai 500M 8K Apache 2.0 Ultra-lightweight on-device, IoT

Notable GitHub repos:


Guides πŸ“š

Tutorials, how-tos, and in-depth guides for getting the most out of AI models and tools.

Getting Started πŸš€

A beginner-friendly introduction to AI models and how to start using them effectively.

Understanding LLMs

Concept Description
Parameters Size of model (B = billions). More = more capable
Context Window How much text model can process (128K standard)
Tokens Basic units of text (~0.75 words per token)

Accessing AI Models

Method Best For Setup Difficulty
Web Interfaces Quick experiments Easiest
API Access Building applications Easy
Self-Hosting Privacy, no API costs Medium-Hard
IDE Integration Daily coding Easy

Model Recommendations by Task

Task Free Option Premium Option
Chat Llama 4 (self-hosted) GPT-5.4, Claude Opus 4.6
Coding DeepSeek-Coder-V2 Claude Opus 4.6
Reasoning DeepSeek-R1 Gemini 3 Deep Think, o3
Long docs Llama 4 Scout Gemini 3 Flash
Vision Llama 4 Maverick GPT-5.4, Gemini 3 Pro

Model Selection Guide 🎯

A comprehensive guide to choosing the right AI model for your specific needs.

Quick Decision Tree

Need πŸ†“ Free / Self-Host πŸ’Ž Best Quality ⚑ Fast / Autonomous
πŸ’» Coding DeepSeek-Coder-V2 Claude Opus 4.6 GPT-5.3-Codex
🧠 Reasoning / Math DeepSeek-R1 Gemini 3 Deep Think o3
πŸ’¬ General Chat Llama 4 (self-hosted) GPT-5.4, Claude Opus 4.6 Gemini 3 Flash
🎨 Vision Llama 4 Maverick GPT-5.4, Gemini 3.1 Pro Gemini 3 Flash
πŸ–₯️ Self-Hosting Phi-4 DeepSeek-V4 vLLM / SGLang (serving)

By Budget

Budget Options
Free Self-hosted (Llama 4, Qwen3, Mistral)
$0-10/mo API entry tiers, Gemini Flash
$10-50/mo Copilot, Claude API, GPT-5 API
$50+/mo Heavy usage, multiple models

Self-Hosting Guide πŸ–₯️

A comprehensive guide to running AI models on your own hardware.

Benefits

Benefit Description
Privacy Data never leaves your infrastructure
Cost Control No per-token API costs for unlimited usage
Customization Fine-tune models for specific needs
No Rate Limits Process as much as hardware allows
Offline Access Work without internet

Quick Start with Ollama

For installation and usage instructions, refer to the official Ollama documentation.

Local GPU Quick Guide

Recommended apps (local-first):

  • Ollama - Simple local runtime with a local HTTP API
  • LM Studio - Desktop UI for downloading and running models locally
  • llama.cpp - Fast local inference (CPU/GPU), great for quantized models
  • Open WebUI - Optional local web UI (pairs well with local runtimes)

If you want β€œserver-style” hosting (advanced):

  • vLLM - High-throughput serving for NVIDIA GPUs
  • SGLang - Structured generation and serving workflows

Practical setup tips:

  1. Install the latest NVIDIA drivers (enable GPU acceleration in your chosen app)
  2. Start with smaller quantized models (Q4 is a common β€œbest default”)
  3. Keep context windows realistic for local hardware (lower context = faster, less memory)
  4. Watch VRAM first, then system RAM; reduce model size or quantization if either saturates
  5. Prefer running locally on localhost and only expose to LAN if you understand firewall rules

Example hardware configurations:

Hardware Good starting point Notes
Consumer GPU (24 GB VRAM) 7B–14B quantized e.g., RTX 4090, RTX 3090 β€” great for chat/coding
Pro GPU (48–80 GB VRAM) 14B–70B quantized e.g., A6000, A100 β€” coding agents, longer contexts
Multi-GPU (160+ GB VRAM) 70B+ quantized e.g., 2Γ—A100 β€” larger open-source models
CPU-only (32–64 GB RAM) 7B–14B quantized Slower but viable for offline chat; keep context moderate

Deployment Options

Option Best For Pros Cons
Local Machine Personal use Simple, no latency Limited hardware
Dedicated Server Team use Full control Maintenance
Cloud GPU Rental Experimentation On-demand Hourly costs
Kubernetes Enterprise Scalable Complex

Cost Analysis πŸ’°

Comprehensive pricing comparisons and cost calculations.

Pricing Tiers

Tier Price Range Models
πŸ†“ Free $0 Self-hosted, free tiers
πŸ’΅ Budget $0.07 - $0.50/1M GLM-4.7-FlashX, GLM-4-32B-0414-128K, Yi-Lightning, GPT-5.4 nano, Gemini 3.1 Flash-Lite, DeepSeek-V3.1, MiniMax-M2.5
πŸ’° Mid-range $0.60 - $15.00/1M GPT-5.4 mini, Claude Haiku 4.5, Kimi K2.5, Sonar, GLM-5, GPT-5.4, Claude Sonnet
πŸ’Ž Premium $15.00 - $600.00/1M GPT-5.4 Pro, Claude Opus, o1-Pro

Subscription Pricing (Monthly, USD)

AI chat apps

Product Plans (USD) Notes Official Source
ChatGPT Go $8, Plus $20, Pro $200, Business $25/seat (annual) or $30/seat (monthly), Enterprise (contact sales) Consumer prices are US-listed; Go is localized in some markets πŸ”—
Claude Pro $20, Max $100 (5Γ—) or $200 (20Γ—), Team/Enterprise (see pricing) Prices shown exclude applicable taxes; availability varies by region πŸ”—
Google AI (Gemini) Plus $7.99, Pro $19.99, Ultra $249.99 US pricing; some regions/local pricing differ πŸ”—

Coding assistants

Tool Plans (USD) Notes Official Source
GitHub Copilot Free $0, Pro $10, Pro+ $39, Business $19/user, Enterprise $39/user Annual options available for Pro/Pro+ πŸ”—

Model Pricing Comparison

Model Input Output Cached Input Best For
GLM-4.7-FlashX $0.07 $0.40 β€” Fast budget tasks
Step-3.5-Flash $0.10 $0.30 β€” Ultra-fast reasoning (85–350 tok/s)
GLM-4-32B-0414-128K $0.10 $0.10 β€” Budget chat/coding
Llama 4 Maverick $0.15 $0.60 β€” Open multimodal (self-host: $0)
GPT-5.4 nano $0.20 $1.25 $0.02 Classification and lightweight subagents
Grok 4 Fast $0.20 $0.50 $0.05 Fast Grok reasoning
Gemini 3.1 Flash-Lite $0.25 $1.50 Supported High-volume multimodal tasks
DeepSeek-V3.1 $0.27 $0.41 β€” Everything
DeepSeek-V3.2 $0.28 $0.42 $0.028 Budget workhorse, reasoning
DeepSeek-V4 $0.30 $0.50 $0.03 Engram memory, coding (off-peak 50% off)
Gemini 3 Flash $0.30 $2.50 $0.05 + $1/hr Long context
MiniMax-M2.5 $0.30 $1.20 Auto (included) Coding, long context
Mistral Large 3 $0.50 $1.50 $0.05 Open-weight 675B MoE
Kimi K2.5 $0.60 $3.00 Auto (included) Multimodal + agent tasks
GPT-5.4 mini $0.75 $4.50 $0.075 Fast coding and multimodal tasks
Claude Haiku 4.5 $1.00 $5.00 β€” Low-latency coding and sub-agents
GLM-5 $1.00 $3.20 $0.20 Agentic engineering
Perplexity Sonar $1.00 $1.00 β€” Web-grounded chat (request fees apply)
GPT-5.3-Codex $1.75 $14.00 $0.175 Agentic coding, 7+ hour autonomy
Gemini 3.1 Pro $2.00 $12.00 $0.20–$0.40 + $4.50/hr Frontier reasoning
Perplexity Sonar Reasoning Pro $2.00 $8.00 β€” Reasoning + search (request fees apply)
GPT-5.4 $2.50 $15.00 $0.25 Frontier coding and professional work
Grok 4 $3.00 $15.00 $0.75 First-principles reasoning
Perplexity Sonar Pro $3.00 $15.00 β€” Higher quality + search (request fees apply)
Claude Sonnet 4.5 $3.00 $15.00 $0.30 (hit) Best coding
Claude Sonnet 4.6 $3.00 $15.00 $0.30 (hit) Near-Opus performance
Claude Opus 4.6 $5.00 $25.00 $0.50 (hit) Agentic coding

Self-Hosting vs API (Monthly)

Usage Level Self-Host (A100) API (GPT-5) Winner
Light (1M tokens) $300 (rental) $10 API
Medium (100M tokens) $300 $1,000 Self-host
Heavy (1B tokens) $300 $10,000 Self-host
Enterprise (10B+ tokens) $2,000 (owned) $100,000+ Self-host

Reference πŸ“–

Reference materials including glossary, comparison tables, and data sources.

Glossary πŸ“–

Definitions of common terms used throughout the documentation.

A-E

Term Definition
Agent AI system that autonomously performs tasks and interacts with environments
API Interface for programmatically accessing AI models
Attention Mechanism Neural network component focusing on relevant input parts
Benchmark Standardized test measuring model performance
Chain-of-Thought (CoT) Prompting technique showing step-by-step reasoning

F-L

Term Definition
Fine-Tuning Adapting pre-trained model to specific tasks
Frontier Model State-of-the-art proprietary model
GPU Hardware accelerator essential for ML
LLM Large Language Model
LoRA Efficient fine-tuning method

M-R

Term Definition
MCP Model Context Protocol for tool interaction
MMLU Massive Multitask Language Understanding benchmark
MoE Mixture of Experts architecture
Multimodal Processing multiple input types
RAG Retrieval-Augmented Generation

S-Z

Term Definition
Self-Hosting Running models on own infrastructure
SLM Small Language Model
SWE-bench Benchmark for real GitHub issue resolution
Token Basic unit of text processing
VRAM GPU memory for model storage

Comparison Tables πŸ“Š

Side-by-side comparisons of AI models sorted by various criteria.

Sort by Latest Update (Default)

🏒 Company πŸ€– Model πŸ“¦ Version πŸ“… Release Date πŸ”„ Latest Updated πŸ’» Coding πŸ“Š Benchmarks πŸ’° Price πŸ–₯️ Self-Host πŸ”— Official Site
πŸ€– OpenAI GPT-5 5.4 mini 2026-03-17 00:00 UTC 2026-03-17 00:00 UTC ⭐ βœ… GPQA 87.5% $0.75 / $4.50 ❌ πŸ”—
πŸ€– OpenAI GPT-5 5.4 2026-03-05 00:00 UTC 2026-03-05 00:00 UTC ⭐ βœ… GPQA 92.0%, SWE-bench ~80% $2.50 / $15.00 ❌ πŸ”—
🌐 Google DeepMind Gemini 3.1 Flash-Lite 2026-03-03 00:00 UTC 2026-03-03 00:00 UTC ⭐ βœ… β€” $0.25 / $1.50 ❌ πŸ”—
πŸ”¬ DeepSeek DeepSeek V4 2026-02-17 00:00 UTC 2026-02-17 00:00 UTC βœ… No public benchmarks Pay-per-token βœ… πŸ”—
🌐 Google DeepMind Gemini 3 Deep Think 2026-02-12 00:00 UTC 2026-02-12 00:00 UTC ⭐ βœ… GPQA ~97%, ARC-AGI-2 84.6%, HLE 48.4% Ultra subscription ❌ πŸ”—
πŸ‡¨πŸ‡³ Zhipu AI GLM 5 2026-02-12 00:00 UTC 2026-02-12 00:00 UTC ⭐ βœ… GPQA 82.0%, SWE-bench 77.8% $1.00 / $3.20 βœ… πŸ”—
πŸ€– Anthropic Claude Opus 4.6 2026-02-05 00:00 UTC 2026-02-05 00:00 UTC ⭐ βœ… GPQA 91.3%, SWE-bench 80.8% $5 / $25 ❌ πŸ”—
πŸ€– OpenAI GPT-5 5.3-Codex 2026-02-05 00:00 UTC 2026-02-05 00:00 UTC ⭐ βœ… GPQA 91.5%, SWE-bench Pro 56.8% TBD ❌ πŸ”—
πŸŒ™ Moonshot AI Kimi K2.5 2026-01-29 00:00 UTC 2026-02-02 00:00 UTC ⭐ βœ… GPQA 87.6%, SWE-bench 76.8% $0.60 / $3.00 ❌ πŸ”—

Release Windows (Month-level)

🏒 Company πŸ€– Model πŸ“… Release Window Notes πŸ”— Official Site
🧠 MiniMax MiniMax M2.5 2026-02 $0.30 / $1.20 πŸ”—
πŸ‡¨πŸ‡³ Alibaba/Qwen Qwen 3.5-Max 2026-02 Open-source release window πŸ”—
🌐 Google DeepMind Gemini 3.1 Flash-Lite 2026-03 $0.25 / $1.50 πŸ”—
🌐 Google DeepMind Gemini 3 Pro 2026-01 Tiered pricing πŸ”—
πŸ€– OpenAI GPT-5.4 family 2026-03 GPT-5.4, GPT-5.4 mini, GPT-5.4 nano πŸ”—
πŸ’» Mistral AI Mistral Large 3 2026-01 Open-weight πŸ”—

Sort by Price (Cheapest)

Rank Model Input Output License
1 Self-hosted $0 $0 Various
2 GLM-4.7-Flash $0 $0 Free
3 GLM-4.7-FlashX $0.07 $0.40 API
4 GLM-4-32B-0414-128K $0.10 $0.10 API
5 Yi-Lightning $0.14 $0.42 Apache 2.0
6 GPT-5.4 nano $0.20 $1.25 Proprietary
7 Gemini 3.1 Flash-Lite $0.25 $1.50 Proprietary
8 DeepSeek-V3.1 $0.27 $0.41 MIT
9 Gemini 3 Flash $0.30 $2.50 Proprietary
10 MiniMax-M2.5 $0.30 $1.20 Proprietary

Sort by Performance (Coding)

Rank Model HumanEval Self-Host
1 Claude Sonnet 4.5 ~92% ❌
2 GPT-OSS-120B ~89% βœ…
3 DeepSeek-Coder-V2 ~92% βœ…
4 Qwen3-Coder ~92% βœ…
5 DeepSeek-V3.1 82%+ βœ…

Sort by Context Window

Rank Model Context Best For
1 Gemini 3 Flash 10M Entire libraries
2 Llama 4 Scout 10M Long-document RAG
3 Gemini 3 Pro 1M+ Research papers
4 Kimi K2.5 256K Large codebases

Data Sources πŸ“š

Attribution, verification sources, and methodology.

Primary Sources

Company Source URL
OpenAI Official Documentation openai.com
OpenAI ChatGPT agent release notes help.openai.com
OpenAI Model release notes help.openai.com
OpenAI API pricing platform.openai.com
OpenAI March 2026 model news openai.com
OpenAI ChatGPT subscriptions (Go/Plus/Pro) openai.com
OpenAI ChatGPT Business pricing help.openai.com
Anthropic Claude Documentation anthropic.com
Anthropic Claude Haiku 4.5 announcement anthropic.com
Anthropic Claude Pro pricing anthropic.com
Anthropic Max plan pricing anthropic.com
Google Gemini Documentation deepmind.google
Google Gemini 3.1 Flash-Lite model card deepmind.google
Google Project Mariner deepmind.google
Google Google AI plans one.google.com
Google Google AI Plus pricing blog.google
Google Google AI Pro pricing one.google.com
Google Google AI Ultra pricing blog.google
GitHub Copilot plans & pricing github.com
Zhipu AI (Z.ai) Developer Documentation docs.z.ai
MiniMax Developer Documentation platform.minimax.io
MiniMax Pricing (Pay‑as‑you‑go) platform.minimax.io
Moonshot AI Developer Documentation platform.moonshot.ai
Moonshot AI Models & Pricing platform.moonshot.ai
Cohere Developer Documentation docs.cohere.com
AI21 Labs Developer Documentation docs.ai21.com
Perplexity Developer Documentation docs.perplexity.ai
ByteDance (Volcengine) Developer Documentation volcengine.com
Tencent (Hunyuan) Cloud Documentation cloud.tencent.com
Baidu (ERNIE) AI Studio Documentation ai.baidu.com
DeepSeek Official Website deepseek.com
Meta Llama Documentation llama.meta.com

Benchmark Sources

Benchmark Source Description
GPQA Diamond Google Research Graduate-level science questions (PhD difficulty)
MMLU-Pro TIGER-Lab Extended multi-task language understanding
Arena Elo lmarena.ai Crowdsourced human preference ranking
HLE Scale AI Humanity's Last Exam β€” expert-level questions
SWE-bench Verified Princeton Real GitHub issue resolution (human-verified)
SWE-bench Pro Princeton More challenging subset of SWE-bench
LiveCodeBench LiveCodeBench Live competitive programming problems
AIME 2025 MAA American Invitational Mathematics Examination
ARC-AGI-2 ARC Prize Abstract reasoning challenge (fluid intelligence)
MMMU / MMMU-Pro MMMU Multi-discipline multimodal understanding
IFEval Google Research Instruction-following evaluation
FrontierMath Epoch AI Expert-level research mathematics
HumanEval OpenAI 164 Python programming problems

Verification Methodology

  1. Primary Source Review - Check official documentation
  2. Cross-Validation - Compare multiple sources
  3. Timestamp Verification - All data includes verification date
  4. Update Tracking - Monitor official channels

Last Updated: 2026-04-02 04:58 UTC Maintained by: ReadyPixels LLC


Made with ❀️ by ReadyPixels LLC

Star on GitHub

About

Research-based comparison of AI models, development tools, and automation resources. Compare releases, pricing, benchmarks, and deployment options from official sources.

Topics

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Packages

 
 
 

Contributors