# Awesome AI Models Matrix 🧠

A research-based list of AI models, development tools, and automation resources. Use it to compare releases, pricing, benchmarks, and deployment options from official sources.

**Document Version:** 2.1
**Last Updated:** 2026-04-02 04:58 UTC
**Repository:** https://github.com/ReadyPixels/AI_Models_Matrix

Comprehensive documentation of Large Language Models (LLMs), Small Language Models (SLMs), and specialized AI models available today.

## Frontier Models

State-of-the-art proprietary AI models with cutting-edge capabilities from leading AI labs.
| Model | Company | Context | GPQA Diamond | Arena Elo | SWE-bench Verified | AIME 2025 | Pricing | Verified |
|---|---|---|---|---|---|---|---|---|
| Claude Opus 4.6 | Anthropic | 1M | 91.3% | 1500 (Text) / 1549 (Code) | 80.8% | 99.8% | $5 / $25 | 2026-04-02 |
| Claude Sonnet 4.6 | Anthropic | 1M | 89.9% | ~1438 (Text) / 1523 (Code) | 79.6% | ~95% | $3 / $15 | 2026-04-02 |
| GPT-5.3-Codex | OpenAI | 400K | 91.5% | — | — | — | $1.75 / $14.00 | 2026-04-02 |
| Gemini 3.1 Pro | Google | 1M | 94.3% | 1494 (Text) / 1455 (Code) | 80.6% | 100% | $2 / $12 | 2026-04-02 |
| Gemini 3 Deep Think | Google | 1M+ | ~97% | — | ~58% | — | Ultra subscription | 2026-04-02 |
| GLM-5 | Zhipu AI | 200K | 82.0% | ~1451 (Text) / 1445 (Code) | 77.8% | 92.7% | $1.00 / $3.20 | 2026-04-02 |
| GLM-5.1 | Zhipu AI | 200K | — | — | ~80.4% (est.) | — | $1.00 / $3.20 | 2026-04-02 |
| MiniMax-M2.5 | MiniMax | 200K | 85.2% | — | 80.2% | 86.3% | $0.30 / $1.20 | 2026-04-02 |
| Kimi K2.5 | Moonshot AI | 256K | 87.6% | — | 76.8% | 96.1% | $0.60 / $3.00 | 2026-04-02 |
| DeepSeek-V4 [⚠️ Unverified] | DeepSeek | 1M+ | — | — | — | — | $0.30 / $0.50 | 2026-04-02 |
| DeepSeek-V3.2 | DeepSeek | 164K | 87.1% | — | 67.8% | 89.3% | $0.28 / $0.42 | 2026-04-02 |
| Qwen3.5-Max | Alibaba | 128K | 89.3% | — | 76.4% | 91.3% | Pay-per-token | 2026-04-02 |
| Gemini 3 Pro | Google | 1M+ | 91.9% | 1486 (Text) / 1438 (Code) | 76.2% | 98–100% | Tiered pricing | 2026-04-02 |
| Gemini 3 Flash | Google | 10M | 90.4% | 1474 (Text) | 78.0% | — | $0.30 / $2.50 | 2026-04-02 |
| Gemini 3.1 Flash-Lite [⚠️ Unverified] | Google | 1M | — | — | — | — | $0.25 / $1.50 | 2026-04-02 |
| GPT-5.4 | OpenAI | 1M | 92.0% | 1484 (Text) / 1457 (Code) | ~80% | 88% | $2.50 / $15.00 | 2026-04-02 |
| GPT-5.4 mini | OpenAI | 400K | 87.5% | — | — | — | $0.75 / $4.50 | 2026-04-02 |
| GPT-5.4 nano | OpenAI | 400K | — | — | — | — | $0.20 / $1.25 | 2026-04-02 |
| Step-3.5-Flash | StepFun | 256K | 83.1% | — | 74.4% | 97.3% | Pay-per-token | 2026-04-02 |
| Mistral Large 3 [⚠️ Unverified] | Mistral AI | 128K | — | — | — | — | Varies | 2026-04-02 |
| Claude Sonnet 4.5 | Anthropic | 200K | 83.4% | — | 77.2% | 87% | $3 / $15 | 2026-04-02 |
| Llama 4 Scout [⚠️ Unverified] | Meta | 10M | 57.2% | — | — | — | Free (self-host) | 2026-04-02 |
| Llama 4 Maverick [⚠️ Unverified] | Meta | 128K | 69.8% | — | — | — | Free (self-host) | 2026-04-02 |
| Grok 4 | xAI | 128K | ~91.5% | ~1493 (Text) | — | 100% | $3 / $15 | 2026-04-02 |
| Grok 4 Fast [⚠️ Unverified] | xAI | 128K | — | — | — | — | $0.20 / $1.50 | 2026-04-02 |
### Category Leaders

| Category | #1 | #2 | #3 |
|---|---|---|---|
| Coding | Claude Opus 4.6 | GPT-5.3-Codex | Claude Sonnet 4.5 |
| Reasoning | Gemini 3 Deep Think | Qwen3-Max-Thinking | o3 |
| Open Source | DeepSeek-V4 | Qwen3.5-Max | Llama 4 |
| Cost Efficiency | DeepSeek-V3.1 | Grok 4 Fast | GLM-4.7-FlashX |
| Context Window | Gemini 3 Flash (10M) | Llama 4 Scout (10M) | Claude Opus 4.6 (1M) |
## Model Specifications 📊

Detailed technical specifications, pricing, and capabilities for all frontier models. Data as of April 2026.

### Maximum Output Tokens

Maximum output tokens per single API request.
| Model | Max Output | Context Window | Notes |
|---|---|---|---|
| Claude Opus 4.6 | 128K (300K via beta) | 1M | Extended output via `output-128k-2025-02-19` beta header |
| Claude Sonnet 4.6 | 64K | 1M | — |
| Claude Sonnet 4.5 | 64K | 200K | — |
| GPT-5.4 | 128K | 1.05M | — |
| GPT-5.4 mini | 128K | 400K | — |
| GPT-5.4 nano | 128K | 400K | — |
| GPT-5.3-Codex | 128K | 400K | — |
| Gemini 3.1 Pro | 64K | 1M | — |
| Gemini 3 Pro | 64K | 2M | — |
| Gemini 3 Flash | 64K | 1M | — |
| Gemini 3.1 Flash-Lite | 64K | 1M | — |
| DeepSeek-V4 | 16K | 1M+ | — |
| DeepSeek-V3.2 | 8K / 64K (reasoner) | 128K | Reasoner mode unlocks 64K output |
| Qwen3.5-Max | 65K | 1M | — |
| GLM-5 | 128K | 200K | — |
| GLM-5.1 | 131K | 200K | — |
| MiniMax-M2.5 | 131K | 1M | — |
| Kimi K2.5 | — | 256K | Not publicly specified |
| Step-3.5-Flash | 66K | 256K | — |
| Grok 4 | — | 256K | Not publicly specified |
| Grok 4 Fast | 30K | 2M | — |
| Mistral Large 3 | — | 262K | Not publicly specified |
| Llama 4 Scout | 16K | 10M | — |
| Llama 4 Maverick | 16K | 1M | — |
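Max output and input share one budget: a request fails (or gets truncated) once prompt tokens plus the reserved output allowance exceed the context window. A minimal sketch of that arithmetic, using the Claude Opus 4.6 figures from the table above:

```python
def fits_context(prompt_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """True if the prompt plus the reserved output budget fits the context window."""
    return prompt_tokens + max_output_tokens <= context_window

# Claude Opus 4.6: 1M-token window, 128K default max output.
print(fits_context(900_000, 128_000, 1_000_000))  # False: trim the prompt or cap max output
print(fits_context(850_000, 128_000, 1_000_000))  # True
```

Token counts are tokenizer-specific, so treat this as a budgeting check, not an exact guarantee.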
### Pricing Tiers

Discounted pricing tiers for high-volume usage. All prices in USD per million tokens.

| Model | Standard Input | Cached Input | Batch Discount | Notes |
|---|---|---|---|---|
| Claude Opus 4.6 | $5.00 | $0.50 (hit) / $6.25 (5m write) | 50% off | Batch: $2.50 in / $12.50 out |
| Claude Sonnet 4.6 | $3.00 | $0.30 (hit) / $3.75 (5m write) | 50% off | Batch: $1.50 in / $7.50 out |
| Claude Sonnet 4.5 | $3.00 | $0.30 (hit) / $3.75 (5m write) | 50% off | Batch: $1.50 in / $7.50 out |
| GPT-5.4 | $2.50 | $0.25 | 50% off | Data residency +10% |
| GPT-5.4 mini | $0.75 | $0.075 | 50% off | — |
| GPT-5.4 nano | $0.20 | $0.02 | 50% off | — |
| GPT-5.3-Codex | $1.75 | $0.175 | 50% off | — |
| Gemini 3.1 Pro | $2.00 | $0.20–$0.40 + $4.50/hr storage | 50% off | Tiered by input length |
| Gemini 3 Flash | $0.50 | $0.05 + $1.00/hr storage | 50% off | — |
| Gemini 3.1 Flash-Lite | $0.25 | Supported | Supported | Exact rate not published |
| DeepSeek-V4 | $0.30 | $0.03 (90% off) | Off-peak 50% off | 11PM–7AM Beijing time |
| DeepSeek-V3.2 | $0.28 | $0.028 | — | No formal batch API |
| Qwen3.5-Max | $0.40 | Available | 50% off | — |
| GLM-5 / GLM-5.1 | $1.00 | $0.20 | — | — |
| Grok 4 | $3.00 | $0.75 | — | — |
| Grok 4 Fast | $0.20 | $0.05 | — | — |
| Mistral Large 3 | $0.50 | $0.05 | Available | — |
| Step-3.5-Flash | $0.10 | — | — | — |
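Cache hits and batch discounts compound, so effective cost can differ several-fold from list price. A rough per-request cost helper (prices in USD per million tokens, as in the table; the worked example uses the Claude Sonnet 4.6 rates above):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_price: float, out_price: float,
                 cached_tokens: int = 0, cached_price: float = 0.0,
                 batch_discount: float = 0.0) -> float:
    """USD cost of one request; prices are per 1M tokens."""
    fresh_tokens = input_tokens - cached_tokens
    cost = (fresh_tokens * in_price
            + cached_tokens * cached_price
            + output_tokens * out_price) / 1_000_000
    return cost * (1 - batch_discount)

# Sonnet 4.6 ($3 in / $15 out, $0.30 cache hits): 100K-token prompt,
# 80K of it cached, 4K tokens of output.
print(round(request_cost(100_000, 4_000, 3.00, 15.00,
                         cached_tokens=80_000, cached_price=0.30), 3))  # 0.144
# The same request with a cold cache costs $0.36, i.e. 2.5x more.
```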
### Speed & Latency

Output throughput and time-to-first-token (TTFT) from Artificial Analysis and provider benchmarks.

| Model | Output Speed (tok/s) | TTFT | Notes |
|---|---|---|---|
| Gemini 3.1 Flash-Lite | ~363 | — | Fastest frontier model |
| Step-3.5-Flash | 85–350 | — | Variable by provider; peak ~350 tok/s |
| Gemini 3 Flash | ~193 | ~4.16s | — |
| MiniMax-M2.5 Lightning | ~100 | — | Faster tier |
| GPT-5.3-Codex | ~86 | ~77.86s | High TTFT due to extended reasoning |
| Grok 4 | ~56 | ~8.96s | — |
| MiniMax-M2.5 Standard | ~50 | — | — |
Most frontier models (Claude Opus/Sonnet 4.6, GPT-5.4, Gemini 3.1 Pro, etc.) have not yet been benchmarked on Artificial Analysis as of April 2026.
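Throughput alone understates wall-clock latency once TTFT is large, as the GPT-5.3-Codex row shows. A small estimator using the figures above:

```python
def generation_seconds(output_tokens: int, tok_per_s: float, ttft_s: float = 0.0) -> float:
    """Rough wall-clock time for one response: time-to-first-token plus decode time."""
    return ttft_s + output_tokens / tok_per_s

# For a 2,000-token response:
print(round(generation_seconds(2000, 86, 77.86), 1))   # ~101.1 s (GPT-5.3-Codex)
print(round(generation_seconds(2000, 193, 4.16), 1))   # ~14.5 s (Gemini 3 Flash)
```

Despite only a ~2x throughput gap, the end-to-end difference is ~7x here because TTFT dominates for the reasoning-heavy model.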
### Knowledge Cutoffs

Knowledge cutoff dates: the point after which a model has no training data.

| Model | Training Cutoff | Notes |
|---|---|---|
| Claude Sonnet 4.6 | Jan 2026 | Most recent cutoff among frontier models |
| Claude Opus 4.6 | Aug 2025 | Reliable knowledge: May 2025 |
| GPT-5.4 / mini / nano | Aug 31, 2025 | — |
| GPT-5.3-Codex | Aug 31, 2025 | — |
| Grok 4 Fast | Jul 2025 | — |
| DeepSeek-V4 | May 2025 | — |
| Gemini 3.1 Flash-Lite | Jan 2026 | — |
| Gemini 3.1 Pro / 3 Pro / 3 Flash | Jan 2025 | — |
| Grok 4 | ~Nov–Dec 2024 | Approximate |
| DeepSeek-V3.2 | Jul 2024 | — |
| Llama 4 Scout / Maverick | Aug 2024 | — |
| DeepSeek-R1 | ~Oct 2023 | Based on base model |
Models not listed (Qwen, GLM, MiniMax, Kimi, Step, Mistral): training cutoff not publicly disclosed.
### Language Support

| Model | Languages | Details |
|---|---|---|
| Qwen3.5-Max | 201 | Largest language coverage |
| Llama 4 Scout | 200 | Pre-training languages |
| Qwen3-Max-Thinking | 119 | Qwen3 series |
| Gemini 3 Flash | 100 | 91.8% MMMLU score across 100 languages |
| Gemini 3.1 Pro / 3 Pro | 100+ | — |
| Gemini 3.1 Flash-Lite | 100+ | 88.9% MMMLU |
| Llama 4 Maverick | 12 | Output languages |
| Claude (all) | Many | English-optimized; broad multilingual |
| GPT-5.4 (all) | Many | Broad multilingual coverage |
| DeepSeek (all) | Many | Chinese + English focused |
| Grok (all) | Many | — |
| GLM-5 / GLM-5.1 | Many | 28.5T token training data |
### Structured Output & Function Calling

All frontier models support structured JSON output and function/tool calling except where noted.

| Capability | Supported Models | Not Supported |
|---|---|---|
| Structured Output (JSON mode) | All models listed in the frontier table | Gemini 3 Deep Think (no API) |
| Function Calling / Tool Use | All models listed in the frontier table | Gemini 3 Deep Think (no API) |
Gemini 3 Deep Think is available only via Gemini's in-app Think mode; there is no API access for structured output or function calling.
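As a concrete illustration, a tool definition in the JSON-schema style most of these APIs accept. The `get_weather` tool is hypothetical, and the exact wrapper object (key names, nesting) differs per provider; the name/description/parameters core shown here is the common denominator:

```python
import json

# Hypothetical tool definition in the common JSON-schema shape.
weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Berlin'"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}

# The model returns the arguments as JSON matching this schema.
print(weather_tool["parameters"]["required"])  # ['city']
print(json.dumps(weather_tool)[:30])
```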
### API & Regional Availability

| Provider | API Availability | Cloud Partners | Notes |
|---|---|---|---|
| Anthropic | Global | AWS Bedrock, GCP Vertex AI | US-only inference at 1.1x via `inference_geo` |
| OpenAI | Global | Azure OpenAI | Data residency endpoints +10% (post-3/5/26) |
| Google | Global | Google AI Studio, Vertex AI | Some regional restrictions per Google terms |
| DeepSeek | Global | Azure (R1 only, select regions) | China-based servers |
| Alibaba (Qwen) | Global | Alibaba Cloud Model Studio | China-based; globally accessible |
| Zhipu AI (GLM) | Global | Z.AI API | MIT license enables self-hosting anywhere |
| MiniMax | Global | MiniMax API | — |
| Moonshot AI (Kimi) | Global | platform.kimi.ai | MIT open-weight |
| xAI (Grok) | US-focused | Oracle OCI (East/Midwest/West) | Limited non-US availability |
| Mistral | Global | Azure AI Foundry, AWS, GCP | — |
| Meta (Llama) | Global (self-host) | All major cloud providers | Llama 4 Community License |
| StepFun | Global | HuggingFace | Apache 2.0 open-source |
## Open-Source & Self-Hosted Models

Self-hostable models with permissive licenses or open weights for privacy, cost control, and customization.

| Model | Company | Params | Context | License |
|---|---|---|---|---|
| DeepSeek-V4 | DeepSeek | 671B | 1M+ | MIT |
| Qwen3.5-Max | Alibaba | 1T+ | 128K | Apache 2.0 |
| Qwen3-Max-Thinking | Alibaba | 1T+ | 128K | Apache 2.0 |
| Mistral Large 3 | Mistral AI | 675B (MoE) | 128K | Apache 2.0 |
| Llama 4 Scout | Meta | 109B | 10M | Community |
| Llama 4 Maverick | Meta | 400B | 128K | Community |
| GPT-OSS-120B | OpenAI | 117B | 128K | Apache 2.0 |
| GPT-OSS-20B | OpenAI | 21B | 128K | Apache 2.0 |
| Qwen3-Coder | Alibaba | 480B | 128K | Apache 2.0 |
| GLM-4.7 | Zhipu AI | 400B+ (MoE) | 128K | Open Weight |
| Phi-4 | Microsoft | 14B | 128K | MIT |
| Granite 4.0 | IBM | 3B–8B | 128K | Apache 2.0 |
| DeepSeek-Coder-V2 | DeepSeek | 236B | 128K | MIT |
| GLM-5.1 | Zhipu AI | 744B (40B active MoE) | 200K | MIT |
| Step-3.5-Flash | StepFun | 196B (11B active MoE) | 256K | Open Weight |
| Yi-Coder | 01.AI | 9B / 1.5B | 128K | Apache 2.0 |
**Local Inference Tools:**

- Ollama - Easy local deployment
- LM Studio - User-friendly GUI
- llama.cpp - Efficient CPU inference
- vLLM - High-throughput serving
- SGLang - Structured generation

**Cloud Deployment:**

- Hugging Face Inference - Managed deployment
- AWS SageMaker - Full control
- Google Cloud Vertex - Integrated
- RunPod - GPU rental
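For a quick sense of the local workflow, a minimal one-shot call to a locally running Ollama server over its REST API. This assumes `ollama serve` is running on the default port and the model has already been pulled; the model name is just an example:

```python
import json
import urllib.request

def ollama_generate(prompt: str, model: str = "llama3",
                    host: str = "http://localhost:11434") -> str:
    """One-shot completion against a local Ollama server (POST /api/generate)."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # With stream=False, Ollama returns one JSON object with a "response" field.
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(ollama_generate("Say hello in one word."))
```

The same pattern works for LM Studio and vLLM, though those expose OpenAI-compatible endpoints instead of Ollama's native one.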
## Coding Models

Specialized AI models optimized for software development tasks.

### SWE-bench Verified Leaderboard

| Rank | Model | Company | SWE-bench Verified |
|---|---|---|---|
| 🥇 #1 | Claude Opus 4.6 | Anthropic | 80.8% |
| 🥈 #2 | Gemini 3.1 Pro | Google | 80.6% |
| 🥉 #3 | MiniMax-M2.5 | MiniMax | 80.2% |
| #4 | GPT-5.4 | OpenAI | ~80% |
| #5 | GPT-5.2 | OpenAI | 80.0% |
| #6 | Claude Sonnet 4.6 | Anthropic | 79.6% |
| #7 | Gemini 3 Flash | Google | 78.0% |
| #8 | GLM-5 | Zhipu AI | 77.8% |
| #9 | Claude Sonnet 4.5 | Anthropic | 77.2% |
| #10 | Kimi K2.5 | Moonshot AI | 76.8% |
### Proprietary Coding Models

| Model | Developer | Pricing | Best For |
|---|---|---|---|
| Claude Opus 4.6 | Anthropic | $5 / $25 per 1M | Agentic coding, complex tasks |
| GPT-5.3-Codex | OpenAI | TBD | Agentic coding, 7+ hour autonomy |
| Claude Haiku 4.5 | Anthropic | $1 / $5 per 1M | Low-latency coding, sub-agents, computer use |
| GLM-5-Code | Zhipu AI | $1.20 / $5.00 per 1M | Code generation, refactoring |
| MiniMax-M2.5 | MiniMax | $0.30 / $1.20 per 1M | Code generation, refactoring |
| Claude Sonnet 4.5 | Anthropic | $3 / $15 per 1M | Code review, refactoring |
| Codestral | Mistral AI | $0.30 / $0.90 | Real-time completion |
| Grok 4 Fast | xAI | $0.20 / $1.50 | Most used (50% share) |
### Open-Source Coding Models

| Model | Developer | License | Hardware |
|---|---|---|---|
| GPT-OSS-120B | OpenAI | Apache 2.0 | 80–160 GB VRAM |
| Qwen3-Coder | Alibaba | Apache 2.0 | 160–320 GB VRAM |
| DeepSeek-Coder-V2 | DeepSeek | MIT | 48–80 GB VRAM |
| GLM-4.6 | Zhipu AI | Open Weight | 80–160 GB VRAM |
| Phi-4 | Microsoft | MIT | 24–48 GB VRAM |
## Reasoning Models

Models optimized for step-by-step reasoning, mathematical problem-solving, and complex logical inference.

| Rank | Model | AIME 2025 | ARC-AGI-2 | Notes |
|---|---|---|---|---|
| 🥇 #1 | Gemini 3.1 Pro | 100% | 77.1% | Highest combined reasoning |
| 🥈 #2 | GPT-5.2 | 100% | 52.9% | No tools needed |
| 🥉 #3 | Grok 4 | 100% | — | First-principles reasoning |
| #4 | Claude Opus 4.6 | 99.8% | 68.8% | Near-perfect AIME |
| #5 | Gemini 3 Pro | 98–100% | 31.1–45.1% | With code execution |
| #6 | Step-3.5-Flash | 97.3% | — | Best efficiency ratio |
| #7 | Kimi K2.5 | 96.1% | — | Strong multimodal reasoning |
| #8 | Claude Sonnet 4.6 | ~95% | 58.3% | Near-Opus performance |
| #9 | GLM-5 | 92.7% | — | Thinking mode |
| #10 | DeepSeek-V3.2 | 89.3% | — | Budget reasoning |
### Dedicated Reasoning Models

| Model | Type | Context | Pricing |
|---|---|---|---|
| Gemini 3 Deep Think | Reasoning | 1M+ | Ultra subscription |
| Qwen3-Max-Thinking | Reasoning/Coding | 128K | $1.20 / $6.00 |
| o3 / o1-Pro | Reasoning | 128K | $2–150 / $8–600 |
| Gemini 3 Pro | General/Multimodal | 1M+ | $2 / $12 |
| DeepSeek-R1 | Reasoning | 128K | $0.50 / $2.15 |
| Claude Sonnet 4.5 | Hybrid | 200K | $3 / $15 |
- **Mathematical Problem Solving:** Qwen3-Max-Thinking, GPT-5 Pro, Gemini 3 Pro
- **Scientific Analysis:** Claude Opus 4.6, GPT-5, Gemini 3 Pro
- **Strategic Planning:** o3/o1-Pro, Claude Sonnet 4.5, DeepSeek-R1
- **Code Debugging:** Claude Sonnet 4.5, GPT-5.3-Codex, DeepSeek-V3.1
## Multimodal Models

Models capable of processing and generating multiple types of content: text, images, audio, and video.

### Leading Multimodal Models

| Model | Developer | Context | Key Features |
|---|---|---|---|
| GPT-5.4 | OpenAI | 1M | Unified multimodal, audio |
| Gemini 3 Pro | Google | 1M+ | Native multimodal, video |
| Claude Sonnet 4.5 | Anthropic | 200K | Document understanding |
| Llama 4 Maverick | Meta | 128K | Open multimodal |
### Vision Benchmarks

| Model | MMMU / MMMU-Pro | MathVista | DocVQA |
|---|---|---|---|
| Gemini 3.1 Pro | 95% (MMMU-Pro) | — | — |
| GPT-5.4 | 94% (MMMU-Pro) | — | — |
| Gemini 3 Pro | 81% (MMMU-Pro) | — | — |
| Gemini 3 Flash | 80% (MMMU-Pro) | — | — |
| Claude Sonnet 4.5 | 77.8% (MMMU) | — | — |
| Llama 4 Maverick | 73.4% (MMMU) | — | — |
### Audio & Video Capabilities

| Model | Speech-to-Text | Text-to-Speech | Video Input |
|---|---|---|---|
| Gemini 3 Pro | ✅ | ✅ | ✅ |
| GPT-5 | ✅ | ✅ | ⚠️ |
| Whisper v3 | ✅ | ❌ | ❌ |
### Image Generation Models

| Model | Developer | License | Best For |
|---|---|---|---|
| Flux.1 | Black Forest Labs | Apache 2.0 | High-fidelity art |
| Stable Diffusion 3.5 | Stability AI | Community License | Fine-tuning |
| GLM-Image | Zhipu AI (Z.ai) | API | Fast image generation |
| CogView-4 | Zhipu AI (Z.ai) | API | Creative image generation |
## Hardware Requirements 🖥️

Comprehensive hardware specifications for self-hosting AI models.

### Quick Reference by Model Size

| Model | Params | Q4 Size | Min VRAM | Rec VRAM | Min RAM |
|---|---|---|---|---|---|
| Phi-4 | 14B | 8 GB | 24 GB | 48 GB | 32 GB |
| GPT-OSS-20B | 21B | 12 GB | 24 GB | 48 GB | 32 GB |
| Llama 4 Scout | 109B | 66 GB | 48 GB | 80 GB | 96 GB |
| GPT-OSS-120B | 117B | 70 GB | 80 GB | 160 GB | 128 GB |
| DeepSeek-Coder-V2 | 236B | 143 GB | 48 GB | 80 GB | 192 GB |
| Llama 4 Maverick | 400B | 242 GB | 160 GB | 320 GB | 320 GB |
| DeepSeek-V4 | 671B | 404 GB | 80 GB | 320 GB | 512 GB |
| Qwen3-Max-Thinking | 1T+ | 600+ GB | 160 GB | 640 GB | 768 GB |
**Consumer/Entry Level (24–48 GB VRAM):**

- Models: Phi-4, GPT-OSS-20B, Yi-Coder, Qwen2.5-Coder
- Recommended GPUs: RTX 3090 (24 GB), RTX 4090 (24 GB)

**Professional (80–160 GB VRAM):**

- Models: Llama 4 Scout, GPT-OSS-120B, DeepSeek-Coder-V2
- Recommended GPUs: A100 80GB, 2x A100 40GB

**Enterprise (320+ GB VRAM):**

- Models: Llama 4 Maverick, GLM-4.7, DeepSeek-V4, Qwen3-Max-Thinking
- Recommended GPUs: 4x A100 80GB, 8x A100 80GB
### Quantization Levels

| Level | Bits | Size vs FP16 | Quality | Use Case |
|---|---|---|---|---|
| FP16/BF16 | 16 | 100% | Best | Training |
| Q8_0 | 8 | ~50% | Excellent | High-quality inference |
| Q4_K_M | 4 | ~25% | Good | Recommended for deployment |
| Q3_K_M | 3 | ~19% | Fair | Limited resources |
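The size column follows directly from parameter count and bit width: bytes ≈ params × bits / 8, plus some overhead for quantization scales and embeddings. A back-of-the-envelope estimator (the 15% overhead factor is an assumption, not an official figure, which is why it undershoots some table entries):

```python
def quantized_size_gib(params_billions: float, bits: int, overhead: float = 1.15) -> float:
    """Approximate on-disk size of a quantized model in GiB.

    params x bits/8 gives raw weight bytes; `overhead` is an assumed
    fudge factor for scales, embeddings, and metadata.
    """
    total_bytes = params_billions * 1e9 * bits / 8 * overhead
    return total_bytes / 1024**3

print(round(quantized_size_gib(14, 4), 1))   # ~7.5 GiB (table: 8 GB for Phi-4 at Q4)
print(round(quantized_size_gib(671, 4)))     # ~359 GiB (table: 404 GB for DeepSeek-V4)
```

Actual GGUF files vary with the quantization mix (Q4_K_M keeps some tensors at higher precision), so treat these as order-of-magnitude estimates.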
## Comprehensive Benchmark Reference 📊

Detailed benchmark scores across all major evaluations. Scores are percentages (%) unless noted. Arena Elo scores are integers. — = not publicly reported. Data as of April 2026.
| Model | GPQA Diamond | MMLU-Pro | Arena Elo (Text) | HLE | SWE-bench Verified | SWE-bench Pro | LiveCodeBench | AIME 2025 | ARC-AGI-2 | MMMU-Pro | IFEval | FrontierMath |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Claude Opus 4.6 | 91.3% | — | 1500 | 40.0–53.0% | 80.8% | — | — | 99.8% | 68.8% | — | — | — |
| Claude Sonnet 4.6 | 89.9% | — | ~1438 | 33.2–49.0% | 79.6% | — | — | ~95% | 58.3% | — | — | — |
| Claude Sonnet 4.5 | 83.4% | 88.0% | — | — | 77.2% | — | — | 87–100% | — | — | — | — |
| GPT-5.4 | 92.0% | 94% | 1484 | 36.6–41.6% | ~80% | 57.7% | 84–88% | 88% | 73.3% | 94% | — | 50% (Pro) |
| GPT-5.4 mini | 87.5% | — | — | — | — | 54.4% | — | — | — | — | — | — |
| GPT-5.3-Codex | 91.5% | — | — | — | — | 56.8% | 85% | — | — | — | — | — |
| GPT-5.2 | 92.4% | — | 1479 | 35.2% | 80.0% | 55.6% | — | 100% | 52.9% | — | 95.6% | ~40.3% |
| Gemini 3.1 Pro | 94.3% | 92% | 1494 | 44.4–51.4% | 80.6% | 54.2–72% | 71% | 100% | 77.1% | 95% | 95% | — |
| Gemini 3 Pro | 91.9–93.8% | 83% | 1486 | 37.5% | 76.2% | 43.3% | 49% | 98–100% | 31.1–45.1% | 81% | 88% | 38% |
| Gemini 3 Flash | 90.4% | 72% | 1474 | 33.7% | 78.0% | 44% | — | — | — | 80% | 85% | — |
| Gemini 3 Deep Think | ~97% | 81% | — | 48.4% | ~58% | 63% | 58% | — | 84.6% | — | — | — |
| DeepSeek-V3.2 | 87.1% | 85.0% | — | 25.1% | 67.8% | — | — | 89.3% | — | — | — | — |
| DeepSeek-R1 | 71.5% | 84.0% | — | 8.5% | 49.2% | — | 63.5% | 70.0% | — | — | — | — |
| Qwen3.5-Max | 89.3% | — | — | — | 76.4% | — | — | 91.3% | — | 79% | — | — |
| Qwen3-Max-Thinking | 86.1% | — | — | 26.2% | — | — | — | — | — | — | — | — |
| GLM-5 | 82.0% | — | ~1451 | 10.4% | 77.8% | — | — | 92.7% | — | — | — | — |
| GLM-5.1 | — | — | — | — | ~80.4% (est.) | — | — | — | — | — | — | — |
| Kimi K2.5 | 87.6% | 87.1% | — | 31.5–50.2% | 76.8% | — | 85.0% | 96.1% | — | 78.5% | — | — |
| MiniMax-M2.5 | 85.2% | — | — | — | 80.2% | 55.4% | — | 86.3% | — | — | — | — |
| Step-3.5-Flash | 83.1% | — | — | — | 74.4% | — | 86.4% | 97.3% | — | — | — | — |
| Grok 4 | ~91.5% | 91.5% | ~1493 | 50.7% | — | — | — | 100% | — | — | — | — |
| Llama 4 Maverick | 69.8% | 80.5% | — | — | — | — | 43.4% | — | — | — | — | — |
| Llama 4 Scout | 57.2% | 74.3% | — | — | — | — | 32.8% | — | — | — | — | — |
FrontierMath is a benchmark of 350 original, exceptionally challenging mathematics problems created by expert mathematicians (Epoch AI). Problems span number theory, analysis, algebraic geometry, and category theory. Tier 4 problems can take research mathematicians multiple days.
### Benchmark Descriptions

| Benchmark | Description | Source |
|---|---|---|
| GPQA Diamond | Graduate-level science questions (PhD difficulty) | Google Research |
| MMLU-Pro | Extended multi-task language understanding (harder than MMLU) | TIGER-Lab |
| Arena Elo | Crowdsourced human preference ranking | lmarena.ai |
| HLE | Humanity's Last Exam: expert-level questions | Scale AI |
| SWE-bench Verified | Real GitHub issue resolution (human-verified subset) | SWE-bench |
| SWE-bench Pro | More challenging subset of SWE-bench | SWE-bench |
| LiveCodeBench | Live competitive programming problems (not in training data) | LiveCodeBench |
| AIME 2025 | American Invitational Mathematics Examination | MAA |
| ARC-AGI-2 | Abstract reasoning challenge (fluid intelligence) | ARC Prize |
| MMMU / MMMU-Pro | Multi-discipline multimodal understanding | MMMU |
| IFEval | Instruction-following evaluation | Google Research |
| FrontierMath | Expert-level research mathematics (Epoch AI) | Epoch AI |
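Arena Elo gaps translate into head-to-head win probabilities via the standard Elo expectation formula, which is what makes small rating differences interpretable:

```python
def elo_expected(rating_a: float, rating_b: float) -> float:
    """Expected score of A vs B (win probability, counting ties as half a win)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# Claude Opus 4.6 (1500) vs GPT-5.4 (1484): a 16-point gap is only a ~52% edge.
print(round(elo_expected(1500, 1484), 3))  # 0.523
```

A 100-point gap corresponds to roughly a 64% expected score, so most of the text-Elo spread in the table above sits within a fairly narrow preference band.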
## Development Tools 🛠️

AI-powered tools for software development, from IDEs and CLI tools to API providers and IDE extensions.

### AI-Native IDEs

Integrated Development Environments with built-in AI capabilities.
| IDE | Platform | Version | Release Date | Pricing | Key Features | GitHub |
|---|---|---|---|---|---|---|
| Firebase Studio | Web | - | - | Free (3 workspaces, up to 30 with Google Developer Program) | Cloud-based, Gemini, MCP | 🔗 |
| Lingma IDE (Tongyi Lingma) | Windows, macOS | - | - | Free (download) | Built-in agent, MCP tool use, terminal command execution | — |
| Tonkotsu | Windows, macOS | - | - | Free (during early access) | Team of agents, workflow | 🔗 |
| OpenCode | Windows, macOS, Linux | - | - | Free (OSS) | Terminal, desktop, IDE extension, multi-provider | 🔗 |
| Codex app | Windows | - | 2026-03-04 | Included with Codex plans | Multiple agents, isolated worktrees, reviewable diffs, CLI and IDE interop | 🔗 |
| Visual Studio | Windows, macOS | 17.14.12+, 18.1.0+ | 2026-01-06 | Free / $250/yr | Gemini 3 Flash integration, faster performance, zero-migration upgrades, real-time profiler agent | — |
| IntelliJ IDEA | Windows, macOS, Linux | 2025.3.2 | 2026-01 | Free / $149/yr | Java 24 support, Kotlin K2 mode, performance and memory improvements | — |
### AI-Enhanced Editors

| Editor | Platform | Version | Release Date | Pricing | Key Features | GitHub |
|---|---|---|---|---|---|---|
| Zed | macOS, Windows, Linux | 0.226.3 | 2026-03-03 | Free (OSS) + Copilot $10/mo | Fast, collaboration, Gemini and Claude, Zeta AI, agent thread history, edit prediction providers, self-hosted OpenAI-compatible servers | 🔗 |
| Dyad | Windows, macOS, Linux | - | - | Free (OSS) | Local generation, BYO keys | 🔗 |
| Memex | macOS, Windows | - | - | Freemium (Free + $10/mo) | Agentic, browser/desktop | 🔗 |
### Agentic IDE Forks

| IDE | Platform | Version | Release Date | Pricing | Autonomous | MCP | GitHub |
|---|---|---|---|---|---|---|---|
| Cursor | Windows, macOS, Linux | 0.46+ | 2026-02-12 | Freemium (Free + Pro $19/mo or $39/mo) | ✅ | ✅ | — |
| Windsurf | Windows, macOS, Linux | 1.9552+ | 2026-02-12 | Freemium (Free + Pro) | ✅ | ✅ | — |
| Trae | macOS, Windows | - | - | Free | ✅ | ✅ | 🔗 |
| PearAI | Windows, macOS, Linux | - | - | Free (OSS) | ✅ | ✅ | 🔗 |
| Void | Windows, macOS, Linux | - | - | Free (OSS) | ✅ | ✅ | 🔗 |
| Kiro | Windows, macOS, Linux | - | - | Free (Preview) | ✅ | ✅ | 🔗 |
### Web-Based Platforms

| IDE | Platform | Version | Release Date | Pricing | Self-Hostable | Best For | GitHub |
|---|---|---|---|---|---|---|---|
| Replit 3 | Web | - | - | Free Starter, Core $20/mo, Pro $100/mo | ❌ | Learning/Prototyping | — |
| Bolt.new | Web | - | - | Free, Pro $20–25/mo, Teams $30/user/mo | ❌ | Quick apps | — |
| Bolt.diy | Self-hosted | - | - | Free (MIT), bring your own API | ✅ | Self-hosted | 🔗 |
| Lovable | Web | - | - | Free (5 credits/day), Pro $25/mo, Business $50/mo | ❌ | UI/Full-stack | — |
| v0 | Web | - | - | Free ($5 credits/mo), Premium $20/mo, Teams $30/user | ❌ | React components | — |
| Gitpod | Web | - | - | Free + Paid | ❌ | Cloud dev environments | — |
| Rork | Web | - | - | Free & Paid (credits) | ❌ | Mobile apps (iOS/Android) | — |
| Google Antigravity | Web | - | - | Google AI Pro / Ultra | — | Agent-first development with Gemini-powered coding | — |
| Jules | Web | - | 2025-05-20 | Free beta, higher limits on Google AI Pro / Ultra | — | Async repo agent, reviewable diffs, GitHub integration | — |
### CLI Coding Agents

Command-line AI tools for autonomous coding and terminal enhancement.

| Tool | Platform | Pricing | Key Features | GitHub |
|---|---|---|---|---|
| Aider | Windows, macOS, Linux | Free | Gold standard, Architect mode, thinking tokens | 🔗 |
| Claude Code 2.1+ | macOS, Linux, Windows | Free + API | Fast mode for Opus 4.6, simple mode file editing, Unicode fix | 🔗 |
| Codex CLI | Windows, macOS, Linux | Included | Sandbox, approval modes | 🔗 |
| Goose | Windows, macOS, Linux | Free (Apache-2.0) | MCP, extensible, desktop app, 25+ providers | 🔗 |
| GPT-Pilot | Windows, macOS, Linux | Free | Full dev team simulation | 🔗 |
| OpenHands | Windows, macOS, Linux | Free | Cloud agents, MCP | 🔗 |
| Mentat | Windows, macOS, Linux | Free | Multi-file coordination | 🔗 |
### Provider CLI Tools

| Tool | Developer | Pricing | Best For |
|---|---|---|---|
| Gemini CLI | Google | Free | Google ecosystem |
| Cursor CLI | Cursor | Free tier | Terminal + IDE bridge |
| Qwen Code | Alibaba | Free | Qwen optimization |
| Qodo CLI | Qodo | Free tier | Testing and review |
### AI Terminals

| Tool | Platform | Pricing | Key Features |
|---|---|---|---|
| Warp Terminal | macOS, Linux, Windows | Free | AI Agents, workflow sharing |
| Fig | macOS, Linux | Free | Autocomplete, AI suggestions |
### IDE Extensions

Extensions and plugins that add AI capabilities to existing IDEs.

#### Universal (Cross-Platform)

| Add-on | Platform | Pricing | Context | Best For | GitHub |
|---|---|---|---|---|---|
| GitHub Copilot | VS Code, JetBrains, Vim | Free / $10/mo / $39/mo | Large | General coding | — |
| Supermaven | VS Code, JetBrains, Neovim | Free / $10/mo | 1M | Large codebases | — |
| Codeium | VS Code, JetBrains, Vim | Free / $15/mo / $60/mo | Medium | Free alternative | — |
| Continue | VS Code, JetBrains | Free (OSS) | Custom | Self-hosted | 🔗 |
| Cody | VS Code, JetBrains, Web | Free (discontinued) / Enterprise Starter $19/mo / Enterprise $59/mo | Enterprise | Code search | 🔗 |
| Tabnine | VS Code, JetBrains, VS, Eclipse | Free / $39/mo | Local | Privacy | — |
#### Agent Extensions

| Add-on | Pricing | Autonomous | MCP | Best For | GitHub |
|---|---|---|---|---|---|
| Codex | Free (with ChatGPT Plus $20/mo or Pro $200/mo) | ✅ | ✅ | OpenAI's official coding agent | 🔗 |
| Cline | Free | ✅ | ✅ | Full agent | 🔗 |
| GitHub Copilot (Agent Mode) | $0 / $10 / $39/mo | ⚠️ | ✅ | Guided agent workflows | — |
| RooCode | Free/Pro | ⚠️ | ✅ | Complex tasks | 🔗 |
| Keploy | OSS/Enterprise | — | — | Testing | — |
#### JetBrains-Specific

| Add-on | Pricing | Claude Agent | Best For |
|---|---|---|---|
| JetBrains AI Assistant | $10/mo (Pro), $249/yr (Ultimate) | — | Deep IDE integration |
| JetBrains Claude Agent | Included in subscription | ✅ | Native agent |
### API Providers

Services for accessing AI models via API.

| Provider | Models | Pricing |
|---|---|---|
| OpenAI | GPT-5.4, GPT-5.4 mini, GPT-5.4 nano, o3, Codex | Pay-per-token |
| Anthropic | Claude Opus 4.6, Sonnet 4.6, Haiku 4.5 | Pay-per-token |
| Google AI Studio | Gemini 3.1 Pro, Gemini 3.1 Flash-Lite, Gemini 3 Flash | Free / Pay |
| Z.ai (Zhipu AI) | GLM-5, GLM-5-Code, GLM-4.7 | Pay-per-token |
| MiniMax | MiniMax-M2.5/M2.1/M2 | Pay-per-token |
| Cohere | Command, Embed, Rerank | Pay-per-token |
| AI21 Labs | Jamba | Pay-per-token |
| Perplexity | Sonar / Sonar Pro / Sonar Reasoning Pro | Pay-per-token + request fees |
| Moonshot AI | Kimi (kimi-k2.5, kimi-k2-thinking) | Pay-per-token |
| ByteDance (Volcengine) | Doubao | Pay-per-token |
| Tencent (Hunyuan) | Hunyuan | Pay-per-token |
| Baidu (ERNIE) | ERNIE | Pay-per-token |
| DeepSeek | DeepSeek-V4/R1 | Pay-per-token |
| Mistral AI | Mistral Large 3 | Pay-per-token |
| xAI | Grok-4 | Pay-per-token |
### Unified APIs & Aggregators

| Provider | Models | Key Features |
|---|---|---|
| OpenRouter | 200+ | Crypto/fiat, rankings |
| Hugging Face | Thousands | Serverless inference |
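Aggregators make failover between models straightforward because many models sit behind one endpoint. A sketch of the usual fallback pattern: the `call_model` stub stands in for a real transport (e.g. a POST to an OpenAI-compatible chat-completions endpoint), and the model IDs are illustrative, not real catalog names:

```python
def route_with_fallback(prompt, models, call_model):
    """Try each model in order; return the first successful (model, reply) pair."""
    last_error = None
    for model in models:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # rate limit, outage, timeout, etc.
            last_error = exc
    raise RuntimeError(f"all models failed: {last_error}")

# Stub transport for illustration only.
def fake_call(model, prompt):
    if model == "primary/flagship":
        raise TimeoutError("overloaded")
    return f"{model} says hi"

print(route_with_fallback("hello", ["primary/flagship", "cheap/backup"], fake_call))
# ('cheap/backup', 'cheap/backup says hi')
```

Ordering the list cheapest-acceptable-first turns the same function into a cost router.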
### High-Performance Inference

| Provider | Specialization | Speed |
|---|---|---|
| Together AI | Llama/Qwen/Mistral | Fast |
| Fireworks AI | FireAttention | Low-latency |
| Groq | LPU | >500 tok/s |
| Cerebras | Wafer-Scale | >2000 tok/s |
### GPU Cloud & Hosting

| Provider | Type | Best For |
|---|---|---|
| RunPod | GPU Rental | Flexibility |
| Replicate | Model-as-a-Service | Quick deployment |
| Vultr | Global Cloud | Hourly |
| Hyperbolic | Decentralized | Crypto/Fiat |
## Browser Automation

AI-powered tools for automating browser and desktop tasks.

### AI Browsers

Tools and frameworks for AI-powered browser automation.

| Browser | Pricing | Open Source | Local AI | Best For | GitHub |
|---|---|---|---|---|---|
| Sigma AI Browser | Freemium (Free + Pro $9.99/mo) | ❌ | ✅ | Offline AI, agentic browsing | — |
| ChatGPT Atlas | Free (with ChatGPT subscription) | ❌ | ❌ | OpenAI integration, macOS | 🔗 |
| Genspark | Freemium (Free + Plus $19.99/mo + Pro $249/mo) | ❌ | ❌ | AI Super Agent, research | — |
| BrowserOS | Free | ✅ | ✅ | Privacy-focused | 🔗 |
| Brave Leo | Freemium (Free + Premium) | ❌ | ⚠️ (Experimental) | Privacy-focused AI | — |
| Fellou | Freemium (Free for 4 tasks, $19/mo Pro) | ❌ | ❌ | True agentic browser | 🔗 |
| Perplexity Comet | Free (with Pro $20/mo) or $5/mo Standalone | ❌ | ❌ | Research | — |
| Dia | Freemium (Free limited, $20/mo Pro, Dia+ $50/mo) | ❌ | ❌ | Arc replacement | — |
| Opera Neon | $19.90/mo | ❌ | ❌ | Agentic browsing | — |
| Opera One (Aria) | Free | ❌ | ❌ | Built-in AI assistant | — |
| Edge Copilot | Free (Copilot Pro $20/mo) | ❌ | ❌ | Enterprise AI browser | — |
| BrowserGPT | Freemium (Free + Premium) | ❌ | ❌ | Mobile-first AI browser (iOS/Android) | — |
| Arc Max | Free | ❌ | ❌ | AI-enhanced browsing, macOS | — |
| AnythingLLM | Free (OSS) | ✅ | ✅ | All-in-one desktop AI, document chat, local models | 🔗 |
### Browser Extensions

| Extension | Pricing | Free | Multi-Agent | Best For | GitHub |
|---|---|---|---|---|---|
| Harpa AI | Free | ✅ | — | Automation recipes | 🔗 |
| MultiOn | Free/Paid | ⚠️ | — | Complex tasks | 🔗 |
| NanoBrowser | Free | ✅ | ✅ | Local control | 🔗 |
| Neobrowser | Free (OSS) | ✅ | — | Local LLMs via Ollama, privacy-first, Chrome/Edge | — |
### Automation Libraries

| Library | Language | Best For | GitHub |
|---|---|---|---|
| Browser-use | Python | Agentic automation | 🔗 |
| Stagehand | TypeScript | Web apps | 🔗 |
| LaVague | Python | NL to code | 🔗 |
| Skyvern | Python | CV-based automation | 🔗 |
| Firecrawl | Python / CLI | LLM-powered crawling & scraping with prompt chaining | 🔗 |
### Managed Browser Agents

| Service | Platform | Pricing | Best For | GitHub |
|---|---|---|---|---|
| ChatGPT agent | ChatGPT | Plus / Pro / Team | Guided browser tasks, research, forms, and spreadsheets | — |
| Project Mariner | Google AI Ultra | Included with Google AI Ultra | Multi-step browser tasks, shopping, and reservations | — |
| Skyvern Cloud | Cloud API | Paid | Resilient automation | 🔗 |
| Browserbase | Cloud API | Paid | Stealth mode, session recording | — |
## Agent Platforms & Frameworks

Platforms and runtimes for running or connecting AI agents.

| Project | Type | Pricing | Self-Hostable | Best For | Official |
|---|---|---|---|---|---|
| OpenClaw | Personal AI assistant | Free (OSS) | ✅ | Always-on assistant across chat channels | 🔗 |
| NanoClaw | Lightweight agent framework | Free (OSS) | ✅ | Containerized agents for WhatsApp, Telegram, Slack, Discord | 🔗 |
| CrewAI | Multi-agent orchestration | Free (OSS) / Enterprise | ✅ | Team-based AI agent workflows | 🔗 |
| AutoGen | Multi-agent framework | Free (OSS) | ✅ | Conversational agent collaboration | 🔗 |
| LangGraph | Agent framework | Free (OSS) / LangSmith paid | ✅ | Stateful, cyclic agent workflows | 🔗 |
| Dify | LLM app platform | Free (OSS) / Cloud plans | ✅ | Visual workflow builder, RAG, agents | 🔗 |
| n8n | Workflow automation | Free (OSS) / Cloud from $20/mo | ✅ | No-code automation with AI agent nodes | 🔗 |
| Flowise | LLM orchestration | Free (OSS) | ✅ | Drag-and-drop LLM flow builder | 🔗 |
| Lindy | AI agent builder | Freemium (Free + Pro $49/mo) | ❌ | No-code AI agents for business tasks | 🔗 |
| Relevance AI | Agent platform | Freemium (Free + Paid plans) | ❌ | Build and deploy AI agents, no-code | 🔗 |
| Moltbook | Agent social network | Free | ❌ | Discovering and pairing with AI agents | 🔗 |
| ZeroClaw | Privacy-first agent runtime (Rust) | Free (OSS) | ✅ | Deploy anywhere, swap any LLM, zero external API calls | 🔗 |
| NullClaw | Sandboxed agent runtime (Zig) | Free (OSS) | ✅ | Ultra-fast, minimal footprint, sandboxed agent tasks | 🔗 |
| Moltis | Rust-native single-binary agent | Free (OSS) | ✅ | Sandboxed, auditable, voice + memory + MCP tools built-in | 🔗 |
| Hermes Agent | Adaptive agent framework (Nous Research) | Free (OSS) | ✅ | Memory management, skills, UI dashboard, grows with you | 🔗 |
| PicoClaw | Ultra-lightweight agent (Go) | Free (OSS) | ✅ | Tiny, fast, embedded/IoT deployments, single-binary | 🔗 |
| AutoGPT | Autonomous agent | Free (OSS) | ✅ | Self-prompting GPT agent with memory, pioneer project | 🔗 |
| BabyAGI | Task-driven agent | Free (OSS) | ✅ | Autonomous task creation and prioritization | 🔗 |
| Suna | Generalist agent | Free (OSS) | ✅ | Versatile open-source agent for complex tasks (Kortix) | 🔗 |
| OWL | Multi-agent framework | Free (OSS) | ✅ | Distributed task automation (Camel-AI) | 🔗 |
| CogAgent | Vision GUI model | Free (Research) | ✅ | High-performance vision-based GUI understanding (Tsinghua/Zhipu) | 🔗 |
| HyperAgent | Code agent | Free (OSS) | ✅ | GitHub issue resolution, repository-level code generation | 🔗 |
### Managed Agent Services

Managed cloud services for building and deploying AI agents at scale.

| Service | Provider | Pricing | Best For |
|---|---|---|---|
| Google Vertex AI Agent Builder | Google Cloud | Pay-per-use | Enterprise agents grounded in Google Search and data stores |
| Amazon Bedrock Agents | AWS | Pay-per-use | Serverless agents with knowledge bases and guardrails |
| Azure AI Agent Service | Microsoft Azure | Pay-per-use | Enterprise agents with Azure AI Search and OpenAI integration |
## Desktop Automation 🖥️

AI agents and tools for automating desktop tasks and OS-level interactions.

### Local Computer Use Agents

Agents that run directly on your machine and interact with the OS, screen, keyboard, and mouse.
| Agent | Windows | macOS | Linux | Vision | Best For | GitHub |
|---|---|---|---|---|---|---|
| Agent S | ✅ | ✅ | ✅ | ✅ | Research/SOTA, GUI grounding | 🔗 |
| Simular Agent S2 | ✅ | ✅ | ✅ | ✅ | Latest SOTA, improved grounding | 🔗 |
| Open Interpreter | ✅ | ✅ | ✅ | ⚠️ | Natural language computer control, 63K+ stars | 🔗 |
| Open-Interface | ✅ | ✅ | ✅ | ✅ | General-purpose desktop automation | 🔗 |
| UFO | ✅ | ❌ | ❌ | ✅ | Windows-specific app automation | 🔗 |
| Bytebot | ✅ | ✅ | ✅ | ✅ | Self-hosted (Docker), headless | 🔗 |
| Microsoft Fara-7B | ✅ | — | — | ✅ | Open-weight vision grounding model | 🔗 |
| UI-TARS | ✅ | ✅ | ✅ | ✅ | Autonomous GUI execution, vision-language-action model (ByteDance) | 🔗 |
| c/ua | — | ✅ | — | ✅ | Isolated VM environments, open-source CU infrastructure | 🔗 |
| Windows-Use | ✅ | ❌ | ❌ | — | Windows OS-specific agent automation | 🔗 |
| OpenCUA | ✅ | ✅ | ✅ | ✅ | Open foundations for computer-use agents | 🔗 |
| Devin | — | — | — | ✅ | Full-stack software engineering agent (Cognition Labs) | — |
| Ace | ✅ | ✅ | — | ✅ | 20x human speed on UI tasks (General Agents) | — |
### Cloud / API Computer Use Agents

Agents accessed via API or cloud service; OS-independent, but they require internet connectivity.

| Agent | Interface | Vision | Best For | GitHub |
|---|---|---|---|---|
| Anthropic Computer Use | API | ✅ | Beta capability, Claude-powered desktop control | — |
| OpenAI Operator | API | ✅ | Guided browser and desktop computer use | — |
| Amazon Nova Act | API | ✅ | AWS browser automation SDK | — |
| Manus AI | Cloud | ✅ | General-purpose cloud agent | — |
| Adept AI (ACT-1) | API | ✅ | Pioneer in digital actions, self-correcting behavior | — |
| AskUI Vision Agent | API | ✅ | Cross-platform vision automation without VMs | — |
| Highlight AI | Desktop + Cloud | ✅ | Privacy-first desktop Q&A and automation | — |
### AI-Native Operating Systems

AI-native operating systems and platforms that embed LLMs as core system components.

| OS / Platform | Type | Hardware | Local/Cloud | Best For | GitHub |
|---|---|---|---|---|---|
| AIOS | Open Source (MIT) | Any | Both | Kernel-level LLM agent OS, agent scheduling & memory management | 🔗 |
| Ghost OS | Open Source | Any | Local | Autonomous agent workflows | 🔗 |
| computer_use_ootb | Open Source | Any | Local/API | Out-of-the-box GUI automation (Claude 3.5 CU + local models) | 🔗 |
| Rabbit OS (R1) | Commercial ($199 device) | R1 Device | Cloud | Consumer AI assistant, LAM-based app automation | — |
| Apple Intelligence | Commercial (OS-level) | Apple Silicon (M1+) | On-device / Private Cloud | Privacy-first, system-wide writing, Siri, image generation | — |
| Windows Copilot+ | Commercial (OS-level) | NPU (40+ TOPS) | Hybrid | Recall, Cocreator, live captions, enterprise productivity | — |
### RPA & Screen Parsing

| Tool | Platform | Best For |
|---|---|---|
| Ui.Vision RPA | Windows, macOS, Linux | Visual automation |
| OmniParser V2 | Cross-platform | Screen parsing |
### GUI Automation Libraries

| Tool | Platform | Key Features | GitHub |
|---|---|---|---|
| PyAutoGUI | Cross-platform | Simple API, fail-safe | 🔗 |
| Nut.js | Cross-platform | Visual search, image matching | — |
| OpenAdapt | Windows, macOS | Learning from demonstration | 🔗 |
### Research Projects (Computer Use)

Notable academic and industry research advancing the field of computer-use agents.

| Project | Developer | Focus | Year | Paper |
|---|---|---|---|---|
| Gato | Google DeepMind | Multi-modal, multi-task, multi-embodiment agent | 2022 | DeepMind |
| PaLM-E | Google DeepMind | Embodied multimodal language model | 2023 | arXiv |
| RT-2 | Google DeepMind | Vision-language-action model for robotics | 2023 | arXiv |
| HuggingGPT (Jarvis) | Microsoft | Orchestrates specialists for multi-modal tasks | 2023 | arXiv |
| SIMA | Google DeepMind | Generalist AI agent for 3D virtual environments | 2024 | DeepMind |
| Magma | Microsoft Research | Vision-language-action foundation model | 2025 | arXiv |
| WebAgent | Google DeepMind | Autonomous web browsing and form-filling | 2024 | arXiv |
| WebVoyager | Hongliang He et al. | Autonomous web browsing (59.1% on 15-website benchmark) | 2024 | arXiv |
## AI Infrastructure 🏗️

Tools, frameworks, and specialized models for building production AI systems, from embeddings and video generation to safety, evaluation, and model routing.

### Embedding & Reranking Models 🧲

Specialized models for converting text (or images) into dense vector representations and for reranking retrieval results. Essential infrastructure for RAG pipelines and semantic search. Prices as of April 2026.
Model
Developer
Dimensions
Max Tokens
Pricing
Best For
GitHub
text-embedding-3-small
OpenAI
1,536
8,191
$0.02/1M tokens
Cost-effective English embeddings
β
text-embedding-3-large
OpenAI
3,072
8,191
$0.13/1M tokens
Highest-quality English retrieval
β
Embed v4
Cohere
1,536
128K
$0.12/1M (text), $0.47/1M (image)
Multimodal text + image RAG
β
voyage-3-large
Voyage AI
256β2,048 (flex)
32K
~$0.18/1M tokens
Highest-quality retrieval, long context
β
jina-embeddings-v3
Jina AI
32β1,024 (flex)
8,192
API pay-per-use
Multilingual, task-adaptive (LoRA heads)
π
BGE-M3
BAAI
1,024
8,192
Free (open-source)
Multi-functional: dense + sparse + ColBERT
π
Nomic Embed v2 (MoE)
Nomic AI
256–768 (flex)
512
Free (open-source)
Multilingual, MoE efficiency (305M active)
π
text-embedding-005
Google (Vertex AI)
768
2,048
$0.10/1M tokens
GCP-native semantic search
β
Model
Developer
Max Tokens
Pricing
Best For
GitHub
Rerank 4.0 Pro
Cohere
32K
$1.00/1K queries
High-accuracy domain-specific reranking
β
Rerank 4.0 Fast
Cohere
32K
$0.50/1K queries
Low-latency production reranking
β
rerank-2.5
Voyage AI
32K
API pay-per-use
Instruction-following, multilingual
β
BGE Reranker v2-m3
BAAI
8,192
Free (open-source)
Open-source cross-encoder reranking
π
Jina Reranker v2
Jina AI
8,192
API pay-per-use
Multilingual, long-context reranking
β
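The mechanics behind these models are easy to sketch: dense retrieval reduces to comparing vectors with cosine similarity, and a reranker then re-scores the top hits. The 3-dimensional vectors and document names below are invented for illustration; real embedding models emit hundreds to thousands of dimensions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy 3-dimensional "embeddings" standing in for real model output.
query = [1.0, 0.0, 1.0]
docs = {
    "refund policy": [0.9, 0.1, 0.8],
    "release notes": [0.0, 1.0, 0.1],
}
# Rank documents by similarity to the query vector.
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
```

In a full pipeline, the top-k of `ranked` would then be passed to one of the rerankers above for a more accurate cross-encoder score.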
Video Generation Models π¬
Text-to-video and image-to-video generation models for creating short clips from prompts. The field is moving rapidly; resolutions, durations, and pricing change frequently. Specs as of April 2026.
Model
Developer
Resolution
Duration
Pricing
Open Source
Best For
GitHub
Sora 2
OpenAI
Up to 1080p
Up to 20s (Pro)
$20–$200/mo via ChatGPT
No
Cinematic quality, long clips
β
Veo 3
Google DeepMind
720p–1080p
Up to 8s (extendable)
~$0.20–$0.40/s
No
Native audio + video, realistic physics
β
Runway Gen-4 / Gen-4.5
Runway
Up to 4K
Up to 16s
$12–$76/mo
No
Professional creative workflows
β
Kling 2.0
Kuaishou
1080p
Up to 10s
Free / $5.99–$66/mo
No
Budget production, fast turnaround
β
Pika 2.0
Pika Labs
1080p
Up to 5s
Free / $8–$58/mo
No
Social media, creative effects
β
MiniMax Video-01
MiniMax
720p
Up to 6s
~$0.40/video
No
Strong text-motion responsiveness
β
HunyuanVideo
Tencent
720p–2K
Up to 16s
Free (self-host; ~60GB VRAM)
Yes (Apache 2.0)
High per-frame fidelity, long clips
π
Wan 2.2 (14B)
Alibaba
480p–1080p
Up to 10s
~$0.10–$0.30/clip (API)
Yes (Apache 2.0)
Motion quality, VBench #1 benchmark
π
Mochi 1
Genmo
480p
Up to 5.4s @ 30fps
Free (open-source)
Yes (Apache 2.0)
High-quality open text-to-video
π
LTX Video
Lightricks
720p
Variable
Free (open-source)
Yes
Fast generation, ComfyUI-native
π
CogVideoX
Zhipu AI / Tsinghua
720p
~6s
Free (open-source)
Yes (Apache 2.0)
Image-to-video quality, LoRA fine-tuning
π
Text-to-speech (TTS) and speech-to-text (STT / ASR) models for voice generation, transcription, and real-time audio. Prices as of April 2026.
Model
Developer
Languages
Real-time
Open Source
Pricing
Best For
GitHub
ElevenLabs Turbo v2.5
ElevenLabs
29+
Yes
No
Free to $1,320/mo
Best quality (4.8 MOS), instant voice cloning
β
OpenAI TTS / TTS HD
OpenAI
57
Yes
No
$15 / $30 per 1M chars
Enterprise, seamless GPT integration
β
Sesame CSM
Sesame AI Labs
English
Yes
Yes
Free
Conversational, emotionally expressive (4.7 MOS)
π
Kokoro-82M
Hexgrad
Multilingual
Yes
Yes (Apache 2.0)
Free
Tiny (82M params), CPU-runnable, near-commercial quality
π
Fish Audio S1
Fish Audio
Multilingual
Yes
Yes
Free / $0.016/1K chars (API)
Voice cloning, multilingual fluency
π
Parler-TTS
HuggingFace
English
No
Yes (Apache 2.0)
Free
Style-controllable via text descriptions
π
XTTS v2
Coqui AI
17
Yes
Yes (MPL 2.0)
Free
Best open-source multilingual, 6s voice cloning
π
Bark
Suno AI
13+
No
Yes (MIT)
Free
Expressive, non-verbal sounds, long-form audio
π
Speech-to-Text (STT / ASR)
Model
Developer
Languages
Real-time
Open Source
Pricing
Best For
GitHub
Whisper large-v3
OpenAI
100+
No
Yes (MIT)
$0.006/min (API)
Open-source multilingual baseline
π
GPT-4o Transcribe
OpenAI
50+
Yes
No
$0.006/min
High-accuracy managed STT
β
Deepgram Nova-3
Deepgram
36+
Yes
No
$0.0043/min
Ultra-low latency, production STT
β
AssemblyAI Universal-2
AssemblyAI
Multilingual
Yes
No
$0.0025/min
Accurate, feature-rich transcription
β
AI Safety & Guardrails π‘οΈ
Tools and frameworks for detecting unsafe content, preventing prompt injection, validating outputs, and enforcing policy compliance in LLM-powered applications. As of April 2026.
Tool
Developer
Type
Open Source
Pricing
Best For
GitHub
Llama Guard 3
Meta
Safety classifier (8B LLM)
Yes (Meta license)
Free / ~$0.02/1M tokens (API)
Input/output safety classification, 8 languages
π
NeMo Guardrails
NVIDIA
Programmable guardrail toolkit (Colang DSL)
Yes (Apache 2.0)
Free
Dialog safety, policy enforcement, LangChain-native
π
Guardrails AI
Guardrails AI
Python validator framework
Yes
Free (OSS)
Output validation, PII detection, hallucination guards
π
Amazon Bedrock Guardrails
AWS
Managed safety layer
No
Pay-per-use (AWS)
AWS-native, zero-ops compliance and content filtering
β
ShieldGemma 2
Google
Safety classifier (open weights)
Yes (open weights)
Free
Text safety (2B/9B/27B), image safety (4B)
β
Rebuff
Protect AI
Prompt injection detector
Yes
Free
Self-hardening anti-injection using vector memory
π
Lakera Guard
Lakera
Managed LLM security API
No
Free tier + Enterprise
Runtime LLM security, <50ms latency, PII + injection
β
Frameworks and libraries for building Retrieval-Augmented Generation (RAG) pipelines, connecting LLMs to external knowledge sources. As of April 2026.
Framework
Developer
Language
Key Features
Open Source
GitHub
LlamaIndex
LlamaIndex
Python
160+ data connectors, hybrid search, multi-agent support
Yes (MIT)
π
LangChain
LangChain AI
Python / JS
Chains, agents, memory, 50K+ integrations, LangGraph
Yes (MIT)
π
RAGFlow
InfiniFlow
Python
Visual workflow builder, deep document parsing (PDF/tables)
Yes (Apache 2.0)
π
Haystack
deepset
Python
Modular pipelines, enterprise-grade, built-in monitoring
Yes (Apache 2.0)
π
Verba
Weaviate
Python
No-code UI, Weaviate-native vector search
Yes
π
Mem0
Mem0 AI
Python / JS
Persistent memory layer, graph memory, session recall
Yes (Apache 2.0)
π
txtai
NeuML
Python
All-in-one semantic search + workflow automation
Yes (Apache 2.0)
π
R2R
SciPhi
Python
Lightweight, low-latency, REST API, production-first
Yes (MIT)
π
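All of these frameworks implement the same core loop: retrieve relevant documents, place them in the prompt, and ask the model to answer from that context. A dependency-free sketch using word overlap as a stand-in for vector search (function names and the toy corpus are ours):

```python
def retrieve(query, corpus, k=1):
    """Rank documents by word overlap with the query (stand-in for vector search)."""
    q = set(query.lower().split())
    scored = sorted(corpus, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

def build_prompt(query, corpus):
    """Assemble a grounded prompt from the top retrieved documents."""
    context = "\n".join(retrieve(query, corpus, k=2))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

corpus = [
    "Returns are accepted within 30 days of purchase.",
    "Our office is closed on public holidays.",
]
prompt = build_prompt("How many days do I have to return an item?", corpus)
```

The frameworks above replace each step with production pieces: embedding models for `retrieve`, chunkers and parsers for the corpus, and templating for `build_prompt`.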
Fine-tuning Platforms βοΈ
Tools and platforms for adapting pre-trained LLMs to specific tasks or domains via supervised fine-tuning, RLHF, LoRA/QLoRA, and related methods. Prices as of April 2026.
Platform
Type
Supported Models
Pricing
Best For
GitHub
Unsloth
OSS library
Llama, Mistral, Gemma, Qwen, Phi, + more
Free
2–5× faster training, 80% VRAM reduction via custom kernels
π
Axolotl
OSS framework
Most Hugging Face models
Free
Config-as-code (YAML), reproducibility, multi-GPU training
π
OpenAI Fine-tuning
Managed API
GPT-4o, GPT-4o-mini, GPT-3.5 Turbo
GPT-4o-mini: $0.30/1M training tokens
Managed, no infra, direct production deployment
β
Google Vertex AI
Managed cloud
Gemini 2.5 Pro/Flash, Gemma 3
Gemini 2.5 Pro: $25/1M training tokens
GCP-native, Gemini model access
β
Predibase / LoRAX
Cloud + OSS server
Llama, Mistral, 50+ HF models
Free tier + per-GPU pricing
Multi-adapter serving: many LoRA adapters on one GPU
π
PEFT
Hugging Face
All Hugging Face models
Free
LoRA, QLoRA, prefix tuning, prompt tuning; full HF ecosystem
π
LLaMA-Factory
Community
100+ models
Free
Web UI, low-code interface, beginner-friendly fine-tuning
π
torchtune
PyTorch
Llama, Gemma, Mistral, Phi
Free
PyTorch-native, composable training recipes
π
Evaluation & Observability π
Tools for tracing LLM calls, evaluating output quality, debugging RAG pipelines, and monitoring production AI systems. Prices as of April 2026.
Tool
Developer
Type
Open Source
Pricing
Best For
GitHub
LangSmith
LangChain AI
Tracing + evaluation platform
No (enterprise self-host)
Free (5K traces/mo), paid plans
LangChain apps, chain + agent debugging
β
Braintrust
Braintrust Data
Eval-first platform
Partial (AI proxy OSS)
Free (1M spans), enterprise
CI/CD evals, dataset management, LLM-as-judge
β
Helicone
Helicone
Proxy-based observability
Yes
Free tier, usage-based
Cost tracking, request caching, drop-in API proxy
π
Arize Phoenix
Arize AI
OSS tracing + evaluation
Yes
Free (OSS); Arize Cloud paid
RAG debugging, LLM-as-judge, local dev
π
Langfuse
Langfuse
Tracing + evaluation
Yes (MIT)
Free / self-host; cloud paid
Open-source, 19K+ GitHub stars, OpenTelemetry
π
Ragas
Ragas
RAG evaluation framework
Yes
Free
RAG-specific metrics: faithfulness, recall, precision
π
DeepEval
Confident AI
LLM evaluation framework
Yes
Free (OSS); cloud paid
14+ built-in metrics, pytest-style eval runner
π
The Model Context Protocol (MCP) is an open standard by Anthropic for connecting LLMs to external tools and data sources via a unified JSON-RPC 2.0 interface. It supports STDIO and Streamable HTTP transports. Community directories such as mcp.so list 2,000+ servers.
MCP Clients: Claude Desktop, Claude Code, Cursor, Windsurf, VS Code (Copilot), Continue.dev, Zed, LibreChat, and more.
Tool / Server
Developer
Category
Open Source
Best For
GitHub
MCP Filesystem
Anthropic / Community
File I/O
Yes (MIT)
Read/write local files from any MCP client
π
MCP GitHub
GitHub / Anthropic
Code & DevOps
Yes
Repo management, issues, PRs, code search
π
MCP Slack
Community
Messaging
Yes
Slack workspace read/write interaction
π
MCP PostgreSQL
Community
Database
Yes
Read-only SQL queries against Postgres
π
MCP Google Drive
Community
Storage
Yes
Drive file access and search
π
MCP Docker
Community
DevOps
Yes
Container management and inspection
π
MCP Brave Search
Brave
Search
Yes
Web + local search via Brave API
π
MCP AWS
AWS Labs
Cloud
Yes (Apache 2.0)
AWS service integration
π
MCP Notion
Community
Productivity
Yes
Notion page and database access
π
FastMCP
Community
Framework
Yes
Python framework for building MCP servers fast
π
Context7
Upstash
Dev Tools
Yes
Up-to-date library docs for AI coding assistants
π
Model Routers & Load Balancers π
Tools for routing LLM requests across multiple providers, models, and deployments, optimizing for cost, latency, quality, or reliability. Prices as of April 2026.
Tool
Developer
Key Features
Open Source
Pricing
GitHub
LiteLLM
BerriAI
100+ provider support, proxy server, load balancing, fallbacks, spend tracking
Yes (MIT)
Free (OSS) / $99/mo cloud
π
Portkey
Portkey
250+ LLMs, AI gateway, guardrails, observability, virtual keys
Yes (Apache 2.0)
Free tier / $49/mo+
π
OpenRouter
OpenRouter
200+ model catalog, unified API, pay-per-use credit system
No
~5% markup on provider cost
β
RouteLLM
LMSys
Open-source router (strong vs. weak model) using classifier or matrix factorization
Yes
Free
π
Not Diamond
Not Diamond
Pre-trained + custom task-specific routers, cost/quality tradeoff
No
Free tier + enterprise
β
Unify AI
Unify
Quality / cost / latency-aware routing across 100+ model deployments
No
Usage-based
β
Semantic Router
Aurelio AI
Embedding-based semantic intent routing for agents and pipelines
Yes
Free
π
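At their simplest, routers classify a request and pick the cheapest model expected to handle it. A toy keyword-based sketch (the catalog names, prices, and keyword list are all invented; real routers such as RouteLLM train classifiers on preference data rather than matching keywords):

```python
# Hypothetical model catalog; prices are illustrative USD per 1M input tokens.
CATALOG = {
    "cheap-small": {"price": 0.25, "strength": 1},
    "frontier": {"price": 2.50, "strength": 3},
}

# Keywords suggesting a task that warrants the stronger model.
HARD_HINTS = ("prove", "debug", "refactor", "derive")

def route(prompt: str) -> str:
    """Send hard-looking prompts to the strong model, everything else to the cheap one."""
    needs_strength = any(h in prompt.lower() for h in HARD_HINTS)
    return "frontier" if needs_strength else "cheap-small"
```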
Small Language Models (SLMs) π±
Compact models designed for on-device inference, edge deployment, low-latency APIs, and resource-constrained environments. Generally defined as models under ~15B parameters. Specs as of April 2026.
Model
Developer
Params
Context
License
Best For
Phi-4
Microsoft
14B
16K
MIT
Reasoning, math, code β STEM benchmark leader at class size
Phi-4-mini
Microsoft
3.8B
128K
MIT
On-device STEM reasoning with long context
Phi-4-multimodal
Microsoft
5.6B
128K
MIT
Vision + audio + text multimodal, edge deployment
Gemma 3 27B
Google
27B
128K
Apache 2.0
Top open model, multilingual (140+ languages)
Gemma 3 4B
Google
4B
128K
Apache 2.0
CPU inference, 140+ languages, mobile-friendly
Gemma 3 1B
Google
1B
32K
Apache 2.0
On-device, embedded, ultra-lightweight
SmolLM3
Hugging Face
3B
128K
Apache 2.0
Efficient, tool use, multilingual, reasoning
Qwen2.5 3B
Alibaba
3B
128K
Apache 2.0
Asian and multilingual tasks, coding
Qwen2.5 7B
Alibaba
7B
128K
Apache 2.0
Strong multilingual baseline, function calling
Llama 3.2 3B
Meta
3B
128K
Llama 3.2 license
General-purpose, on-device, Meta ecosystem
Llama 3.2 1B
Meta
1B
128K
Llama 3.2 license
Lightweight edge inference, distillation target
Granite 3.3 8B
IBM
8B
128K
Apache 2.0
Enterprise tasks, tool use, business-domain
MiniCPM 3.0
ModelBest / Tsinghua
4B
32K
Apache 2.0
Compact yet capable, mobile and edge
Danube 3 500M
H2O.ai
500M
8K
Apache 2.0
Ultra-lightweight on-device, IoT
Tutorials, how-tos, and in-depth guides for getting the most out of AI models and tools.
A beginner-friendly introduction to AI models and how to start using them effectively.
Concept
Description
Parameters
Model size in billions of parameters (B); larger models are generally more capable
Context Window
How much text the model can process at once; 128K tokens is a common standard
Tokens
Basic units of text (~0.75 words per token)
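The ~0.75 words-per-token rule of thumb supports quick back-of-envelope estimates (the helper name is ours; exact counts depend on the model's tokenizer):

```python
def estimate_tokens(text: str) -> int:
    """Rough token count using the ~0.75 words-per-token rule of thumb."""
    return round(len(text.split()) / 0.75)

# A 750-word document is roughly 1,000 tokens.
doc_tokens = estimate_tokens("word " * 750)
```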
Method
Best For
Setup Difficulty
Web Interfaces
Quick experiments
Easiest
API Access
Building applications
Easy
Self-Hosting
Privacy, no API costs
Medium-Hard
IDE Integration
Daily coding
Easy
Model Recommendations by Task
Task
Free Option
Premium Option
Chat
Llama 4 (self-hosted)
GPT-5.4, Claude Opus 4.6
Coding
DeepSeek-Coder-V2
Claude Opus 4.6
Reasoning
DeepSeek-R1
Gemini 3 Deep Think, o3
Long docs
Llama 4 Scout
Gemini 3 Flash
Vision
Llama 4 Maverick
GPT-5.4, Gemini 3 Pro
Model Selection Guide π―
A comprehensive guide to choosing the right AI model for your specific needs.
Need
π Free / Self-Host
π Best Quality
β‘ Fast / Autonomous
π» Coding
DeepSeek-Coder-V2
Claude Opus 4.6
GPT-5.3-Codex
π§ Reasoning / Math
DeepSeek-R1
Gemini 3 Deep Think
o3
π¬ General Chat
Llama 4 (self-hosted)
GPT-5.4, Claude Opus 4.6
Gemini 3 Flash
π¨ Vision
Llama 4 Maverick
GPT-5.4, Gemini 3.1 Pro
Gemini 3 Flash
π₯οΈ Self-Hosting
Phi-4
DeepSeek-V4
vLLM / SGLang (serving)
Budget
Options
Free
Self-hosted (Llama 4, Qwen3, Mistral)
$0-10/mo
API entry tiers, Gemini Flash
$10-50/mo
Copilot, Claude API, GPT-5 API
$50+/mo
Heavy usage, multiple models
Self-Hosting Guide π₯οΈ
A comprehensive guide to running AI models on your own hardware.
Benefit
Description
Privacy
Data never leaves your infrastructure
Cost Control
No per-token API costs for unlimited usage
Customization
Fine-tune models for specific needs
No Rate Limits
Process as much as hardware allows
Offline Access
Work without internet
For installation and usage instructions, refer to the official Ollama documentation.
Recommended apps (local-first):
Ollama - Simple local runtime with a local HTTP API
LM Studio - Desktop UI for downloading and running models locally
llama.cpp - Fast local inference (CPU/GPU), great for quantized models
Open WebUI - Optional local web UI (pairs well with local runtimes)
If you want "server-style" hosting (advanced):
vLLM - High-throughput serving for NVIDIA GPUs
SGLang - Structured generation and serving workflows
Practical setup tips:
Install the latest NVIDIA drivers (enable GPU acceleration in your chosen app)
Start with smaller quantized models (Q4 is a common "best default")
Keep context windows realistic for local hardware (lower context = faster, less memory)
Watch VRAM first, then system RAM; reduce model size or quantization if either saturates
Prefer running locally on localhost and only expose to LAN if you understand firewall rules
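A rough way to apply the "watch VRAM first" advice: weights need about params × bits/8 bytes, plus headroom for the KV cache and runtime. A sketch assuming a 20% overhead factor (actual usage varies with context length and runtime, so treat these as lower bounds):

```python
def estimated_vram_gb(params_billions: float, bits_per_weight: int = 4,
                      overhead: float = 1.2) -> float:
    """Rough VRAM need: quantized weights plus ~20% for KV cache and runtime."""
    weight_gb = params_billions * bits_per_weight / 8
    return round(weight_gb * overhead, 1)

# A 7B model at Q4 needs roughly 4.2 GB; a 70B at Q4 roughly 42 GB,
# which lines up with the hardware tiers in the table below.
```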
Example hardware configurations:
Hardware
Good starting point
Notes
Consumer GPU (24 GB VRAM)
7B–14B quantized
e.g., RTX 4090, RTX 3090; great for chat/coding
Pro GPU (48–80 GB VRAM)
14B–70B quantized
e.g., A6000, A100; coding agents, longer contexts
Multi-GPU (160+ GB VRAM)
70B+ quantized
e.g., 2×A100; larger open-source models
CPU-only (32–64 GB RAM)
7B–14B quantized
Slower but viable for offline chat; keep context moderate
Option
Best For
Pros
Cons
Local Machine
Personal use
Simple, no latency
Limited hardware
Dedicated Server
Team use
Full control
Maintenance
Cloud GPU Rental
Experimentation
On-demand
Hourly costs
Kubernetes
Enterprise
Scalable
Complex
Comprehensive pricing comparisons and cost calculations.
Tier
Price Range
Models
π Free
$0
Self-hosted, free tiers
π΅ Budget
$0.07 - $0.50/1M
GLM-4.7-FlashX, GLM-4-32B-0414-128K, Yi-Lightning, GPT-5.4 nano, Gemini 3.1 Flash-Lite, DeepSeek-V3.1, MiniMax-M2.5
π° Mid-range
$0.60 - $15.00/1M
GPT-5.4 mini, Claude Haiku 4.5, Kimi K2.5, Sonar, GLM-5, GPT-5.4, Claude Sonnet
π Premium
$15.00 - $600.00/1M
GPT-5.4 Pro, Claude Opus, o1-Pro
Subscription Pricing (Monthly, USD)
AI chat apps
Product
Plans (USD)
Notes
Official Source
ChatGPT
Go $8, Plus $20, Pro $200, Business $25/seat (annual) or $30/seat (monthly), Enterprise (contact sales)
Consumer prices are US-listed; Go is localized in some markets
π
Claude
Pro $20, Max $100 (5×) or $200 (20×), Team/Enterprise (see pricing)
Prices shown exclude applicable taxes; availability varies by region
π
Google AI (Gemini)
Plus $7.99, Pro $19.99, Ultra $249.99
US pricing; some regions/local pricing differ
π
Coding assistants
Tool
Plans (USD)
Notes
Official Source
GitHub Copilot
Free $0, Pro $10, Pro+ $39, Business $19/user, Enterprise $39/user
Annual options available for Pro/Pro+
π
Model
Input
Output
Cached Input
Best For
GLM-4.7-FlashX
$0.07
$0.40
β
Fast budget tasks
Step-3.5-Flash
$0.10
$0.30
β
Ultra-fast reasoning (85–350 tok/s)
GLM-4-32B-0414-128K
$0.10
$0.10
β
Budget chat/coding
Llama 4 Maverick
$0.15
$0.60
β
Open multimodal (self-host: $0)
GPT-5.4 nano
$0.20
$1.25
$0.02
Classification and lightweight subagents
Grok 4 Fast
$0.20
$0.50
$0.05
Fast Grok reasoning
Gemini 3.1 Flash-Lite
$0.25
$1.50
Supported
High-volume multimodal tasks
DeepSeek-V3.1
$0.27
$0.41
β
Everything
DeepSeek-V3.2
$0.28
$0.42
$0.028
Budget workhorse, reasoning
DeepSeek-V4
$0.30
$0.50
$0.03
Engram memory, coding (off-peak 50% off)
Gemini 3 Flash
$0.30
$2.50
$0.05 + $1/hr
Long context
MiniMax-M2.5
$0.30
$1.20
Auto (included)
Coding, long context
Mistral Large 3
$0.50
$1.50
$0.05
Open-weight 675B MoE
Kimi K2.5
$0.60
$3.00
Auto (included)
Multimodal + agent tasks
GPT-5.4 mini
$0.75
$4.50
$0.075
Fast coding and multimodal tasks
Claude Haiku 4.5
$1.00
$5.00
β
Low-latency coding and sub-agents
GLM-5
$1.00
$3.20
$0.20
Agentic engineering
Perplexity Sonar
$1.00
$1.00
β
Web-grounded chat (request fees apply)
GPT-5.3-Codex
$1.75
$14.00
$0.175
Agentic coding, 7+ hour autonomy
Gemini 3.1 Pro
$2.00
$12.00
$0.20–$0.40 + $4.50/hr
Frontier reasoning
Perplexity Sonar Reasoning Pro
$2.00
$8.00
β
Reasoning + search (request fees apply)
GPT-5.4
$2.50
$15.00
$0.25
Frontier coding and professional work
Grok 4
$3.00
$15.00
$0.75
First-principles reasoning
Perplexity Sonar Pro
$3.00
$15.00
β
Higher quality + search (request fees apply)
Claude Sonnet 4.5
$3.00
$15.00
$0.30 (hit)
Best coding
Claude Sonnet 4.6
$3.00
$15.00
$0.30 (hit)
Near-Opus performance
Claude Opus 4.6
$5.00
$25.00
$0.50 (hit)
Agentic coding
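Per-token prices combine into per-request costs as follows. The sketch plugs in the GPT-5.4 row's rates ($2.50 in / $15.00 out / $0.25 cached) and assumes cached tokens simply bill at the cache rate (providers differ on cache write fees, so check the official pricing page):

```python
def request_cost(input_tokens, output_tokens, in_price, out_price,
                 cached_tokens=0, cached_price=0.0):
    """USD cost of one request given per-1M-token prices."""
    fresh = input_tokens - cached_tokens
    return (fresh * in_price
            + cached_tokens * cached_price
            + output_tokens * out_price) / 1_000_000

# 10K input tokens (8K cached) and 1K output at GPT-5.4 rates: ~$0.022.
cost = request_cost(10_000, 1_000, 2.50, 15.00,
                    cached_tokens=8_000, cached_price=0.25)
```

Output tokens dominate here ($0.015 of the $0.022), which is why cache hits matter most for long-prompt, short-answer workloads.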
Self-Hosting vs API (Monthly)
Usage Level
Self-Host (A100)
API (GPT-5)
Winner
Light (1M tokens)
$300 (rental)
$10
API
Medium (100M tokens)
$300
$1,000
Self-host
Heavy (1B tokens)
$300
$10,000
Self-host
Enterprise (10B+ tokens)
$2,000 (owned)
$100,000+
Self-host
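The break-even point in the table falls out of a single division: fixed monthly GPU cost over the per-million-token API rate. A sketch using the table's illustrative figures ($300/mo A100 rental vs. $10 per 1M API tokens):

```python
def break_even_million_tokens(gpu_monthly_usd, api_usd_per_million):
    """Monthly volume (in millions of tokens) above which a fixed-cost GPU beats the API."""
    return gpu_monthly_usd / api_usd_per_million

# 300 / 10 = 30, so self-hosting wins past roughly 30M tokens per month,
# consistent with the table: API wins at 1M, self-host wins at 100M.
threshold = break_even_million_tokens(300, 10)
```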
Reference materials including glossary, comparison tables, and data sources.
Definitions of common terms used throughout the documentation.
Term
Definition
Agent
AI system that autonomously performs tasks and interacts with environments
API
Interface for programmatically accessing AI models
Attention Mechanism
Neural network component focusing on relevant input parts
Benchmark
Standardized test measuring model performance
Chain-of-Thought (CoT)
Prompting technique showing step-by-step reasoning
Term
Definition
Fine-Tuning
Adapting pre-trained model to specific tasks
Frontier Model
State-of-the-art proprietary model
GPU
Hardware accelerator essential for ML
LLM
Large Language Model
LoRA
Efficient fine-tuning method
Term
Definition
MCP
Model Context Protocol for tool interaction
MMLU
Massive Multitask Language Understanding benchmark
MoE
Mixture of Experts architecture
Multimodal
Processing multiple input types
RAG
Retrieval-Augmented Generation
Term
Definition
Self-Hosting
Running models on own infrastructure
SLM
Small Language Model
SWE-bench
Benchmark for real GitHub issue resolution
Token
Basic unit of text processing
VRAM
GPU memory for model storage
Side-by-side comparisons of AI models sorted by various criteria.
Sort by Latest Update (Default)
π’ Company
π€ Model
π¦ Version
π
Release Date
π Latest Updated
π» Coding
π Benchmarks
π° Price
π₯οΈ Self-Host
π Official Site
π€ OpenAI
GPT-5
5.4 mini
2026-03-17 00:00 UTC
2026-03-17 00:00 UTC β
β
GPQA 87.5%
$0.75 / $4.50
β
π
π€ OpenAI
GPT-5
5.4
2026-03-05 00:00 UTC
2026-03-05 00:00 UTC β
β
GPQA 92.0%, SWE-bench ~80%
$2.50 / $15.00
β
π
π Google DeepMind
Gemini 3.1
Flash-Lite
2026-03-03 00:00 UTC
2026-03-03 00:00 UTC β
β
β
$0.25 / $1.50
β
π
π¬ DeepSeek
DeepSeek
V4
2026-02-17 00:00 UTC
2026-02-17 00:00 UTC
β
No public benchmarks
Pay-per-token
β
π
π Google DeepMind
Gemini 3
Deep Think
2026-02-12 00:00 UTC
2026-02-12 00:00 UTC β
β
GPQA ~97%, ARC-AGI-2 84.6%, HLE 48.4%
Ultra subscription
β
π
π¨π³ Zhipu AI
GLM
5
2026-02-12 00:00 UTC
2026-02-12 00:00 UTC β
β
GPQA 82.0%, SWE-bench 77.8%
$1.00 / $3.20
β
π
π€ Anthropic
Claude
Opus 4.6
2026-02-05 00:00 UTC
2026-02-05 00:00 UTC β
β
GPQA 91.3%, SWE-bench 80.8%
$5 / $25
β
π
π€ OpenAI
GPT-5
5.3-Codex
2026-02-05 00:00 UTC
2026-02-05 00:00 UTC β
β
GPQA 91.5%, SWE-bench Pro 56.8%
TBD
β
π
π Moonshot AI
Kimi
K2.5
2026-01-29 00:00 UTC
2026-02-02 00:00 UTC β
β
GPQA 87.6%, SWE-bench 76.8%
$0.60 / $3.00
β
π
Release Windows (Month-level)
π’ Company
π€ Model
π
Release Window
Notes
π Official Site
π§ MiniMax
MiniMax M2.5
2026-02
$0.30 / $1.20
π
π¨π³ Alibaba/Qwen
Qwen 3.5-Max
2026-02
Open-source release window
π
π Google DeepMind
Gemini 3.1 Flash-Lite
2026-03
$0.25 / $1.50
π
π Google DeepMind
Gemini 3 Pro
2026-01
Tiered pricing
π
π€ OpenAI
GPT-5.4 family
2026-03
GPT-5.4, GPT-5.4 mini, GPT-5.4 nano
π
π» Mistral AI
Mistral Large 3
2026-01
Open-weight
π
Rank
Model
Input
Output
License
1
Self-hosted
$0
$0
Various
2
GLM-4.7-Flash
$0
$0
Free
3
GLM-4.7-FlashX
$0.07
$0.40
API
4
GLM-4-32B-0414-128K
$0.10
$0.10
API
5
Yi-Lightning
$0.14
$0.42
Apache 2.0
6
GPT-5.4 nano
$0.20
$1.25
Proprietary
7
Gemini 3.1 Flash-Lite
$0.25
$1.50
Proprietary
8
DeepSeek-V3.1
$0.27
$0.41
MIT
9
Gemini 3 Flash
$0.30
$2.50
Proprietary
10
MiniMax-M2.5
$0.30
$1.20
Proprietary
Sort by Performance (Coding)
Rank
Model
HumanEval
Self-Host
1
Claude Sonnet 4.5
~92%
β
2
GPT-OSS-120B
~89%
β
3
DeepSeek-Coder-V2
~92%
β
4
Qwen3-Coder
~92%
β
5
DeepSeek-V3.1
82%+
β
Rank
Model
Context
Best For
1
Gemini 3 Flash
10M
Entire libraries
2
Llama 4 Scout
10M
Long-document RAG
3
Gemini 3 Pro
1M+
Research papers
4
Kimi K2.5
256K
Large codebases
Attribution, verification sources, and methodology.
Benchmark
Source
Description
GPQA Diamond
NYU / Cohere / Anthropic (Rein et al.)
Graduate-level science questions (PhD difficulty)
MMLU-Pro
TIGER-Lab
Extended multi-task language understanding
Arena Elo
lmarena.ai
Crowdsourced human preference ranking
HLE
Scale AI
Humanity's Last Exam; expert-level questions
SWE-bench Verified
Princeton
Real GitHub issue resolution (human-verified)
SWE-bench Pro
Scale AI
More challenging subset of SWE-bench
LiveCodeBench
LiveCodeBench
Live competitive programming problems
AIME 2025
MAA
American Invitational Mathematics Examination
ARC-AGI-2
ARC Prize
Abstract reasoning challenge (fluid intelligence)
MMMU / MMMU-Pro
MMMU
Multi-discipline multimodal understanding
IFEval
Google Research
Instruction-following evaluation
FrontierMath
Epoch AI
Expert-level research mathematics
HumanEval
OpenAI
164 Python programming problems
Primary Source Review - Check official documentation
Cross-Validation - Compare multiple sources
Timestamp Verification - All data includes verification date
Update Tracking - Monitor official channels
Last Updated: 2026-04-02 04:58 UTC
Maintained by: ReadyPixels LLC
Made with β€οΈ by ReadyPixels LLC