Live benchmark data — Apr 2026

Compare AI Models

Real benchmark scores from LMSYS Arena, GPQA Diamond, AIME 2025, SWE-bench, MMMLU, and Humanity's Last Exam. No marketing benchmarks.

#1 Overall — Overall Score
Claude Opus 4.7
Anthropic • 1M context • $5 / $25 per 1M
90.2
Overall score
11 models
Claude Opus 4.7
Anthropic
90.2
#1
#1 Coding • #1 GPQA • 1M Context
1M context • $5 / $25 per 1M
GPT-5.5
OpenAI
89
#2
#1 Arena Elo • ARC-AGI 85% • Vision
256K context • $5 / $30 per 1M
Gemini 3.1 Pro
Google
88.9
#3
#1 AIME • #1 HLE • #1 MMMLU
10M context • $2 / $12 per 1M
Kimi K2.5 Thinking
Kimi
84
#4
Agent Swarm • 100 Agents • Fast
256K context • $0.6 / $2.5 per 1M
Grok 4.20
xAI
80.3
#5
Real-time X • Unfiltered • Fast
2M context • $3 / $15 per 1M
DeepSeek R1
DeepSeek
80
#6
MIT License • 20x Cheaper • Reasoning
128K context • $0.55 / $2.19 per 1M
Llama 4 Scout
Meta
66.2
#7
Open Weights • 10M Context • MoE 109B
10M context • Free (self-host)
Mistral Large 3
Mistral
64
#8
EU Hosted • Multilingual • Enterprise
128K context • ~$2 / ~$6 per 1M
Command R+
Cohere
41.8
#9
RAG Stack • Rerank v3 • Enterprise
128K context • $2.5 / $10 per 1M
Sonar Pro
Perplexity
37.6
#10
Real-time Search • Citations • Answer Engine
128K context • $20/mo Pro
Jamba 1.6
AI21
29.6
#11
SSM Hybrid • Long Context • Wordtune
256K context • API / Custom