Claude Sonnet 4.5 (September 2025)

Performance overview across all HAL benchmarks

4
Benchmarks
5
Agents
1
Pareto Optimal Benchmarks

Token Pricing

Input Tokens: $3 per 1M tokens
Output Tokens: $15 per 1M tokens
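
As a rough illustration of how the per-token prices above translate into a dollar figure, the following sketch computes the cost of a run from its token counts. The function name `estimate_cost` and the token counts in the usage lines are hypothetical and chosen only for the arithmetic; they are not taken from any benchmark run on this page.

```python
# Prices listed above, in USD per 1M tokens.
INPUT_PRICE_PER_M = 3.0
OUTPUT_PRICE_PER_M = 15.0


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for the given token counts.

    Hypothetical helper: it assumes cost is simply tokens times the
    listed price, with no caching discounts or other adjustments.
    """
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


# Made-up token counts, for illustration only.
example_cost = estimate_cost(input_tokens=2_000_000, output_tokens=500_000)
print(f"${example_cost:.2f}")  # 2M * $3/M + 0.5M * $15/M = $13.50
```

Because output tokens cost five times as much as input tokens, an agent that generates long outputs can cost noticeably more than one that mostly reads context.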

Benchmark Performance

The "On the Pareto Frontier?" column indicates whether this model achieved a Pareto-optimal trade-off between accuracy and cost on that benchmark, meaning no other entry on the benchmark is both more accurate and cheaper. Entries on the Pareto frontier offer the best accuracy currently available at their cost level.

Benchmark        Agent                        Accuracy   Cost      On the Pareto Frontier?
Assistantbench   Browser-Use                  7.09%      $75.60    No
Corebench Hard   CORE-Agent                   37.78%     $97.15    No
Corebench Hard   HAL Generalist Agent         33.33%     $85.19    No
Gaia             HAL Generalist Agent         74.55%     $187.37   Yes
Gaia             HF Open Deep Research        30.91%     $451.84   No
Scicode          Scicode Tool Calling Agent   3.08%      $104.29   No
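
The Pareto check described above can be sketched as a simple dominance test. This is a minimal illustration using the standard definition of Pareto dominance, not the leaderboard's actual implementation, whose exact rule may differ. The names `Entry`, `dominates`, and `pareto_frontier` are hypothetical. The example uses only the two Gaia rows from the table; the real frontier is determined against every model and agent evaluated on a benchmark, which is why a row can be marked "No" even when nothing on this page dominates it.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    agent: str
    accuracy: float  # percent
    cost: float      # USD


def dominates(a: Entry, b: Entry) -> bool:
    """True if a is at least as accurate and at least as cheap as b,
    and strictly better on at least one of the two."""
    no_worse = a.accuracy >= b.accuracy and a.cost <= b.cost
    strictly_better = a.accuracy > b.accuracy or a.cost < b.cost
    return no_worse and strictly_better


def pareto_frontier(entries: list[Entry]) -> list[Entry]:
    """Keep the entries that no other entry dominates."""
    return [e for e in entries if not any(dominates(o, e) for o in entries)]


# The two Gaia rows from the table above.
gaia = [
    Entry("HAL Generalist Agent", 74.55, 187.37),
    Entry("HF Open Deep Research", 30.91, 451.84),
]

frontier = pareto_frontier(gaia)
print([e.agent for e in frontier])  # ['HAL Generalist Agent']
```

Among these two rows, HAL Generalist Agent is both more accurate and cheaper, so it dominates HF Open Deep Research. The two Corebench Hard rows, by contrast, do not dominate each other: CORE-Agent is more accurate but also more expensive.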