Claude Sonnet 4.5 (September 2025)
Performance overview across all HAL benchmarks
Benchmarks: 6
Agents: 7
Pareto Optimal Benchmarks: 1
Token Pricing
Input Tokens: $3 per 1M tokens
Output Tokens: $15 per 1M tokens
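
As a rough illustration of how the listed prices translate into a dollar figure, the sketch below multiplies token counts by the per-million rates. It assumes flat pricing with no caching or batch discounts; the function name `estimate_cost` and the token counts in the usage line are hypothetical and do not come from any HAL run.

```python
# Listed prices for this model, in USD per 1M tokens.
INPUT_PRICE_PER_M = 3.0
OUTPUT_PRICE_PER_M = 15.0


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate USD cost from token counts, assuming flat per-token pricing."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


# Hypothetical usage: 2M input tokens and 500k output tokens.
example_cost = estimate_cost(2_000_000, 500_000)
print(f"${example_cost:.2f}")
```

Input tokens usually dominate agent runs by volume, but the output rate is five times higher, so both terms can matter in the total.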
Benchmark Performance
The "On the Pareto Frontier?" column indicates whether this model, paired with the listed agent, achieved a Pareto-optimal trade-off between accuracy and cost on that benchmark. Entries on the Pareto frontier represent the current state-of-the-art efficiency for their performance level.
| Benchmark | Agent | Accuracy | Cost | On the Pareto Frontier? |
|---|---|---|---|---|
| Assistantbench | Browser-Use | 7.09% | $75.60 | No |
| Corebench Hard | CORE-Agent | 37.78% | $97.15 | No |
| Corebench Hard | HAL Generalist Agent | 33.33% | $85.19 | No |
| Gaia | HAL Generalist Agent | 74.55% | $178.20 | Yes |
| Gaia | HF Open Deep Research | 30.91% | $451.84 | No |
| Scicode | Scicode Tool Calling Agent | 3.08% | $104.29 | No |
| Scienceagentbench | SAB Self-Debug | 29.41% | $7.39 | No |
| Swebench Verified Mini | SWE-Agent | 68.00% | $505.92 | No |
| Swebench Verified Mini | HAL Generalist Agent | 34.00% | $128.19 | No |
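
To make the Pareto-frontier column concrete, here is a minimal sketch of one common way to test for Pareto optimality: an entry is kept if no other entry is at least as accurate and at least as cheap while being strictly better on one of the two. The exact procedure the leaderboard uses is not stated on this page, so this is an assumption. The `Run` type and `pareto_frontier` function are hypothetical names, and the input is limited to the two Gaia rows above, whereas the real frontier is computed across all models evaluated on the benchmark.

```python
from typing import NamedTuple


class Run(NamedTuple):
    label: str
    accuracy: float  # percent
    cost: float      # USD


def pareto_frontier(runs: list[Run]) -> list[Run]:
    """Return the runs that no other run dominates on accuracy and cost."""
    frontier = []
    for r in runs:
        dominated = any(
            o.accuracy >= r.accuracy
            and o.cost <= r.cost
            and (o.accuracy > r.accuracy or o.cost < r.cost)
            for o in runs
        )
        if not dominated:
            frontier.append(r)
    return frontier


# Only this model's two Gaia rows; other models are omitted (illustrative).
gaia_runs = [
    Run("HAL Generalist Agent", 74.55, 178.20),
    Run("HF Open Deep Research", 30.91, 451.84),
]

for run in pareto_frontier(gaia_runs):
    print(run.label, run.accuracy, run.cost)
```

Within this two-row subset, the HAL Generalist Agent entry is both cheaper and more accurate, so it is the only one retained. Whether an entry stays on the published frontier depends on every other model's results for that benchmark.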