Claude Sonnet 4.5 (September 2025)

Performance overview across all HAL benchmarks


Token Pricing (per 1M tokens)

Input Tokens: $15

Output Tokens:

Benchmark Performance

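"On the Pareto Frontier?" indicates whether this model's run achieved a Pareto-optimal trade-off between accuracy and cost on that benchmark. A run is on the frontier when no other evaluated run on the same benchmark is at least as accurate and no more expensive while being strictly better on one of the two. Runs on the frontier therefore represent the current best accuracy-cost trade-off: gaining accuracy requires paying more, and paying less requires giving up accuracy.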

Benchmark	Agent	Accuracy	Cost	On the Pareto Frontier?
AssistantBench	Browser-Use	7.09%	$75.60	No
CORE-Bench Hard	Claude Code	62.22%	$68.33	No
CORE-Bench Hard	CORE-Agent	37.78%	$97.15	No
CORE-Bench Hard	HAL Generalist Agent	33.33%	$85.19	No
GAIA	HAL Generalist Agent	74.55%	$178.20	Yes
GAIA	HF Open Deep Research	30.91%	$451.84	No
SciCode	Scicode Tool Calling Agent	3.08%	$104.29	No
ScienceAgentBench	SAB Self-Debug	29.41%	$7.39	No
SWE-bench Verified Mini	SWE-Agent	68.00%	$505.92	No
SWE-bench Verified Mini	HAL Generalist Agent	34.00%	$128.19	No
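The Pareto-frontier check described above can be sketched in a few lines. This is a minimal illustration, not HAL's actual implementation: the `Run` type and the `dominates` and `pareto_frontier` helpers are hypothetical names, and the inputs are only the rows shown on this page. Because HAL's column compares against runs from every model on the leaderboard, a frontier computed from this page alone can disagree with it. For example, Claude Code dominates both other CORE-Bench Hard rows here, yet the table reports "No", presumably because a run from another model dominates it.

```python
from typing import NamedTuple


class Run(NamedTuple):
    agent: str
    accuracy: float  # percent, as reported in the table
    cost: float      # USD, as reported in the table


def dominates(a: Run, b: Run) -> bool:
    """True if `a` is at least as accurate and no more expensive than `b`,
    and strictly better on at least one of the two axes."""
    return (
        a.accuracy >= b.accuracy
        and a.cost <= b.cost
        and (a.accuracy > b.accuracy or a.cost < b.cost)
    )


def pareto_frontier(runs: list[Run]) -> list[Run]:
    """Return the runs that no other run dominates, sorted by cost.
    A run never dominates itself, so no self-exclusion is needed."""
    frontier = [r for r in runs if not any(dominates(o, r) for o in runs)]
    return sorted(frontier, key=lambda r: r.cost)


# Rows copied from the table above (this model only, not the full leaderboard).
gaia_runs = [
    Run("HAL Generalist Agent", 74.55, 178.20),
    Run("HF Open Deep Research", 30.91, 451.84),
]
corebench_hard_runs = [
    Run("Claude Code", 62.22, 68.33),
    Run("CORE-Agent", 37.78, 97.15),
    Run("HAL Generalist Agent", 33.33, 85.19),
]

print([r.agent for r in pareto_frontier(gaia_runs)])            # ['HAL Generalist Agent']
print([r.agent for r in pareto_frontier(corebench_hard_runs)])  # ['Claude Code']
```

Checking dominance pairwise is O(n²), which is fine for a leaderboard-sized list; for large sets, sorting by cost and sweeping while tracking the best accuracy seen so far gives the same frontier in O(n log n).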