Claude Sonnet 4.5 High (September 2025)

Performance overview across all HAL benchmarks

Benchmarks

Agents

Pareto Optimal Benchmarks

Token Pricing

Input Tokens

per 1M tokens

$15

Output Tokens

per 1M tokens

Benchmark Performance

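The "On the Pareto Frontier?" column indicates whether this model, paired with the listed agent, achieved a Pareto-optimal trade-off between accuracy and cost on that benchmark. An entry on the Pareto frontier is one that no other entry on the same benchmark beats on both accuracy and cost, so it represents the current state-of-the-art efficiency for its performance level.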

Benchmark	Agent	Accuracy	Cost	On the Pareto Frontier?
AssistantBench	Browser-Use	11.80%	$99.23	No
CORE-Bench Hard	CORE-Agent	44.44%	$92.34	No
CORE-Bench Hard	HAL Generalist Agent	28.89%	$87.77	No
GAIA	HAL Generalist Agent	70.91%	$179.86	No
GAIA	HF Open Deep Research	30.91%	$535.00	No
SciCode	Scicode Tool Calling Agent	1.54%	$118.14	No
ScienceAgentBench	SAB Self-Debug	30.39%	$7.47	Yes
SWE-bench Verified Mini	SWE-Agent	72.00%	$463.90	Yes
SWE-bench Verified Mini	HAL Generalist Agent	40.00%	$95.97	No
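
The Pareto-frontier check described above the table can be illustrated with a minimal Python sketch. It assumes the standard dominance definition (an entry is dominated if another entry is at least as accurate and at least as cheap, and strictly better on one of the two); the exact procedure HAL uses is not stated on this page and may differ. The `Run` type, the `pareto_frontier` function, and the four entries are hypothetical and are not leaderboard data.

```python
from typing import List, NamedTuple


class Run(NamedTuple):
    name: str        # hypothetical model/agent label
    accuracy: float  # accuracy in percent
    cost: float      # total cost in USD


def pareto_frontier(runs: List[Run]) -> List[Run]:
    """Return the runs that no other run dominates, sorted by cost."""
    frontier = []
    for r in runs:
        dominated = any(
            o.accuracy >= r.accuracy
            and o.cost <= r.cost
            and (o.accuracy > r.accuracy or o.cost < r.cost)
            for o in runs
        )
        if not dominated:
            frontier.append(r)
    return sorted(frontier, key=lambda r: r.cost)


# Hypothetical entries for a single benchmark (illustration only).
runs = [
    Run("model-a", 30.0, 7.5),
    Run("model-b", 25.0, 20.0),  # dominated by model-a (less accurate, costlier)
    Run("model-c", 45.0, 60.0),
    Run("model-d", 40.0, 90.0),  # dominated by model-c (less accurate, costlier)
]

names = [r.name for r in pareto_frontier(runs)]
print(names)  # ['model-a', 'model-c']
```

Under this definition a cheap, moderately accurate entry can sit on the frontier alongside an expensive, highly accurate one, which is consistent with the two "Yes" rows in the table having very different costs.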