Claude Opus 4.1 (August 2025)

Performance overview across all HAL benchmarks

Benchmarks

Agents

Pareto Optimal Benchmarks

Token Pricing

$15

Input Tokens

per 1M tokens

$75

Output Tokens

per 1M tokens

Benchmark Performance

On the Pareto Frontier? indicates whether this model achieved a Pareto-optimal trade-off between accuracy and cost on that benchmark. Models on the Pareto frontier represent the current state-of-the-art efficiency for their performance level.

Benchmark	Agent	Accuracy	Cost	On the Pareto Frontier?
Assistantbench	Browser-Use	7.26%	$385.43	No
Corebench Hard	CORE-Agent	51.11%	$412.42	No
Corebench Hard	HAL Generalist Agent	35.56%	$375.11	No
Gaia	HAL Generalist Agent	64.24%	$641.86	No
Gaia	HF Open Deep Research	28.48%	$1306.85	No
Scicode	Scicode Tool Calling Agent	7.69%	$625.13	No
Scienceagentbench	SAB Self-Debug	27.45%	$33.37	No
Swebench Verified Mini	SWE-Agent	61.00%	$1351.35	No
Swebench Verified Mini	HAL Generalist Agent	42.00%	$477.65	No
Taubench Airline	HAL Generalist Agent	54.00%	$180.49	No
Taubench Airline	TAU-bench Tool Calling	50.00%	$69.78	No
Usaco	USACO Episodic + Semantic	48.21%	$276.19	No