Claude Opus 4.1 High (August 2025)
Performance overview across all HAL benchmarks
Benchmarks: 8
Agents: 9
Pareto Optimal Benchmarks: 0
Token Pricing
Input tokens: $15 per 1M tokens
Output tokens: $75 per 1M tokens
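The per-token prices above translate into run costs by simple proportion. A minimal sketch (the token counts in the example are hypothetical, chosen only to illustrate the arithmetic):

```python
# Per-1M-token prices for this model, as listed above.
INPUT_PRICE_PER_M = 15.00   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 75.00  # USD per 1M output tokens

def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Total USD cost of one agent run from its token counts."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M + \
           (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Hypothetical run: 2M input tokens, 400k output tokens.
print(round(run_cost(2_000_000, 400_000), 2))  # → 60.0  (30.00 input + 30.00 output)
```

Because output tokens cost 5x more than input tokens here, output-heavy agents (long reasoning traces, verbose tool transcripts) dominate the cost column even at modest input volumes.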
Benchmark Performance
The "On the Pareto Frontier?" column indicates whether this model achieved a Pareto-optimal trade-off between accuracy and cost on that benchmark: no other model scores at least as high at a lower cost. Models on the Pareto frontier represent the current state-of-the-art efficiency for their performance level.
| Benchmark | Agent | Accuracy | Cost | On the Pareto Frontier? |
|---|---|---|---|---|
| AssistantBench | Browser-Use | 13.75% | $779.72 | No |
| CORE-Bench Hard | CORE-Agent | 42.22% | $509.95 | No |
| CORE-Bench Hard | HAL Generalist Agent | 33.33% | $358.47 | No |
| GAIA | HF Open Deep Research | 25.45% | $1473.64 | No |
| SciCode | Scicode Tool Calling Agent | 6.92% | $550.54 | No |
| ScienceAgentBench | SAB Self-Debug | 26.47% | $33.75 | No |
| SWE-bench Verified Mini | SWE-Agent | 54.00% | $1599.90 | No |
| SWE-bench Verified Mini | HAL Generalist Agent | 46.00% | $399.93 | No |
| TAU-bench Airline | TAU-bench Few Shot | 62.00% | $298.58 | No |
| TAU-bench Airline | HAL Generalist Agent | 32.00% | $140.28 | No |
| USACO | USACO Episodic + Semantic | 51.47% | $267.72 | No |
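The Pareto-frontier test behind the last column can be sketched as follows. This is an illustrative implementation of the standard dominance check on (accuracy, cost) pairs, not HAL's exact scoring code; the two runs shown are taken from the SWE-bench Verified Mini rows above, and note that the leaderboard's "No" verdicts come from comparing against *all* models on a benchmark, not just the runs listed here:

```python
def pareto_frontier(runs):
    """Return the runs not dominated by any other run.

    A run is dominated if some other run has accuracy >= its accuracy
    and cost <= its cost, with at least one of the two strictly better.
    """
    frontier = []
    for r in runs:
        dominated = any(
            o["accuracy"] >= r["accuracy"] and o["cost"] <= r["cost"]
            and (o["accuracy"] > r["accuracy"] or o["cost"] < r["cost"])
            for o in runs
        )
        if not dominated:
            frontier.append(r)
    return frontier

# The two SWE-bench Verified Mini runs from the table above.
runs = [
    {"agent": "SWE-Agent", "accuracy": 54.00, "cost": 1599.90},
    {"agent": "HAL Generalist Agent", "accuracy": 46.00, "cost": 399.93},
]
print([r["agent"] for r in pareto_frontier(runs)])
# → ['SWE-Agent', 'HAL Generalist Agent']
```

Within just these two runs, neither dominates the other (one is more accurate, the other cheaper), so both would sit on the frontier of this subset; adding the other models' runs on the same benchmark is what knocks them off and yields the "No" entries in the table.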