Claude-3.7 Sonnet High (February 2025)

Performance overview across all HAL benchmarks

Benchmarks: 9
Agents: 11
Pareto-optimal benchmarks: 2

Token Pricing

Input tokens: $3 per 1M tokens
Output tokens: $15 per 1M tokens
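
As a rough illustration of how the per-run costs in the table below follow from these rates, here is a minimal sketch; the token counts in the example are hypothetical.

```python
# Claude-3.7 Sonnet pricing: $3 per 1M input tokens, $15 per 1M output tokens.
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 15.00

def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a run given its total input and output token counts."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Hypothetical example: 200k input tokens and 20k output tokens.
print(f"${run_cost(200_000, 20_000):.2f}")  # $0.90
```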

Benchmark Performance

The "On the Pareto Frontier?" column indicates whether this model achieved a Pareto-optimal trade-off between accuracy and cost on that benchmark, i.e. whether no other model on the leaderboard was both more accurate and cheaper. Models on the Pareto frontier represent the current state-of-the-art efficiency for their performance level; a sketch of the computation follows the table.

| Benchmark | Agent | Accuracy | Cost | On the Pareto Frontier? |
|---|---|---|---|---|
| AssistantBench | Browser-Use | 13.08% | $16.13 | No |
| CORE-Bench Hard | HAL Generalist Agent | 37.78% | $66.15 | Yes |
| CORE-Bench Hard | CORE-Agent | 24.44% | $72.47 | No |
| GAIA | HAL Generalist Agent | 64.24% | $122.49 | Yes |
| GAIA | HF Open Deep Research | 35.76% | $113.65 | No |
| Online Mind2Web | Browser-Use | 39.33% | $1,151.88 | No |
| Online Mind2Web | SeeAct | 30.33% | $367.51 | No |
| SciCode | Scicode Tool Calling Agent | 4.62% | $204.37 | No |
| SciCode | Scicode Zero Shot Agent | 3.08% | $4.99 | No |
| ScienceAgentBench | SAB Self-Debug | 30.39% | $11.74 | No |
| ScienceAgentBench | HAL Generalist Agent | 17.65% | $48.28 | No |
| SWE-bench Verified Mini | SWE-Agent | 54.00% | $388.88 | No |
| SWE-bench Verified Mini | HAL Generalist Agent | 24.00% | $72.98 | No |
| TAU-bench Airline | TAU-bench Few Shot | 60.00% | $37.23 | No |
| TAU-bench Airline | HAL Generalist Agent | 44.00% | $34.58 | No |
| USACO | USACO Episodic + Semantic | 26.71% | $56.43 | No |
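
Below is a minimal sketch of how Pareto optimality can be determined from (accuracy, cost) pairs. The HAL leaderboard compares all models on a given benchmark, and the points in this example are hypothetical, so this illustrates the idea rather than reproducing the leaderboard's code.

```python
from typing import List, Tuple

def pareto_frontier(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return the (accuracy, cost) points not dominated by any other point.

    A point is dominated if some other point has accuracy >= its accuracy
    and cost <= its cost, with at least one of the two strictly better.
    """
    frontier = []
    for acc, cost in points:
        dominated = any(
            a >= acc and c <= cost and (a > acc or c < cost)
            for a, c in points
        )
        if not dominated:
            frontier.append((acc, cost))
    return frontier

# Hypothetical (accuracy %, cost $) entries for one benchmark.
# The third point is dominated: the first is more accurate AND cheaper.
runs = [(64.2, 122.5), (35.8, 113.7), (50.0, 150.0)]
print(pareto_frontier(runs))  # [(64.2, 122.5), (35.8, 113.7)]
```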