DeepSeek R1
Performance overview across all HAL benchmarks
- Benchmarks: 9
- Agents: 10
- Pareto-optimal benchmarks: 2
Token Pricing
- Input tokens: $0.20 per 1M tokens
- Output tokens: $0.80 per 1M tokens
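Given these per-token rates, the cost of a run follows directly from its token counts. A minimal sketch, using hypothetical token counts (the page does not report per-run token usage):

```python
# Hypothetical token counts for one agent run (illustrative only).
input_tokens = 4_200_000
output_tokens = 650_000

INPUT_PRICE = 0.20   # dollars per 1M input tokens
OUTPUT_PRICE = 0.80  # dollars per 1M output tokens

# Cost = (input tokens / 1M) * input rate + (output tokens / 1M) * output rate
cost = input_tokens / 1e6 * INPUT_PRICE + output_tokens / 1e6 * OUTPUT_PRICE
print(f"${cost:.2f}")  # $1.36
```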
Benchmark Performance
The "On the Pareto Frontier?" column indicates whether this model achieved a Pareto-optimal trade-off between accuracy and cost on that benchmark. Models on the Pareto frontier represent the current state-of-the-art efficiency for their performance level: no other model is both more accurate and cheaper.
| Benchmark | Agent | Accuracy | Cost | On the Pareto Frontier? |
|---|---|---|---|---|
| AssistantBench | Browser-Use | 8.75% | $6.62 | No |
| CORE-Bench Hard | CORE-Agent | 8.89% | $9.05 | No |
| CORE-Bench Hard | HAL Generalist Agent | 8.89% | $2.83 | No |
| GAIA | HAL Generalist Agent | 30.30% | $5.47 | No |
| GAIA | HF Open Deep Research | 24.85% | $11.10 | No |
| Online Mind2Web | Browser-Use | 25.33% | $21.26 | No |
| SciCode | SciCode Tool Calling Agent | 0.00% | $20.98 | No |
| SciCode | SciCode Zero Shot Agent | 0.00% | $0.80 | No |
| ScienceAgentBench | SAB Self-Debug | 23.53% | $1.87 | Yes |
| SWE-bench Verified Mini | HAL Generalist Agent | 6.00% | $10.32 | No |
| SWE-bench Verified Mini | SWE-Agent | 0.00% | $0.41 | Yes |
| TAU-bench Airline | TAU-bench Few Shot | 36.00% | $5.66 | No |
| TAU-bench Airline | HAL Generalist Agent | 10.00% | $2.91 | No |
| USACO | USACO Episodic + Semantic | 38.11% | $8.18 | No |
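The Pareto-frontier check behind the last column can be sketched as a simple dominance test: a run is on the frontier if no other run on the same benchmark is at least as accurate at lower or equal cost (with at least one strict improvement). This is an illustrative sketch with made-up entries; on the actual leaderboard the comparison is made across all models evaluated on a benchmark, not just the rows above.

```python
def pareto_frontier(entries):
    """Return the names of runs not dominated by any other run.

    Each entry is (name, accuracy, cost). Run X dominates run Y if X is
    at least as accurate and at most as costly, and strictly better on
    at least one of the two dimensions.
    """
    frontier = []
    for name, acc, cost in entries:
        dominated = any(
            o_acc >= acc and o_cost <= cost and (o_acc > acc or o_cost < cost)
            for _, o_acc, o_cost in entries
        )
        if not dominated:
            frontier.append(name)
    return frontier

# Hypothetical runs on one benchmark: (name, accuracy %, cost $)
runs = [("A", 30.0, 5.0), ("B", 25.0, 11.0), ("C", 25.0, 2.0)]
print(pareto_frontier(runs))  # ['A', 'C']  (B costs more than C at equal accuracy)
```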