# DeepSeek R1 (January 2025)

Performance overview across all HAL benchmarks.
- **Benchmarks:** 7
- **Agents:** 8
- **Pareto-optimal benchmarks:** 0

**Token pricing**

- Input: $3 per 1M tokens
- Output: $7 per 1M tokens
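At these rates, a run's dollar cost follows directly from its token counts. A minimal sketch (the token counts in the example are hypothetical, not taken from any benchmark run):

```python
# Estimate run cost at DeepSeek R1's listed rates:
# $3 per 1M input tokens, $7 per 1M output tokens.
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 7.00

def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of a run given its token usage."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# e.g. a hypothetical run with 2M input and 0.5M output tokens:
print(run_cost(2_000_000, 500_000))  # → 9.5
```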
## Benchmark Performance
The "On the Pareto Frontier?" column indicates whether this model achieved a Pareto-optimal trade-off between accuracy and cost on that benchmark; models on the Pareto frontier represent the current state-of-the-art efficiency for their performance level.
| Benchmark | Agent | Accuracy | Cost | On the Pareto Frontier? |
|---|---|---|---|---|
| CORE-Bench Hard | CORE-Agent | 6.67% | $81.11 | No |
| CORE-Bench Hard | HAL Generalist Agent | 2.22% | $13.87 | No |
| GAIA | HAL Generalist Agent | 30.30% | $73.19 | No |
| GAIA | HF Open Deep Research | 24.85% | $143.08 | No |
| Online Mind2Web | Browser-Use | 25.33% | $280.93 | No |
| ScienceAgentBench | SAB Self-Debug | 23.53% | $18.24 | No |
| SWE-bench Verified Mini | HAL Generalist Agent | 6.00% | $146.71 | No |
| SWE-bench Verified Mini | SWE-Agent | 0.00% | $4.16 | No |
| TAU-bench Airline | TAU-bench Tool Calling | 36.00% | $13.30 | No |
| TAU-bench Airline | HAL Generalist Agent | 10.00% | $30.18 | No |
| USACO | USACO Episodic + Semantic | 38.11% | $80.04 | No |
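The Pareto-frontier check described above can be sketched as a dominance test over (accuracy, cost) pairs. The points below are hypothetical, since the real frontier is computed across all models evaluated on a benchmark, not just the rows in this table:

```python
def pareto_frontier(points):
    """points: list of (name, accuracy, cost) tuples.

    A point is on the Pareto frontier if no other point achieves
    strictly better on one axis (higher accuracy or lower cost)
    without being worse on the other. Returns names in input order.
    """
    frontier = []
    for name, acc, cost in points:
        dominated = any(
            (a > acc and c <= cost) or (a >= acc and c < cost)
            for n, a, c in points
            if n != name
        )
        if not dominated:
            frontier.append(name)
    return frontier

# Hypothetical agents: A is most accurate, B is cheapest,
# C is beaten by B on both axes and falls off the frontier.
print(pareto_frontier([
    ("A", 30.0, 50.0),
    ("B", 25.0, 20.0),
    ("C", 20.0, 40.0),
]))  # → ['A', 'B']
```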