DeepSeek V3

Performance overview across all HAL benchmarks

Benchmarks: 9
Agents: 10
Pareto Benchmarks: 2

Token Pricing

Input tokens: $0.20 per 1M tokens
Output tokens: $0.80 per 1M tokens
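Given the per-token prices above, the cost of a single request is a simple weighted sum of input and output token counts. A minimal sketch (the function name and structure are illustrative, not part of any official API):

```python
# Prices from the table above: USD per 1M tokens.
INPUT_PRICE_PER_M = 0.20
OUTPUT_PRICE_PER_M = 0.80

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the listed token prices."""
    return (input_tokens / 1_000_000 * INPUT_PRICE_PER_M
            + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M)

# Example: 1M input tokens and 500K output tokens cost about $0.60.
print(round(request_cost(1_000_000, 500_000), 2))
```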

Benchmark Performance

The "On the Pareto Frontier?" column indicates whether this model achieved a Pareto-optimal trade-off between accuracy and cost on that benchmark, meaning no other model reached equal or higher accuracy at equal or lower cost. Models on the Pareto frontier represent the current state-of-the-art efficiency at their performance level.

| Benchmark | Agent | Accuracy | Cost | On the Pareto Frontier? |
|---|---|---|---|---|
| AssistantBench | Browser-Use | 2.03% | $12.66 | No |
| CoreBench Hard | CORE-Agent | 17.78% | $25.26 | No |
| CoreBench Hard | HAL Generalist Agent | 8.89% | $0.76 | Yes |
| GAIA | HAL Generalist Agent | 36.36% | $29.27 | No |
| GAIA | HF Open Deep Research | 28.48% | $76.64 | No |
| Online Mind2Web | Browser-Use | 32.33% | $214.74 | No |
| SciCode | SciCode Zero Shot Agent | 3.08% | $0.79 | No |
| SciCode | SciCode Tool Calling Agent | 0.00% | $52.11 | No |
| ScienceAgentBench | SAB Self-Debug | 15.69% | $2.09 | No |
| ScienceAgentBench | HAL Generalist Agent | 0.98% | $55.73 | No |
| SWE-bench Verified Mini | SWE-Agent | 24.00% | $11.77 | No |
| SWE-bench Verified Mini | HAL Generalist Agent | 10.00% | $30.17 | No |
| TAU-bench Airline | TAU-bench Few Shot | 34.00% | $30.60 | No |
| TAU-bench Airline | HAL Generalist Agent | 18.00% | $10.73 | No |
| USACO | USACO Episodic + Semantic | 39.09% | $12.08 | Yes |
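The Pareto check described above can be sketched in a few lines: an (accuracy, cost) entry is dominated if some other entry on the same benchmark has accuracy at least as high and cost at least as low, with at least one strict. Note that the Yes/No flags in the table are computed across all models on a benchmark, not just the rows shown here, so this sketch uses hypothetical entries for illustration:

```python
def pareto_frontier(entries):
    """Return the entries not dominated by any other entry.

    Entry b dominates entry a when b has accuracy >= a's and cost <= a's,
    and is strictly better on at least one of the two.
    """
    frontier = []
    for a in entries:
        dominated = any(
            b["accuracy"] >= a["accuracy"] and b["cost"] <= a["cost"]
            and (b["accuracy"] > a["accuracy"] or b["cost"] < a["cost"])
            for b in entries
        )
        if not dominated:
            frontier.append(a)
    return frontier

# Hypothetical entries: the cheap low-accuracy run and the expensive
# high-accuracy run both survive; the middle one is dominated.
runs = [
    {"agent": "A", "accuracy": 17.78, "cost": 25.26},
    {"agent": "B", "accuracy": 8.89, "cost": 0.76},
    {"agent": "C", "accuracy": 2.00, "cost": 5.00},
]
print([r["agent"] for r in pareto_frontier(runs)])  # → ['A', 'B']
```

Higher accuracy and lower cost are both desirable, which is why the frontier can contain several entries: each represents a different but non-dominated trade-off.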