DeepSeek R1 (January 2025)

Performance overview across all HAL benchmarks

Benchmarks: 7
Agents: 7
Pareto Optimal Benchmarks: 0

Token Pricing

Input tokens: $3 per 1M tokens
Output tokens: $7 per 1M tokens
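Given the per-million-token rates above, the cost of a run is a simple linear combination of input and output token counts. A minimal sketch (the token counts in the example are hypothetical, not from this page):

```python
# Estimate run cost from the token pricing listed above.
# Rates come from this page; the token counts used in the example
# are hypothetical, chosen only for illustration.

INPUT_PRICE_PER_M = 3.0   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 7.0  # USD per 1M output tokens

def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Total cost in USD for a run with the given token counts."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M + \
           (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Hypothetical run: 2M input tokens, 500K output tokens.
print(round(run_cost(2_000_000, 500_000), 2))  # 2 * $3 + 0.5 * $7 = $9.5
```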

Benchmark Performance

The "On the Pareto Frontier?" column indicates whether this model achieved a Pareto-optimal trade-off between accuracy and cost on that benchmark. Models on the Pareto frontier represent the current state-of-the-art efficiency for their performance level.

| Benchmark | Agent | Accuracy | Cost | On the Pareto Frontier? |
|---|---|---|---|---|
| CORE-Bench Hard | CORE-Agent | 6.67% | $81.11 | No |
| CORE-Bench Hard | HAL Generalist Agent | 2.22% | $13.87 | No |
| GAIA | HAL Generalist Agent | 30.30% | $73.19 | No |
| GAIA | HF Open Deep Research | 24.85% | $143.08 | No |
| Online Mind2Web | Browser-Use | 25.33% | $280.93 | No |
| ScienceAgentBench | SAB Self-Debug | 23.53% | $18.24 | No |
| SWE-bench Verified Mini | HAL Generalist Agent | 6.00% | $146.71 | No |
| SWE-bench Verified Mini | SWE-Agent | 0.00% | $4.16 | No |
| TAU-bench Airline | HAL Generalist Agent | 10.00% | $30.18 | No |
| USACO | USACO Episodic + Semantic | 38.11% | $80.04 | No |
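The Pareto-frontier check described above can be sketched in a few lines: an (accuracy, cost) point is Pareto-optimal if no other point matches or beats it on both axes while strictly beating it on at least one. The data points in the example are hypothetical, not drawn from the leaderboard:

```python
# Sketch of a Pareto-frontier check over (accuracy, cost) pairs,
# as used conceptually in the table above. A point is dominated if
# another point has accuracy >= and cost <=, with at least one strict.
# The example data is hypothetical.

def pareto_optimal(points):
    """Return the (accuracy, cost) points on the Pareto frontier."""
    frontier = []
    for i, (acc, cost) in enumerate(points):
        dominated = any(
            a >= acc and c <= cost and (a > acc or c < cost)
            for j, (a, c) in enumerate(points)
            if j != i
        )
        if not dominated:
            frontier.append((acc, cost))
    return frontier

# Hypothetical agent results on one benchmark: (accuracy %, cost $).
results = [(30.3, 73.19), (24.85, 143.08), (40.0, 50.0)]
print(pareto_optimal(results))  # [(40.0, 50.0)] dominates both others
```

Note that Pareto-optimality is judged per benchmark against all competing model/agent runs, which is why a model can score well yet sit off the frontier if a cheaper or more accurate run exists.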