DeepSeek R1 (May 2025)
Performance overview across all HAL benchmarks
Benchmarks: 3
Agents: 4
Pareto Optimal Benchmarks: 0
Token Pricing
Input Tokens: $0.55 per 1M tokens
Output Tokens: $2.19 per 1M tokens
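As a rough illustration of how these per-token prices translate into a dollar figure, the sketch below assumes cost is simply linear in input and output token counts. The function name `token_cost` and the token counts in the usage line are hypothetical; the page does not state how the benchmark costs were actually computed.

```python
# Prices taken from the Token Pricing figures above (USD per 1M tokens).
INPUT_PRICE_PER_M = 0.55
OUTPUT_PRICE_PER_M = 2.19


def token_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate cost in USD, assuming purely linear per-token pricing."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical usage: 2M input tokens and 500k output tokens.
estimate = token_cost(2_000_000, 500_000)
print(f"${estimate:.2f}")
```

Under this assumption, output tokens cost roughly four times as much as input tokens, so runs that produce long outputs will dominate the bill.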
Benchmark Performance
The "On the Pareto Frontier?" column indicates whether this model achieved a Pareto-optimal trade-off between accuracy and cost on that benchmark. Models on the Pareto frontier represent the current state-of-the-art efficiency for their performance level.
Benchmark | Agent | Accuracy | Cost | On the Pareto Frontier? |
---|---|---|---|---|
Assistantbench | Browser-Use | 8.75% | $18.18 | No |
Corebench Hard | HAL Generalist Agent | 8.89% | $7.77 | No |
Scicode | Scicode Zero Shot Agent | 0.00% | $2.19 | No |
Scicode | Scicode Tool Calling Agent | 0.00% | $57.62 | No |
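To make the Pareto frontier column concrete, here is a minimal sketch of an accuracy-versus-cost frontier check. It assumes the standard dominance definition: a run is dominated if another run is at least as accurate and at least as cheap, and strictly better on one of the two. The exact procedure used by the leaderboard is not stated on this page, and the `Run` type, the `pareto_frontier` function, and all data points below are hypothetical.

```python
from typing import List, NamedTuple


class Run(NamedTuple):
    name: str
    accuracy: float  # percent, higher is better
    cost: float      # USD, lower is better


def pareto_frontier(runs: List[Run]) -> List[Run]:
    """Return the runs that no other run dominates."""
    frontier = []
    for r in runs:
        dominated = any(
            o.accuracy >= r.accuracy
            and o.cost <= r.cost
            and (o.accuracy > r.accuracy or o.cost < r.cost)
            for o in runs
        )
        if not dominated:
            frontier.append(r)
    return frontier


# Hypothetical results for four runs on one benchmark.
runs = [
    Run("run-a", 8.0, 18.0),   # dominated by run-b (less accurate, pricier)
    Run("run-b", 12.0, 10.0),  # on the frontier
    Run("run-c", 5.0, 2.0),    # on the frontier (cheapest)
    Run("run-d", 12.0, 25.0),  # dominated by run-b (same accuracy, pricier)
]

for r in pareto_frontier(runs):
    print(r.name)
```

With this definition, a run can stay on the frontier despite low accuracy if nothing cheaper matches it, which is why the column depends on every other model evaluated on the same benchmark.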