Claude Opus 4 (May 2025)
Performance overview across all HAL benchmarks
Benchmarks: 3
Agents: 4
Pareto Optimal Benchmarks: 0
Token Pricing
Input Tokens: $15 per 1M tokens
Output Tokens: $75 per 1M tokens
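As a rough illustration of how these per-token prices turn into a dollar figure, the sketch below applies the two listed rates to a token count. The function name and the token counts are hypothetical, and it assumes flat pricing with no caching discounts or other adjustments, so it will not necessarily reproduce the costs reported on this page.

```python
# Listed prices for this model, in USD per 1M tokens.
INPUT_PRICE_PER_M = 15.00
OUTPUT_PRICE_PER_M = 75.00


def run_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a run under flat per-token pricing (an assumption)."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical run: 2M input tokens and 100k output tokens.
print(run_cost_usd(2_000_000, 100_000))  # 37.5
```

Output tokens cost five times as much as input tokens here, so long generations can dominate the bill even when the prompt is much larger.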
Benchmark Performance
The "On the Pareto Frontier?" column indicates whether this model achieved a Pareto-optimal trade-off between accuracy and cost on that benchmark, meaning no other entry on the benchmark is both at least as accurate and at least as cheap. Models on the Pareto frontier offer the best accuracy currently available at their cost level.
Benchmark | Agent | Accuracy | Cost | On the Pareto Frontier? |
---|---|---|---|---|
Gaia | HF Open Deep Research | 57.58% | $1686.07 | No |
Gaia | HAL Generalist Agent | 30.30% | $272.76 | No |
Swebench Verified Mini | SWE-Agent | 50.00% | $1330.90 | No |
Swebench Verified Mini | HAL Generalist Agent | 34.00% | $382.39 | No |
Taubench Airline | TAU-bench Few Shot | 56.00% | $363.30 | No |
Taubench Airline | HAL Generalist Agent | 44.00% | $150.15 | No |
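
The Pareto check described above can be sketched as a simple dominance test over (accuracy, cost) pairs. This is a minimal sketch, not the leaderboard's actual implementation: the `Run` type, the `dominates` rule, and the third data point are assumptions for illustration. The first two rows are the Gaia results from the table; the "hypothetical competitor" is invented to show how a cheaper, more accurate entry pushes both off the frontier.

```python
from typing import NamedTuple


class Run(NamedTuple):
    label: str
    accuracy: float  # percent
    cost: float      # USD


def dominates(a: Run, b: Run) -> bool:
    """True if a is at least as accurate and as cheap as b, and strictly better in one."""
    no_worse = a.accuracy >= b.accuracy and a.cost <= b.cost
    strictly_better = a.accuracy > b.accuracy or a.cost < b.cost
    return no_worse and strictly_better


def pareto_frontier(runs: list[Run]) -> list[Run]:
    """Keep only the runs that no other run dominates."""
    return [r for r in runs if not any(dominates(o, r) for o in runs)]


runs = [
    Run("HF Open Deep Research", 57.58, 1686.07),   # Gaia row from the table
    Run("HAL Generalist Agent", 30.30, 272.76),     # Gaia row from the table
    Run("hypothetical competitor", 60.00, 200.00),  # made-up entry, not real data
]

print([r.label for r in pareto_frontier(runs)])      # ['hypothetical competitor']
print([r.label for r in pareto_frontier(runs[:2])])  # both table rows survive
```

Taken on their own, the two Gaia rows do not dominate each other (one is more accurate, the other cheaper), so a "No" in the table implies that entries from other models dominate them.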