Claude Opus 4 (May 2025)

Performance overview across all HAL benchmarks

Benchmarks: 3
Agents: 4
Pareto Optimal Benchmarks: 0

Token Pricing

Input Tokens: $15 per 1M tokens
Output Tokens: $75 per 1M tokens
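
As a rough illustration of how the per-token prices above translate into a dollar amount, the sketch below computes the cost of a single call. The function name `token_cost` and the token counts are hypothetical; only the two prices come from this page. The benchmark costs reported further down aggregate many calls, so this is not a reconstruction of those figures.

```python
# Prices listed on this page, in USD per 1M tokens.
INPUT_PRICE_PER_M = 15.0
OUTPUT_PRICE_PER_M = 75.0


def token_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a call with the given token counts.

    Hypothetical helper: assumes cost is purely linear in token counts.
    """
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Illustrative token counts (not taken from any benchmark run).
example_cost = token_cost(input_tokens=200_000, output_tokens=10_000)
print(f"${example_cost:.2f}")  # prints "$3.75"
```

Because output tokens cost five times as much as input tokens here, runs that generate long outputs can be noticeably more expensive than their input size alone would suggest.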

Benchmark Performance

The "On the Pareto Frontier?" column indicates whether this model achieved a Pareto-optimal trade-off between accuracy and cost on that benchmark, that is, whether no other evaluated entry is both cheaper and more accurate. Models on the Pareto frontier represent the most cost-efficient results at their accuracy level.
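
The Pareto check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not necessarily how HAL computes the frontier: the `Run` type, the `pareto_frontier` function, and the three sample entries are all hypothetical. An entry is treated as dominated when some other entry is at least as accurate and at least as cheap, and strictly better on one of the two.

```python
from typing import List, NamedTuple


class Run(NamedTuple):
    """One (agent, model) result on a benchmark. Hypothetical type."""
    name: str
    accuracy: float  # percent
    cost: float      # USD


def dominates(a: Run, b: Run) -> bool:
    """True if `a` is no worse than `b` on both axes and better on one."""
    no_worse = a.accuracy >= b.accuracy and a.cost <= b.cost
    strictly_better = a.accuracy > b.accuracy or a.cost < b.cost
    return no_worse and strictly_better


def pareto_frontier(runs: List[Run]) -> List[Run]:
    """Return the runs that no other run dominates, in input order."""
    return [r for r in runs if not any(dominates(o, r) for o in runs)]


# Made-up entries for illustration only (not leaderboard data).
sample = [
    Run("A", accuracy=50.0, cost=100.0),
    Run("B", accuracy=40.0, cost=150.0),  # dominated by A
    Run("C", accuracy=30.0, cost=20.0),
]
frontier_names = [r.name for r in pareto_frontier(sample)]
print(frontier_names)  # prints ['A', 'C']
```

Note that the frontier is computed over all entries on a benchmark, so a model can post a strong accuracy and still be marked "No" if another entry reaches at least that accuracy at lower cost.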

| Benchmark | Agent | Accuracy | Cost | On the Pareto Frontier? |
|---|---|---|---|---|
| Gaia | HF Open Deep Research | 57.58% | $1686.07 | No |
| Gaia | HAL Generalist Agent | 30.30% | $272.76 | No |
| Swebench Verified Mini | SWE-Agent | 50.00% | $1330.90 | No |
| Swebench Verified Mini | HAL Generalist Agent | 34.00% | $382.39 | No |
| Taubench Airline | TAU-bench Few Shot | 56.00% | $363.30 | No |
| Taubench Airline | HAL Generalist Agent | 44.00% | $150.15 | No |