Claude Opus 4.1 (August 2025)

Performance overview across all HAL benchmarks

Benchmarks: 8
Agents: 9
Pareto Optimal Benchmarks: 1

Token Pricing

Input tokens: $15 per 1M tokens
Output tokens: $75 per 1M tokens
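
At these rates, a run's cost is a straight linear function of token usage. A minimal sketch of the arithmetic in Python; the token counts in the example are hypothetical, chosen only to illustrate the calculation:

```python
# Per-token pricing for Claude Opus 4.1, as listed above.
INPUT_PRICE_PER_M = 15.00   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 75.00  # USD per 1M output tokens

def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a run given its total token usage."""
    return (
        input_tokens / 1_000_000 * INPUT_PRICE_PER_M
        + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    )

# Hypothetical example: a run consuming 20M input and 1.5M output tokens.
print(f"${run_cost(20_000_000, 1_500_000):,.2f}")  # -> $412.50
```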

Benchmark Performance

The "On the Pareto Frontier?" column indicates whether this model achieved a Pareto-optimal trade-off between accuracy and cost on that benchmark, i.e. no other model on the leaderboard reached higher accuracy at equal or lower cost. Models on the Pareto frontier represent the current state-of-the-art efficiency at their performance level (see the sketch after the table).

| Benchmark | Agent | Accuracy | Cost | On the Pareto Frontier? |
|---|---|---|---|---|
| AssistantBench | Browser-Use | 7.26% | $385.43 | No |
| CORE-Bench Hard | CORE-Agent | 51.11% | $412.42 | Yes |
| CORE-Bench Hard | HAL Generalist Agent | 35.56% | $375.11 | No |
| GAIA | HF Open Deep Research | 28.48% | $1,306.85 | No |
| SciCode | SciCode Tool Calling Agent | 7.69% | $625.13 | No |
| ScienceAgentBench | SAB Self-Debug | 27.45% | $33.37 | No |
| SWE-bench Verified Mini | SWE-Agent | 54.00% | $1,789.67 | No |
| SWE-bench Verified Mini | HAL Generalist Agent | 42.00% | $477.65 | No |
| TAU-bench Airline | TAU-bench Few Shot | 54.00% | $294.17 | No |
| TAU-bench Airline | HAL Generalist Agent | 54.00% | $180.49 | No |
| USACO | USACO Episodic + Semantic | 48.21% | $276.19 | No |
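
For concreteness, here is a minimal sketch of the dominance check behind the Pareto column, using the two CORE-Bench Hard rows from the table. Note that the leaderboard computes the frontier across all models and agents, so entries marked "No" above may be dominated by runs from models not shown on this page:

```python
from typing import NamedTuple

class Entry(NamedTuple):
    agent: str
    accuracy: float  # percent
    cost: float      # USD

def on_pareto_frontier(entry: Entry, entries: list[Entry]) -> bool:
    """True if no other entry reaches >= accuracy at <= cost,
    with at least one of the two inequalities strict."""
    return not any(
        other.accuracy >= entry.accuracy
        and other.cost <= entry.cost
        and (other.accuracy > entry.accuracy or other.cost < entry.cost)
        for other in entries
        if other is not entry
    )

# The CORE-Bench Hard rows from the table above.
core_bench_hard = [
    Entry("CORE-Agent", 51.11, 412.42),
    Entry("HAL Generalist Agent", 35.56, 375.11),
]

for e in core_bench_hard:
    print(e.agent, on_pareto_frontier(e, core_bench_hard))
# Within just these two rows, neither dominates the other (the second is
# cheaper), so both pass the check; the "No" in the table comes from
# comparing against every model on the leaderboard, not only this page.
```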