Claude Opus 4.1 High (August 2025)

Performance overview across all HAL benchmarks

Benchmarks: 8
Agents: 9
Pareto Optimal Benchmarks: 0

Token Pricing

Input tokens: $15 per 1M tokens
Output tokens: $75 per 1M tokens
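As a rough sketch of how these prices translate into per-run cost (the token counts below are hypothetical, chosen only for illustration):

```python
# Per-1M-token prices listed above for Claude Opus 4.1.
INPUT_PER_M = 15.0   # USD per 1M input tokens
OUTPUT_PER_M = 75.0  # USD per 1M output tokens

def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one run at the listed prices."""
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# Example: a run consuming 2M input tokens and 500k output tokens.
print(run_cost(2_000_000, 500_000))  # → 67.5
```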

Benchmark Performance

The "On the Pareto Frontier?" column indicates whether this model achieved a Pareto-optimal trade-off between accuracy and cost on that benchmark. Models on the Pareto frontier represent the current state-of-the-art efficiency for their performance level.

| Benchmark | Agent | Accuracy | Cost | On the Pareto Frontier? |
| --- | --- | --- | --- | --- |
| AssistantBench | Browser-Use | 13.75% | $779.72 | No |
| CORE-Bench Hard | CORE-Agent | 42.22% | $509.95 | No |
| CORE-Bench Hard | HAL Generalist Agent | 33.33% | $358.47 | No |
| GAIA | HF Open Deep Research | 25.45% | $1473.64 | No |
| SciCode | SciCode Tool Calling Agent | 6.92% | $550.54 | No |
| ScienceAgentBench | SAB Self-Debug | 26.47% | $33.75 | No |
| SWE-bench Verified Mini | SWE-Agent | 54.00% | $1599.90 | No |
| SWE-bench Verified Mini | HAL Generalist Agent | 46.00% | $399.93 | No |
| TAU-bench Airline | TAU-bench Few Shot | 62.00% | $298.58 | No |
| TAU-bench Airline | HAL Generalist Agent | 32.00% | $140.28 | No |
| USACO | USACO Episodic + Semantic | 51.47% | $267.72 | No |
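The Pareto check behind that column can be sketched as follows. A run is Pareto-optimal if no other run on the same benchmark is at least as accurate and at least as cheap, with one of those strictly better. The accuracy/cost pairs below are hypothetical, since the real frontier is computed across all models on the leaderboard, not just this one:

```python
def pareto_optimal(points):
    """Return indices of (accuracy, cost) pairs not dominated by any other pair.

    A point is dominated when some other point has accuracy >= and cost <=,
    with at least one of the two strictly better.
    """
    optimal = []
    for i, (acc_i, cost_i) in enumerate(points):
        dominated = any(
            acc_j >= acc_i and cost_j <= cost_i
            and (acc_j > acc_i or cost_j < cost_i)
            for j, (acc_j, cost_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            optimal.append(i)
    return optimal

# Hypothetical (accuracy %, cost $) runs on one benchmark:
runs = [(54.0, 1599.90), (46.0, 399.93), (60.0, 350.00), (30.0, 500.00)]
print(pareto_optimal(runs))  # → [2]  (highest accuracy at the lowest cost)
```

Only index 2 survives here because it dominates every other run on both axes; a run with, say, lower cost but lower accuracy would also make the frontier.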