Claude Opus 4.1 High (August 2025)
Performance overview across all HAL benchmarks
Benchmarks: 8
Agents: 9
Pareto Optimal Benchmarks: 0
Token Pricing
Input tokens: $15 per 1M tokens
Output tokens: $75 per 1M tokens
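The per-token prices above translate into run costs by simple proportion. A minimal sketch (the token counts in the example are hypothetical, chosen only to illustrate the arithmetic):

```python
# Per-1M-token prices for this model, as listed above.
INPUT_PRICE_PER_M = 15.00   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 75.00  # USD per 1M output tokens

def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Total USD cost of one agent run from its token counts."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M + \
           (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Hypothetical run: 2M input tokens, 400k output tokens.
print(round(run_cost(2_000_000, 400_000), 2))  # → 60.0  (30.00 input + 30.00 output)
```

Because output tokens cost 5x more than input tokens here, output-heavy agents (long reasoning traces, verbose tool transcripts) dominate the cost column even at modest input volumes.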
Benchmark Performance
The "On the Pareto Frontier?" column indicates whether this model achieved a Pareto-optimal trade-off between accuracy and cost on that benchmark: no other model scores at least as high at a lower cost. Models on the Pareto frontier represent the current state-of-the-art efficiency for their performance level.
| Benchmark | Agent | Accuracy | Cost | On the Pareto Frontier? |
|---|---|---|---|---|
| AssistantBench | Browser-Use | 13.75% | $779.72 | No |
| CORE-Bench Hard | CORE-Agent | 42.22% | $509.95 | No |
| CORE-Bench Hard | HAL Generalist Agent | 33.33% | $358.47 | No |
| GAIA | HF Open Deep Research | 25.45% | $1473.64 | No |
| SciCode | Scicode Tool Calling Agent | 6.92% | $550.54 | No |
| ScienceAgentBench | SAB Self-Debug | 26.47% | $33.75 | No |
| SWE-bench Verified Mini | SWE-Agent | 54.00% | $1599.90 | No |
| SWE-bench Verified Mini | HAL Generalist Agent | 46.00% | $399.93 | No |
| TAU-bench Airline | TAU-bench Few Shot | 62.00% | $298.58 | No |
| TAU-bench Airline | HAL Generalist Agent | 32.00% | $140.28 | No |
| USACO | USACO Episodic + Semantic | 51.47% | $267.72 | No |
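The Pareto-frontier test behind the last column can be sketched as follows. This is an illustrative implementation of the standard dominance check on (accuracy, cost) pairs, not HAL's exact scoring code; the two runs shown are taken from the SWE-bench Verified Mini rows above, and note that the leaderboard's "No" verdicts come from comparing against *all* models on a benchmark, not just the runs listed here:

```python
def pareto_frontier(runs):
    """Return the runs not dominated by any other run.

    A run is dominated if some other run has accuracy >= its accuracy
    and cost <= its cost, with at least one of the two strictly better.
    """
    frontier = []
    for r in runs:
        dominated = any(
            o["accuracy"] >= r["accuracy"] and o["cost"] <= r["cost"]
            and (o["accuracy"] > r["accuracy"] or o["cost"] < r["cost"])
            for o in runs
        )
        if not dominated:
            frontier.append(r)
    return frontier

# The two SWE-bench Verified Mini runs from the table above.
runs = [
    {"agent": "SWE-Agent", "accuracy": 54.00, "cost": 1599.90},
    {"agent": "HAL Generalist Agent", "accuracy": 46.00, "cost": 399.93},
]
print([r["agent"] for r in pareto_frontier(runs)])
# → ['SWE-Agent', 'HAL Generalist Agent']
```

Within just these two runs, neither dominates the other (one is more accurate, the other cheaper), so both would sit on the frontier of this subset; adding the other models' runs on the same benchmark is what knocks them off and yields the "No" entries in the table.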