Claude Code

Agent performance overview across all HAL benchmarks

Benchmarks: 1
Models Used: 4
Pareto Optimal Runs: 1

Benchmark Performance

The "On the Pareto Frontier?" column indicates whether this agent achieved a Pareto-optimal trade-off between accuracy and cost on that benchmark, in the usual sense that no other run is both more accurate and cheaper. Runs on the Pareto frontier represent the current state-of-the-art efficiency for their performance level.

Benchmark       Model                               Accuracy  Cost     On the Pareto Frontier?
Corebench Hard  Claude Opus 4.5                     77.78%    $87.16   Yes
Corebench Hard  Claude Sonnet 4.5 (September 2025)  62.22%    $68.33   No
Corebench Hard  Claude Sonnet 4 (May 2025)          46.67%    $65.58   No
Corebench Hard  Claude Opus 4.1                     42.22%    $331.79  No
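
To make the Pareto-frontier idea concrete, here is a minimal Python sketch that applies the standard non-dominated test to the four runs listed above. The names `Run`, `dominates`, and `pareto_frontier` are hypothetical and chosen for illustration; this is not the leaderboard's own code, and its exact method may differ. The sketch compares only this agent's four runs with one another, whereas the page's Yes/No flag is presumably computed against every agent evaluated on the benchmark.

```python
from typing import List, NamedTuple


class Run(NamedTuple):
    model: str
    accuracy: float  # percent, as shown in the table
    cost: float      # US dollars, as shown in the table


# The four Corebench Hard rows from the table above.
RUNS = [
    Run("Claude Opus 4.5", 77.78, 87.16),
    Run("Claude Sonnet 4.5 (September 2025)", 62.22, 68.33),
    Run("Claude Sonnet 4 (May 2025)", 46.67, 65.58),
    Run("Claude Opus 4.1", 42.22, 331.79),
]


def dominates(a: Run, b: Run) -> bool:
    """True if `a` is at least as accurate and at least as cheap as `b`,
    and strictly better on at least one of the two axes."""
    no_worse = a.accuracy >= b.accuracy and a.cost <= b.cost
    strictly_better = a.accuracy > b.accuracy or a.cost < b.cost
    return no_worse and strictly_better


def pareto_frontier(runs: List[Run]) -> List[Run]:
    """Keep the runs that no other run in the same list dominates."""
    return [r for r in runs if not any(dominates(o, r) for o in runs)]


frontier = pareto_frontier(RUNS)
for run in frontier:
    print(f"{run.model}: {run.accuracy}% at ${run.cost}")
```

Within these four rows alone, only Claude Opus 4.1 is dominated: it is both less accurate and more expensive than each of the other three. The two Sonnet runs survive this local comparison yet are marked "No" in the table, which is consistent with their being dominated by runs from other agents that do not appear on this page.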