Claude Sonnet 4 (May 2025)

Performance overview across all HAL benchmarks

Token Pricing

Input Tokens

per 1M tokens

$15

Output Tokens

per 1M tokens

Benchmark Performance

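The "On the Pareto Frontier?" column indicates whether this model achieved a Pareto-optimal trade-off between accuracy and cost on that benchmark, meaning that no other evaluated entry was at least as accurate and at least as cheap while being strictly better on one of the two. Models on the Pareto frontier represent the current state-of-the-art efficiency for their performance level.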

Benchmark	Agent	Accuracy	Cost	On the Pareto Frontier?
Corebench Hard	Claude Code	46.67%	$65.58	No
Corebench Hard	CORE-Agent	28.89%	$50.27	No
Online Mind2Web	Browser-Use	40.00%	$1577.26	No
Online Mind2Web	SeeAct	36.67%	$246.18	No
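
The Pareto check described above can be sketched in a few lines of Python. This is a minimal illustration, not the leaderboard's actual implementation: the `Run` type and the `dominates` and `pareto_frontier` helpers are hypothetical names, and the `hypothetical-agent` entry is invented purely to show what domination looks like. The leaderboard compares each row against every other model's runs on the same benchmark, which are not listed on this page, so running the sketch on these four rows alone will not reproduce the "No" values in the table.

```python
from typing import NamedTuple


class Run(NamedTuple):
    benchmark: str
    agent: str
    accuracy: float  # percent
    cost: float      # USD


def dominates(a: Run, b: Run) -> bool:
    """True if `a` is at least as accurate and at least as cheap as `b`,
    and strictly better on at least one of the two."""
    return (
        a.accuracy >= b.accuracy
        and a.cost <= b.cost
        and (a.accuracy > b.accuracy or a.cost < b.cost)
    )


def pareto_frontier(runs: list[Run]) -> list[Run]:
    """Return the runs that no other run in the list dominates."""
    return [r for r in runs if not any(dominates(o, r) for o in runs)]


# Rows copied from the table above.
RUNS = [
    Run("Corebench Hard", "Claude Code", 46.67, 65.58),
    Run("Corebench Hard", "CORE-Agent", 28.89, 50.27),
    Run("Online Mind2Web", "Browser-Use", 40.00, 1577.26),
    Run("Online Mind2Web", "SeeAct", 36.67, 246.18),
]

corebench = [r for r in RUNS if r.benchmark == "Corebench Hard"]

# Considering only the two rows above, neither dominates the other:
# Claude Code is more accurate, CORE-Agent is cheaper.
local_frontier = pareto_frontier(corebench)

# ASSUMPTION: an invented competitor that is both cheaper and more accurate.
# It is not a real leaderboard entry.
hypothetical = Run("Corebench Hard", "hypothetical-agent", 50.00, 40.00)
frontier_with_competitor = pareto_frontier(corebench + [hypothetical])

if __name__ == "__main__":
    print([r.agent for r in local_frontier])
    print([r.agent for r in frontier_with_competitor])
```

Once a single entry that is both cheaper and more accurate is added, both original rows fall off the frontier. This is why a row can be marked "No" even when it does not look dominated by the other rows shown here.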