o4-mini Low (April 2025)

Performance overview across all HAL benchmarks

9 Benchmarks · 11 Agents · 5 Pareto Benchmarks

Token Pricing

Input tokens: $1.10 per 1M tokens
Output tokens: $4.40 per 1M tokens
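
As a quick illustration of how these per-token rates translate into the per-run dollar costs in the table below, here is a minimal sketch of the cost formula. The rates come from the pricing above; the token counts in the example are made up for illustration, not taken from any actual run.

```python
# Cost of a run at this model's listed rates.
INPUT_RATE = 1.10 / 1_000_000   # USD per input token ($1.10 per 1M)
OUTPUT_RATE = 4.40 / 1_000_000  # USD per output token ($4.40 per 1M)

def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Total USD cost: input and output tokens billed at their own rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Hypothetical run consuming 40M input tokens and 5M output tokens:
print(f"${run_cost(40_000_000, 5_000_000):.2f}")  # $66.00
```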

Benchmark Performance

The "On the Pareto Frontier?" column indicates whether this model achieved a Pareto-optimal trade-off between accuracy and cost on that benchmark, i.e. no other run scored at least as high at an equal or lower cost. Models on the Pareto frontier represent the current state-of-the-art efficiency for their performance level (a sketch of the dominance test appears after the table).

| Benchmark | Agent | Accuracy | Cost | On the Pareto Frontier? |
|---|---|---|---|---|
| AssistantBench | Browser-Use | 28.05% | $9.22 | Yes |
| CORE-Bench Hard | CORE-Agent | 17.78% | $31.79 | No |
| CORE-Bench Hard | HAL Generalist Agent | 15.56% | $22.50 | No |
| GAIA | HAL Generalist Agent | 58.18% | $73.26 | Yes |
| GAIA | HF Open Deep Research | 47.88% | $80.80 | No |
| Online Mind2Web | SeeAct | 31.67% | $162.36 | No |
| Online Mind2Web | Browser-Use | 18.33% | $201.44 | No |
| SciCode | Scicode Zero Shot Agent | 9.23% | $1.74 | Yes |
| SciCode | Scicode Tool Calling Agent | 4.62% | $46.30 | No |
| ScienceAgentBench | SAB Self-Debug | 27.45% | $3.95 | Yes |
| ScienceAgentBench | HAL Generalist Agent | 19.61% | $77.32 | No |
| SWE-bench Verified Mini | SWE-Agent | 54.00% | $259.20 | Yes |
| SWE-bench Verified Mini | HAL Generalist Agent | 6.00% | $87.03 | No |
| TAU-bench Airline | TAU-bench Few Shot | 48.00% | $18.81 | No |
| TAU-bench Airline | HAL Generalist Agent | 22.00% | $20.16 | No |
| USACO | USACO Episodic + Semantic | 30.94% | $21.14 | No |
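
The following sketch illustrates the accuracy/cost dominance test behind the Pareto column. The two GAIA rows are taken from the table above; the `AgentRun` and `pareto_frontier` names are illustrative, not HAL's actual API, and the real frontier on the leaderboard is computed across all models and agents on each benchmark, not just this model's runs.

```python
from dataclasses import dataclass

@dataclass
class AgentRun:
    agent: str
    accuracy: float  # percent
    cost: float      # USD

def pareto_frontier(runs: list[AgentRun]) -> list[AgentRun]:
    """Keep the runs not dominated by any other run.

    A run is dominated if some other run has accuracy >= and cost <=,
    with at least one of the two strictly better.
    """
    frontier = []
    for r in runs:
        dominated = any(
            o.accuracy >= r.accuracy and o.cost <= r.cost
            and (o.accuracy > r.accuracy or o.cost < r.cost)
            for o in runs
        )
        if not dominated:
            frontier.append(r)
    return frontier

# The two GAIA rows from the table above:
runs = [
    AgentRun("HAL Generalist Agent", 58.18, 73.26),
    AgentRun("HF Open Deep Research", 47.88, 80.80),
]
print([r.agent for r in pareto_frontier(runs)])
# ['HAL Generalist Agent'] -- more accurate *and* cheaper, so it
# dominates HF Open Deep Research, matching the Yes/No in the table.
```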