o4-mini Low (April 2025)
Performance overview across all HAL benchmarks
Benchmarks: 9
Agents: 11
Pareto benchmarks (benchmarks where this model is on the Pareto frontier): 5
Token Pricing
Input tokens: $1.10 per 1M tokens
Output tokens: $4.40 per 1M tokens
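The per-token prices above are enough to estimate the raw API cost of a run. The sketch below assumes that cost is simply input tokens times the input price plus output tokens times the output price. It ignores caching discounts, retries, and any other adjustments HAL might apply, and the page does not say how its reported costs are computed. The function name `token_cost` and the token counts in the example are hypothetical.

```python
# o4-mini prices in USD per 1M tokens, as listed on this page.
INPUT_PRICE_PER_M = 1.1
OUTPUT_PRICE_PER_M = 4.4


def token_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate API cost in USD (hypothetical helper; assumes flat per-token pricing)."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Hypothetical run: 5M input tokens and 500k output tokens.
print(f"${token_cost(5_000_000, 500_000):.2f}")  # → $7.70
```

Output tokens cost four times as much as input tokens. Agents that produce long reasoning traces or verbose responses therefore become expensive faster than the prompt size alone would suggest.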
Benchmark Performance
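The "On the Pareto Frontier?" column indicates whether this model achieved a Pareto-optimal trade-off between accuracy and cost on that benchmark. A run is on the Pareto frontier if no other run on the same benchmark is both at least as accurate and at least as cheap while being strictly better on at least one of the two. Runs on the frontier represent the most cost-efficient results currently available at their accuracy level.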
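The dominance rule above can be written as a short frontier check. This is a minimal sketch using a direct O(n²) pairwise comparison, which mirrors the definition. It is not HAL's own implementation, and the names `Run`, `dominates`, and `pareto_frontier` are hypothetical. The page's labels are computed against every model and agent on a benchmark, not only the rows for this model, so this table alone does not reproduce them.

```python
from typing import NamedTuple


class Run(NamedTuple):
    name: str
    accuracy: float  # higher is better
    cost: float      # USD; lower is better


def dominates(a: Run, b: Run) -> bool:
    """True if run a is at least as good as b on both axes and strictly better on one."""
    return (a.accuracy >= b.accuracy and a.cost <= b.cost
            and (a.accuracy > b.accuracy or a.cost < b.cost))


def pareto_frontier(runs: list[Run]) -> list[Run]:
    """Return the runs that no other run dominates, preserving input order."""
    # A run never dominates itself, so comparing against the full list is safe.
    return [r for r in runs if not any(dominates(o, r) for o in runs)]


# Hypothetical runs on one benchmark: C is dominated by B (more accurate and cheaper).
runs = [
    Run("A", 0.50, 10.0),
    Run("B", 0.60, 20.0),
    Run("C", 0.55, 25.0),
    Run("D", 0.40, 5.0),
]
print([r.name for r in pareto_frontier(runs)])  # → ['A', 'B', 'D']
```

The quadratic check is fine for leaderboard-sized inputs. For large sets, sorting by cost and sweeping while tracking the best accuracy seen so far gives the same result in O(n log n).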
| Benchmark | Agent | Accuracy | Cost | On the Pareto Frontier? |
|---|---|---|---|---|
| Assistantbench | Browser-Use | 28.05% | $9.22 | Yes |
| Corebench Hard | CORE-Agent | 17.78% | $31.79 | No |
| Corebench Hard | HAL Generalist Agent | 15.56% | $22.50 | No |
| Gaia | HAL Generalist Agent | 58.18% | $73.26 | Yes |
| Gaia | HF Open Deep Research | 47.88% | $80.80 | No |
| Online Mind2Web | SeeAct | 31.67% | $162.36 | No |
| Online Mind2Web | Browser-Use | 18.33% | $201.44 | No |
| Scicode | Scicode Zero Shot Agent | 9.23% | $1.74 | Yes |
| Scicode | Scicode Tool Calling Agent | 4.62% | $46.30 | No |
| Scienceagentbench | SAB Self-Debug | 27.45% | $3.95 | Yes |
| Scienceagentbench | HAL Generalist Agent | 19.61% | $77.32 | No |
| Swebench Verified Mini | SWE-Agent | 54.00% | $259.20 | Yes |
| Swebench Verified Mini | HAL Generalist Agent | 6.00% | $87.03 | No |
| Taubench Airline | TAU-bench Few Shot | 48.00% | $18.81 | No |
| Taubench Airline | HAL Generalist Agent | 22.00% | $20.16 | No |
| Usaco | USACO Episodic + Semantic | 30.94% | $21.14 | No |