Gemini 2.0 Flash (February 2025)
Performance overview across all HAL benchmarks
Benchmarks: 9
Agents: 10
Pareto Optimal Benchmarks: 8
Token Pricing
Input Tokens: $0.10 per 1M tokens
Output Tokens: $0.40 per 1M tokens
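As a rough illustration of how these per-token prices translate into a run cost, the sketch below multiplies token counts by the listed rates. The function name `estimate_cost` and the token counts are hypothetical and are not taken from any HAL run; the costs reported on this page may also include factors this simple formula ignores.

```python
# Prices from the Token Pricing figures on this page (USD per 1M tokens).
INPUT_PRICE_PER_M = 0.10
OUTPUT_PRICE_PER_M = 0.40


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a run from its token counts.

    Hypothetical helper: assumes cost is linear in tokens with no
    caching discounts or other adjustments.
    """
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical run: 2M input tokens and 500k output tokens.
cost = estimate_cost(2_000_000, 500_000)
print(f"${cost:.2f}")
```

Because output tokens cost four times as much as input tokens, agents that generate long responses can cost noticeably more than agents that mostly read context.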
Benchmark Performance
The "On the Pareto Frontier?" column indicates whether this model achieved a Pareto-optimal trade-off between accuracy and cost on that benchmark. Models on the Pareto frontier represent the current state-of-the-art efficiency for their performance level.
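To make the idea of a Pareto-optimal trade-off concrete, here is a minimal sketch that assumes the usual non-dominance definition: a run is on the frontier if no other run is at least as accurate and at least as cheap while being strictly better on one of the two. The `Run` type, the `pareto_frontier` function, and the data points are all hypothetical; the leaderboard's exact method may differ.

```python
from typing import NamedTuple


class Run(NamedTuple):
    name: str
    accuracy: float  # percent
    cost: float      # USD


def pareto_frontier(runs: list[Run]) -> list[Run]:
    """Return the runs that no other run dominates, sorted by cost."""
    frontier = []
    for r in runs:
        dominated = any(
            o.accuracy >= r.accuracy
            and o.cost <= r.cost
            and (o.accuracy > r.accuracy or o.cost < r.cost)
            for o in runs
        )
        if not dominated:
            frontier.append(r)
    return sorted(frontier, key=lambda r: r.cost)


# Hypothetical data points, not taken from the leaderboard.
runs = [
    Run("A", 30.0, 8.0),
    Run("B", 20.0, 19.0),   # dominated by A: less accurate and more expensive
    Run("C", 10.0, 1.0),    # cheapest, so nothing dominates it
    Run("D", 35.0, 25.0),   # most accurate, so nothing dominates it
]

for run in pareto_frontier(runs):
    print(run.name, run.accuracy, run.cost)
```

Under this definition a low-accuracy run can still be on the frontier as long as nothing cheaper matches or beats its accuracy.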
Benchmark | Agent | Accuracy | Cost | On the Pareto Frontier? |
---|---|---|---|---|
Assistantbench | Browser-Use | 2.62% | $2.18 | No |
Corebench Hard | CORE-Agent | 11.11% | $12.46 | No |
Corebench Hard | HAL Generalist Agent | 4.44% | $7.06 | No |
Gaia | HAL Generalist Agent | 32.73% | $7.80 | Yes |
Gaia | HF Open Deep Research | 19.39% | $18.82 | No |
Online Mind2Web | Browser-Use | 29.00% | $8.83 | Yes |
Online Mind2Web | SeeAct | 26.67% | $5.03 | Yes |
Scicode | Scicode Zero Shot Agent | 1.54% | $0.12 | Yes |
Scicode | Scicode Tool Calling Agent | 1.54% | $5.23 | No |
Scienceagentbench | SAB Self-Debug | 12.75% | $0.19 | Yes |
Swebench Verified Mini | SWE-Agent | 24.00% | $4.72 | Yes |
Swebench Verified Mini | HAL Generalist Agent | 2.00% | $7.33 | No |
Taubench Airline | HAL Generalist Agent | 22.00% | $2.00 | Yes |
Usaco | USACO Episodic + Semantic | 27.04% | $1.46 | Yes |