Gemini 2.0 Flash (February 2025)

Performance overview across all HAL benchmarks

Benchmarks: 9
Agents: 10
Pareto-optimal benchmarks: 8

Token Pricing

Input tokens: $0.10 per 1M tokens
Output tokens: $0.40 per 1M tokens
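
The per-benchmark costs reported below follow from these rates. As a rough illustration, here is a minimal sketch of the cost arithmetic; the token counts in the example are hypothetical, not taken from the leaderboard.

```python
# Gemini 2.0 Flash list prices, as stated above (USD per 1M tokens).
INPUT_PRICE_PER_M = 0.10
OUTPUT_PRICE_PER_M = 0.40

def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one agent run, given its total token usage."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M + \
           (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Hypothetical usage: 12M input tokens and 1.5M output tokens.
print(f"${run_cost(12_000_000, 1_500_000):.2f}")  # -> $1.80
```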

Benchmark Performance

The "On the Pareto Frontier?" column indicates whether this model achieved a Pareto-optimal trade-off between accuracy and cost on that benchmark. Models on the Pareto frontier represent the current state-of-the-art efficiency for their performance level.

Benchmark                 Agent                         Accuracy  Cost    On the Pareto Frontier?
Assistantbench            Browser-Use                   2.62%     $2.18   No
Corebench Hard            CORE-Agent                    11.11%    $12.46  No
Corebench Hard            HAL Generalist Agent          4.44%     $7.06   No
Gaia                      HAL Generalist Agent          32.73%    $7.80   Yes
Gaia                      HF Open Deep Research         19.39%    $18.82  No
Online Mind2Web           Browser-Use                   29.00%    $8.83   Yes
Online Mind2Web           SeeAct                        26.67%    $5.03   Yes
Scicode                   Scicode Zero Shot Agent       1.54%     $0.12   Yes
Scicode                   Scicode Tool Calling Agent    1.54%     $5.23   No
Scienceagentbench         SAB Self-Debug                12.75%    $0.19   Yes
Swebench Verified Mini    SWE-Agent                     24.00%    $4.72   Yes
Swebench Verified Mini    HAL Generalist Agent          2.00%     $7.33   No
Taubench Airline          HAL Generalist Agent          22.00%    $2.00   Yes
Usaco                     USACO Episodic + Semantic     27.04%    $1.46   Yes
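
To make the Pareto criterion described above concrete, here is a minimal sketch of the check, assuming each entry is an (accuracy %, cost $) pair for one agent on one benchmark. The two example entries are the Gaia rows from the table; the actual leaderboard compares against every agent and model evaluated on that benchmark, not just the runs listed here.

```python
def is_pareto_optimal(candidate, entries):
    """True if no other entry is at least as accurate and no more costly
    (with at least one strict improvement)."""
    acc, cost = candidate
    for other_acc, other_cost in entries:
        if (other_acc, other_cost) == (acc, cost):
            continue
        if other_acc >= acc and other_cost <= cost:
            return False  # candidate is dominated
    return True

# Gaia rows from the table above: (accuracy %, cost $)
gaia = [
    (32.73, 7.80),   # HAL Generalist Agent
    (19.39, 18.82),  # HF Open Deep Research
]

for entry in gaia:
    print(entry, is_pareto_optimal(entry, gaia))
# (32.73, 7.80)  True   -- matches "Yes" in the table
# (19.39, 18.82) False  -- dominated on both accuracy and cost
```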