Gemini 2.0 Flash (February 2025)

Performance overview across all HAL benchmarks

Token Pricing

Input Tokens: $0.10 per 1M tokens

Output Tokens: $0.40 per 1M tokens
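
As a rough illustration of how the per-token prices above translate into a dollar figure, the sketch below multiplies token counts by the listed rates. The function name `estimate_cost` and the example token counts are hypothetical, and the sketch assumes a flat per-token rate with no caching discounts or other adjustments, so it may not reproduce the Cost column exactly.

```python
# Listed prices in USD per 1M tokens (taken from the Token Pricing section).
INPUT_PRICE_PER_M = 0.10
OUTPUT_PRICE_PER_M = 0.40


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a run from its token counts.

    Hypothetical helper: assumes flat pricing with no discounts.
    """
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Illustrative token counts, not measured values.
    cost = estimate_cost(input_tokens=2_000_000, output_tokens=500_000)
    print(f"Estimated cost: ${cost:.2f}")
```

Because output tokens cost four times as much as input tokens here, runs that generate long outputs can cost noticeably more than their total token count alone would suggest.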

Benchmark Performance

The "On the Pareto Frontier?" column indicates whether this model achieved a Pareto-optimal trade-off between accuracy and cost on that benchmark, that is, whether no other evaluated entry on the benchmark is both at least as accurate and cheaper. Entries on the Pareto frontier offer the best accuracy currently available at their cost level.

Benchmark	Agent	Accuracy	Cost	On the Pareto Frontier?
Assistantbench	Browser-Use	2.62%	$2.18	No
Corebench Hard	CORE-Agent	11.11%	$12.46	No
Corebench Hard	HAL Generalist Agent	4.44%	$7.06	No
Gaia	HAL Generalist Agent	32.73%	$7.80	Yes
Gaia	HF Open Deep Research	19.39%	$18.82	No
Online Mind2Web	Browser-Use	29.00%	$8.83	Yes
Online Mind2Web	SeeAct	26.67%	$5.03	Yes
Scicode	Scicode Zero Shot Agent	1.54%	$0.12	Yes
Scicode	Scicode Tool Calling Agent	1.54%	$5.23	No
Scicode	HAL Generalist Agent	0.00%	$61.49	No
Scienceagentbench	SAB Self-Debug	12.75%	$0.19	Yes
Swebench Verified Mini	SWE-Agent	24.00%	$4.72	Yes
Swebench Verified Mini	HAL Generalist Agent	2.00%	$7.33	No
Taubench Airline	HAL Generalist Agent	22.00%	$2.00	No
Usaco	USACO Episodic + Semantic	27.04%	$1.46	Yes
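
The Pareto check described above can be sketched with a simple dominance test: an entry is on the frontier if no other entry is at least as accurate and at least as cheap while being strictly better on one of the two. This is a minimal sketch under that assumption, not necessarily the exact procedure the leaderboard uses. The frontier for a benchmark is computed over all evaluated models, not only the rows shown for this model, so the data below are hypothetical and the names `Run` and `pareto_frontier` are illustrative.

```python
from typing import List, Tuple

# (label, accuracy in percent, cost in USD); hypothetical structure.
Run = Tuple[str, float, float]


def pareto_frontier(runs: List[Run]) -> List[str]:
    """Return labels of runs that no other run dominates.

    A run is dominated if another run has accuracy >= and cost <=,
    with at least one of the two comparisons strict.
    """
    frontier = []
    for name, acc, cost in runs:
        dominated = any(
            other_acc >= acc
            and other_cost <= cost
            and (other_acc > acc or other_cost < cost)
            for _, other_acc, other_cost in runs
        )
        if not dominated:
            frontier.append(name)
    return frontier


if __name__ == "__main__":
    # Made-up entries for a single benchmark, for illustration only.
    example_runs = [
        ("A", 30.0, 8.00),  # most accurate, most expensive of the useful set
        ("B", 25.0, 5.00),  # cheaper than A, less accurate
        ("C", 25.0, 9.00),  # same accuracy as B but more expensive: dominated
        ("D", 10.0, 1.00),  # cheapest
    ]
    print(pareto_frontier(example_runs))  # ['A', 'B', 'D']
```

Under this test, two entries for the same benchmark can both be on the frontier when one is more accurate and the other is cheaper, which is consistent with both Online Mind2Web rows in the table being marked "Yes".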