Browser-Use

Agent performance overview across all HAL benchmarks

Benchmarks: 2
Models Used: 14
Pareto Optimal Runs: 5

Benchmark Performance

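The "On the Pareto Frontier?" column indicates whether a run achieved a Pareto-optimal trade-off between accuracy and cost on that benchmark. A run is Pareto-optimal when no other run is at least as accurate and at most as expensive while being strictly better on one of the two. Runs on the Pareto frontier represent the best currently available efficiency at their accuracy level.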

Benchmark | Model | Accuracy | Cost | On the Pareto Frontier?
Assistantbench | o3 Medium (April 2025) | 38.81% | $15.15 | Yes
Assistantbench | GPT-5 Medium (August 2025) | 35.23% | $41.69 | No
Assistantbench | o4-mini Low (April 2025) | 28.05% | $9.22 | Yes
Assistantbench | o4-mini High (April 2025) | 23.84% | $16.39 | No
Assistantbench | GPT-4.1 (April 2025) | 17.39% | $14.15 | No
Assistantbench | Claude-3.7 Sonnet (February 2025) | 16.69% | $56.00 | No
Assistantbench | Claude Opus 4.1 High (August 2025) | 13.75% | $779.72 | No
Assistantbench | Claude-3.7 Sonnet High (February 2025) | 13.08% | $16.13 | No
Assistantbench | DeepSeek R1 | 8.75% | $6.62 | No
Assistantbench | Claude Opus 4.1 (August 2025) | 7.26% | $385.43 | No
Assistantbench | Gemini 2.0 Flash | 2.62% | $2.18 | Yes
Assistantbench | DeepSeek V3 | 2.03% | $2.18 | No
Online Mind2Web | Claude Sonnet 4 (May 2025) | 40.00% | $1577.26 | No
Online Mind2Web | Claude-3.7 Sonnet High (February 2025) | 39.33% | $1151.88 | No
Online Mind2Web | Claude Sonnet 4 High (May 2025) | 39.33% | $1609.92 | No
Online Mind2Web | Claude-3.7 Sonnet (February 2025) | 38.33% | $926.48 | No
Online Mind2Web | GPT-4.1 (April 2025) | 36.33% | $236.62 | No
Online Mind2Web | DeepSeek V3 | 32.33% | $36.82 | Yes
Online Mind2Web | GPT-5 Medium (August 2025) | 32.00% | $736.31 | No
Online Mind2Web | Gemini 2.0 Flash | 29.00% | $8.83 | Yes
Online Mind2Web | o3 Medium (April 2025) | 29.00% | $371.59 | No
Online Mind2Web | DeepSeek R1 | 25.33% | $21.26 | No
Online Mind2Web | o4-mini High (April 2025) | 20.00% | $297.93 | No
Online Mind2Web | o4-mini Low (April 2025) | 18.33% | $201.44 | No