Browser-Use
Agent performance overview across all HAL benchmarks
Benchmarks: 2
Models Used: 14
Pareto Optimal Runs: 5
Models Used
o3 Medium (April 2025)
GPT-5 Medium (August 2025)
o4-mini Low (April 2025)
o4-mini High (April 2025)
GPT-4.1 (April 2025)
Claude-3.7 Sonnet (February 2025)
Claude Opus 4.1 High (August 2025)
Claude-3.7 Sonnet High (February 2025)
DeepSeek R1
Claude Opus 4.1 (August 2025)
Gemini 2.0 Flash
DeepSeek V3
Claude Sonnet 4 (May 2025)
Claude Sonnet 4 High (May 2025)
Benchmark Performance
The "On the Pareto Frontier?" column indicates whether this agent, run with the given model, achieved a Pareto-optimal trade-off between accuracy and cost on that benchmark. Runs on the Pareto frontier represent the best cost efficiency currently achieved at their accuracy level.
Benchmark | Model | Accuracy | Cost | On the Pareto Frontier? |
---|---|---|---|---|
Assistantbench | o3 Medium (April 2025) | 38.81% | $15.15 | Yes |
Assistantbench | GPT-5 Medium (August 2025) | 35.23% | $41.69 | No |
Assistantbench | o4-mini Low (April 2025) | 28.05% | $9.22 | Yes |
Assistantbench | o4-mini High (April 2025) | 23.84% | $16.39 | No |
Assistantbench | GPT-4.1 (April 2025) | 17.39% | $14.15 | No |
Assistantbench | Claude-3.7 Sonnet (February 2025) | 16.69% | $56.00 | No |
Assistantbench | Claude Opus 4.1 High (August 2025) | 13.75% | $779.72 | No |
Assistantbench | Claude-3.7 Sonnet High (February 2025) | 13.08% | $16.13 | No |
Assistantbench | DeepSeek R1 | 8.75% | $6.62 | No |
Assistantbench | Claude Opus 4.1 (August 2025) | 7.26% | $385.43 | No |
Assistantbench | Gemini 2.0 Flash | 2.62% | $2.18 | Yes |
Assistantbench | DeepSeek V3 | 2.03% | $2.18 | No |
Online Mind2Web | Claude Sonnet 4 (May 2025) | 40.00% | $1577.26 | No |
Online Mind2Web | Claude-3.7 Sonnet High (February 2025) | 39.33% | $1151.88 | No |
Online Mind2Web | Claude Sonnet 4 High (May 2025) | 39.33% | $1609.92 | No |
Online Mind2Web | Claude-3.7 Sonnet (February 2025) | 38.33% | $926.48 | No |
Online Mind2Web | GPT-4.1 (April 2025) | 36.33% | $236.62 | No |
Online Mind2Web | DeepSeek V3 | 32.33% | $36.82 | Yes |
Online Mind2Web | GPT-5 Medium (August 2025) | 32.00% | $736.31 | No |
Online Mind2Web | Gemini 2.0 Flash | 29.00% | $8.83 | Yes |
Online Mind2Web | o3 Medium (April 2025) | 29.00% | $371.59 | No |
Online Mind2Web | DeepSeek R1 | 25.33% | $21.26 | No |
Online Mind2Web | o4-mini High (April 2025) | 20.00% | $297.93 | No |
Online Mind2Web | o4-mini Low (April 2025) | 18.33% | $201.44 | No |
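
To make the Pareto-frontier idea concrete, here is a minimal sketch of a dominance check over (accuracy, cost) pairs, using five Assistantbench rows from the table as sample input. The names `Run`, `dominates`, and `pareto_frontier` are hypothetical and introduced only for illustration. The sketch assumes the textbook definition (a run is on the frontier if no other run is at least as accurate and no more expensive, and strictly better on one of the two). It is not the leaderboard's own computation: the page's flags evidently rest on a different comparison set or definition, since DeepSeek R1, for example, is undominated among the rows listed here yet is marked No.

```python
from typing import List, NamedTuple


class Run(NamedTuple):
    """One (model, accuracy, cost) result; a hypothetical record type."""
    model: str
    accuracy: float  # percent
    cost: float      # US dollars


def dominates(a: Run, b: Run) -> bool:
    """True if `a` is at least as good as `b` on both axes and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.cost <= b.cost
    strictly_better = a.accuracy > b.accuracy or a.cost < b.cost
    return no_worse and strictly_better


def pareto_frontier(runs: List[Run]) -> List[Run]:
    """Keep the runs that no other run in the list dominates (input order preserved)."""
    return [r for r in runs if not any(dominates(o, r) for o in runs)]


# Sample input: a subset of the Assistantbench rows from the table above.
runs = [
    Run("Gemini 2.0 Flash", 2.62, 2.18),
    Run("DeepSeek V3", 2.03, 2.18),
    Run("o4-mini Low (April 2025)", 28.05, 9.22),
    Run("o4-mini High (April 2025)", 23.84, 16.39),
    Run("o3 Medium (April 2025)", 38.81, 15.15),
]

frontier = pareto_frontier(runs)
for r in frontier:
    print(f"{r.model}: {r.accuracy:.2f}% at ${r.cost:.2f}")
```

On this subset the sketch keeps Gemini 2.0 Flash, o4-mini Low, and o3 Medium, which matches the rows marked Yes for Assistantbench. DeepSeek V3 drops out because Gemini 2.0 Flash is more accurate at the same cost, and o4-mini High drops out because o3 Medium is both more accurate and cheaper.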