USACO Episodic + Semantic

Agent performance overview across all HAL benchmarks

Benchmarks: 1
Models Used: 12
Pareto Optimal Runs: 3

Benchmark Performance

The "On the Pareto Frontier?" column indicates whether this agent, run with the listed model, achieved a Pareto-optimal trade-off between accuracy and cost on that benchmark. Runs on the Pareto frontier represent the current state-of-the-art efficiency for their performance level.
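As a rough illustration of how such a flag could be derived, the sketch below recomputes a frontier from the accuracy and cost figures in the table on this page. The page does not state its exact rule. The listed flags are consistent with taking the upper convex hull of the non-dominated (cost, accuracy) points, which is stricter than plain dominance, but that is an inference from the numbers and not a documented method. The names `RUNS`, `non_dominated`, `cross`, and `convex_frontier` are hypothetical, and the model labels are shortened.

```python
# (model, accuracy %, cost USD), copied from the table on this page.
RUNS = [
    ("GPT-5 Medium", 69.71, 64.13),
    ("o4-mini High", 57.98, 44.04),
    ("Claude Opus 4.1 High", 51.47, 267.72),
    ("Claude Opus 4.1", 48.21, 276.19),
    ("o3 Medium", 46.25, 57.30),
    ("GPT-4.1", 44.95, 28.10),
    ("DeepSeek V3", 39.09, 2.78),
    ("DeepSeek R1", 38.11, 8.18),
    ("o4-mini Low", 30.94, 21.14),
    ("Claude-3.7 Sonnet", 29.32, 38.70),
    ("Gemini 2.0 Flash", 27.04, 1.46),
    ("Claude-3.7 Sonnet High", 26.71, 56.43),
]


def non_dominated(runs):
    """Names of runs that no other run beats on both accuracy and cost."""
    keep = []
    for name, acc, cost in runs:
        dominated = any(
            a >= acc and c <= cost and (a > acc or c < cost)
            for _, a, c in runs
        )
        if not dominated:
            keep.append(name)
    return keep


def cross(o, a, b):
    """2D cross product of vectors o->a and o->b (positive = left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_frontier(runs):
    """Assumed rule: upper convex hull of the non-dominated (cost, accuracy) points."""
    by_name = {name: (cost, acc) for name, acc, cost in runs}
    points = sorted((by_name[n], n) for n in non_dominated(runs))
    hull = []
    for point, name in points:
        # Drop the previous point if it lies on or below the line to the new one.
        while len(hull) >= 2 and cross(hull[-2][0], hull[-1][0], point) >= 0:
            hull.pop()
        hull.append((point, name))
    return [name for _, name in hull]


if __name__ == "__main__":
    print("Non-dominated:", non_dominated(RUNS))
    print("Convex frontier:", convex_frontier(RUNS))
```

Under plain dominance, five runs survive, including o4-mini High and GPT-4.1. Under the assumed hull rule, only Gemini 2.0 Flash, DeepSeek V3, and GPT-5 Medium remain, which matches the three "Yes" rows in the table.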

| Benchmark | Model | Accuracy | Cost | On the Pareto Frontier? |
|---|---|---|---|---|
| USACO | GPT-5 Medium (August 2025) | 69.71% | $64.13 | Yes |
| USACO | o4-mini High (April 2025) | 57.98% | $44.04 | No |
| USACO | Claude Opus 4.1 High (August 2025) | 51.47% | $267.72 | No |
| USACO | Claude Opus 4.1 (August 2025) | 48.21% | $276.19 | No |
| USACO | o3 Medium (April 2025) | 46.25% | $57.30 | No |
| USACO | GPT-4.1 (April 2025) | 44.95% | $28.10 | No |
| USACO | DeepSeek V3 | 39.09% | $2.78 | Yes |
| USACO | DeepSeek R1 | 38.11% | $8.18 | No |
| USACO | o4-mini Low (April 2025) | 30.94% | $21.14 | No |
| USACO | Claude-3.7 Sonnet (February 2025) | 29.32% | $38.70 | No |
| USACO | Gemini 2.0 Flash | 27.04% | $1.46 | Yes |
| USACO | Claude-3.7 Sonnet High (February 2025) | 26.71% | $56.43 | No |