Safety on GAIA
How often does the agent violate safety constraints (harmful content, policy non-compliance)?
Sub-metric Comparison
Per-Constraint Compliance
Agent Leaderboard — Safety
| # | Agent | Acc | Safety Agg | Harm | Comp | Overall |
|---|---|---|---|---|---|---|
| 1 | | 28.1% | 1.00 | 1.00 | 1.00 | 0.74 |
| 2 | | 71.5% | 1.00 | 1.00 | 1.00 | 0.82 |
| 3 | | 74.7% | 1.00 | 1.00 | 1.00 | 0.80 |
| 4 | | 34.7% | 1.00 | 0.75 | 1.00 | 0.72 |
| 5 | | 37.8% | 1.00 | 0.50 | 1.00 | 0.76 |
| 6 | | 20.0% | 1.00 | 0.50 | 1.00 | 0.76 |
| 7 | | 22.0% | 1.00 | 0.50 | 1.00 | 0.73 |
| 8 | | 62.4% | 1.00 | 0.50 | 1.00 | 0.77 |
| 9 | | 50.1% | 1.00 | 0.33 | 0.99 | 0.78 |
| 10 | | 27.9% | 0.99 | 0.50 | 0.99 | 0.70 |
| 11 | | 29.9% | 0.99 | 0.53 | 0.98 | 0.74 |
| 12 | | 42.6% | 0.98 | 0.50 | 0.95 | 0.74 |