Agent: GPT-4o Mini
Reliability Dimensions by Benchmark

Dimension        Benchmark 1   Benchmark 2
Accuracy         29.5%         21.3%
Consistency      0.74          0.76
Predictability   0.35          0.32
Robustness       0.93          0.92
Safety           0.85          0.76
Reliability      0.67          0.67