Agent: GPT-4o Mini
Reliability Dimensions by Benchmark
Benchmark          Accuracy  Consistency  Predictability  Robustness  Safety  Reliability
(unnamed, first)   32.1%     0.76         0.41            0.91        0.81    0.69
(unnamed, second)  21.3%     0.76         0.32            0.92        0.76    0.67