Agent: GPT-4 Turbo

Reliability Dimensions by Benchmark

Dimension        Benchmark 1   Benchmark 2
Accuracy         50.0%         35.6%
Consistency      0.72          0.72
Predictability   0.51          0.38
Robustness       0.92          0.98
Safety           0.87          0.85
Reliability      0.71          0.69