Agent: GPT-4 Turbo
Reliability Dimensions by Benchmark

Accuracy  Consistency  Predictability  Robustness  Safety  Reliability
57.7%     0.76         0.58            0.81        0.87    0.72
35.6%     0.72         0.38            0.98        0.85    0.69