Agent: GPT-4 Turbo

Reliability Dimensions by Benchmark
Accuracy 20.0%
Consistency 0.70
Predictability 0.75
Robustness 0.81
Safety 1.00
Reliability 0.76

Accuracy 50.0%
Consistency 0.72
Predictability 0.51
Robustness 0.92
Safety 0.87
Reliability 0.71

Accuracy 35.6%
Consistency 0.72
Predictability 0.38
Robustness 0.98
Safety 0.85
Reliability 0.69
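
The source does not say how the per-panel Reliability value is derived. As a hedged arithmetic check only: the reported values look consistent, within rounding, with an unweighted mean of Consistency, Predictability, and Robustness (Safety left out). That formula is an assumption inferred from the numbers, not something stated here. The sketch below uses hypothetical names (`panels`, `mean_of_three`) and the values copied from the three panels in the order listed.

```python
# Values copied from the three panels, in the order they are listed.
# Benchmark names are not given in the source, so panels are unnamed.
panels = [
    {"consistency": 0.70, "predictability": 0.75, "robustness": 0.81,
     "safety": 1.00, "reliability": 0.76},
    {"consistency": 0.72, "predictability": 0.51, "robustness": 0.92,
     "safety": 0.87, "reliability": 0.71},
    {"consistency": 0.72, "predictability": 0.38, "robustness": 0.98,
     "safety": 0.85, "reliability": 0.69},
]


def mean_of_three(panel):
    """ASSUMED aggregate: unweighted mean of consistency, predictability,
    and robustness. Safety is deliberately excluded under this assumption."""
    return (panel["consistency"] + panel["predictability"] + panel["robustness"]) / 3


# Absolute gap between the assumed aggregate and the reported Reliability.
gaps = [abs(mean_of_three(p) - p["reliability"]) for p in panels]

for p, gap in zip(panels, gaps):
    print(f"reported={p['reliability']:.2f} "
          f"assumed_mean={mean_of_three(p):.4f} gap={gap:.4f}")
```

A gap of at most 0.01 in every panel is what one would expect if the displayed dimension scores were rounded to two decimals before being shown; a larger gap would suggest the assumed formula is wrong.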