Agent: GPT-4 Turbo

Reliability Dimensions by Benchmark
Accuracy 30.8%
Consistency 0.76
Predictability 0.64
Robustness 0.87
Safety 1.00

Reliability 0.76
Accuracy 57.7%
Consistency 0.76
Predictability 0.58
Robustness 0.81
Safety 0.87

Reliability 0.72
Accuracy 35.6%
Consistency 0.72
Predictability 0.38
Robustness 0.98
Safety 0.85

Reliability 0.69