Agent: GPT-5.2 (medium)

Reliability Dimensions by Benchmark
Accuracy 31.8%
Consistency 0.62
Predictability 0.61
Robustness 0.91
Safety 0.99
Reliability 0.72

Accuracy 67.9%
Consistency 0.76
Predictability 0.74
Robustness 0.95
Safety 0.94
Reliability 0.82
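
The listing does not say how the Reliability figure is derived from the other dimensions. One reading that fits the numbers is that it is close to the unweighted mean of Consistency, Predictability, and Robustness, with Accuracy and Safety reported separately. This is an assumption, not something the source states. The sketch below checks that reading against the values shown. The block labels `first` and `second` and the helper `mean_of_three` are hypothetical, since the benchmark names are not given.

```python
from statistics import mean

# Scores copied from the listing, in order of appearance.
# Benchmark names are not given in the source, so the keys are placeholders.
scores = {
    "first": {
        "consistency": 0.62,
        "predictability": 0.61,
        "robustness": 0.91,
        "safety": 0.99,
        "reliability": 0.72,
    },
    "second": {
        "consistency": 0.76,
        "predictability": 0.74,
        "robustness": 0.95,
        "safety": 0.94,
        "reliability": 0.82,
    },
}


def mean_of_three(block):
    """Unweighted mean of consistency, predictability, and robustness.

    ASSUMPTION: this aggregation rule is a guess that happens to fit the
    reported values; the source does not define the formula.
    """
    return mean(block[k] for k in ("consistency", "predictability", "robustness"))


for name, block in scores.items():
    estimate = mean_of_three(block)
    gap = abs(estimate - block["reliability"])
    print(f"{name}: mean of three = {estimate:.3f}, "
          f"reported = {block['reliability']:.2f}, gap = {gap:.3f}")
```

Both blocks agree with the reported figure to within 0.01, which is consistent with rounding of the displayed inputs but does not confirm the formula.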