Agent: GPT-5.2 (xhigh)

Reliability Dimensions by Benchmark

Dimension        Benchmark 1   Benchmark 2
Accuracy         67.7%         51.6%
Consistency      0.70          0.67
Predictability   0.78          0.65
Robustness       0.96          0.95
Safety           0.95          0.94
Reliability      0.81          0.76