Agent: GPT-5.2 (medium)
Reliability Dimensions by Benchmark
Accuracy: 67.9%
Consistency: 0.76
Predictability: 0.74
Robustness: 0.95
Safety: 0.94
Reliability: 0.82