Applications of Certainty Scoring for Machine Learning Classification in
Multi-modal Contexts
Abstract
Quantitative characterizations and estimations of uncertainty are of
fundamental importance for machine learning classification, particularly
in safety-critical settings such as the military battlefield where
continuous real-time monitoring requires explainable and reliable
scoring. Reliance on the maximum a posteriori principle to determine
label classification can obscure a model’s certainty of label
assignment. We develop quantitative scores of certainty and competence
based on predicted probability estimates as an effective tool for
inferring the verity of positives across different data modalities and
architectures. Our theoretical results establish that competent models
have distinct distributions of certainty for true and false positives.
Our empirical results bear out that there are distinct distributions of
certainty scores on training and holdout data, as well as on data that is
a priori out-of-distribution. Further, we find that the most reliable
test for out-of-distribution data is to compare the global true positive
certainty score distribution against that of the test data. At least
92.3% of out-of-distribution data are successfully identified this way
across our two experimental modalities at the tranche level. Moreover,
100% of the
out-of-context images are identified as out-of-distribution using the
stochastic form of our out-of-distribution detection test across all
five stochastic variants of the ResNet models. Consequently, we find
that the use of our certainty framework provides a robust means of
detecting out-of-distribution inputs, while also serving as a reliable
mechanism for comparing how accurately models distinguish between true
and false positives, particularly in safety-critical contexts.
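
The out-of-distribution test summarized above compares the certainty-score
distribution of incoming test data against a reference distribution of true
positive certainty scores. The sketch below is illustrative only, not the
paper's implementation: it assumes the maximum predicted class probability as
a stand-in certainty score, a two-sample Kolmogorov-Smirnov test as the
distributional comparison, and a 0.05 significance threshold, none of which
are specified in this abstract.

```python
# Minimal sketch (assumptions noted above): flag a tranche of test inputs as
# out-of-distribution when its certainty-score distribution differs from a
# reference distribution of true positive certainty scores.
import numpy as np
from scipy.stats import ks_2samp


def certainty_scores(probs: np.ndarray) -> np.ndarray:
    """Toy certainty score: the maximum predicted class probability."""
    return probs.max(axis=1)


def tranche_is_ood(reference_tp_scores: np.ndarray,
                   test_probs: np.ndarray,
                   alpha: float = 0.05) -> bool:
    """Reject the hypothesis that the tranche's certainty scores were drawn
    from the same distribution as the reference true positive scores."""
    test_scores = certainty_scores(test_probs)
    statistic, p_value = ks_2samp(reference_tp_scores, test_scores)
    return p_value < alpha


# Usage with synthetic data: in-distribution true positives cluster near a
# certainty of 1.0, while a diffuse out-of-distribution tranche does not.
rng = np.random.default_rng(0)
reference_tp = rng.beta(8, 1, size=2000)           # confident true positives
ood_probs = rng.dirichlet(np.ones(10), size=500)   # diffuse predictions
print(tranche_is_ood(reference_tp, ood_probs))     # expected: True
```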