Know What You Don’t Know: Getting Reliable Confidence Scores When Unsure of a Prediction


Softmax prediction scores are often used as confidence scores in a multi-class classification setting. In this post, we are going to show that softmax scores can be meaningless when doing regular empirical risk minimization by gradient descent. We are also going to apply the method presented in Deep Anomaly Detection with Outlier Exposure to mitigate this problem and make the softmax score more meaningful.

Discriminative classifiers (models that try to estimate P(y|x) from data) tend to be overconfident in their predictions, even if the input sample looks nothing like anything they saw during training. As a result, the output scores of such models cannot reliably be used as confidence scores, since the model is often confident where it should not be.

Example:

In this synthetic example, we have one big cluster for class zero and another one for class one, plus two smaller groups of outlier points that were not present in the training set.

[Figure: Toy example]

If we apply a regular classifier to this data, we get something like this:

[Figure: Confidence scores baseline]

We see that the classifier is overly confident everywhere: even the outlier samples are classified with a very high score. The confidence score is displayed as a heat map.
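This behaviour is easy to reproduce. Below is a minimal self-contained sketch of a similar toy setup; the cluster centers, the scikit-learn classifier, and the outlier points are assumptions chosen for illustration, not the post's actual code.

import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.RandomState(0)
# Two Gaussian training clusters, one per class
X_train = np.vstack([rng.randn(200, 2), rng.randn(200, 2) + [4.0, 4.0]])
y_train = np.array([0] * 200 + [1] * 200)

clf = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=1000, random_state=0)
clf.fit(X_train, y_train)

# Points far from both training clusters, never seen during training
X_outliers = np.array([[10.0, -8.0], [-9.0, 12.0]])
print(clf.predict_proba(X_outliers).max(axis=1))  # typically close to 1.0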

This is why it is not a good idea to use softmax scores directly as confidence scores: if a classifier is confident everywhere without having seen any evidence to support it during training, then its confidence scores are probably meaningless.

However, if we use the approach presented in Deep Anomaly Detection with Outlier Exposure, we can achieve much more reasonable softmax scores:

[Figure: Outlier exposure]

This score map is much more reasonable and makes it easy to see where the model is rightly confident and where it is not. The outlier region has a very low confidence of ~0.5, which is equivalent to no confidence at all in a two-class setting.

Description of the Approach

The idea presented in Deep Anomaly Detection with Outlier Exposure is to use external data that is mostly different from your training/test data and force the model to predict the uniform distribution on this external data.

For example, if you are trying to build a classifier that predicts cat vs dog in images, you can get a bunch of bear and shark images and force the model to predict [0.5, 0.5] on those images.
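As a rough illustration of the objective (a sketch, not the paper's exact code), the extra term is simply the cross-entropy between the model's predictions on the outlier data and a uniform target. The lam weighting factor below is an assumption chosen for illustration.

import numpy as np

def cross_entropy(targets, probs, eps=1e-12):
    # Mean cross-entropy between target distributions and predicted probabilities.
    return -np.mean(np.sum(targets * np.log(probs + eps), axis=1))

def outlier_exposure_objective(in_targets, in_probs, out_probs, lam=0.5):
    # Standard cross-entropy on in-distribution samples, plus a term that pulls
    # predictions on out-of-distribution samples toward the uniform distribution.
    n_classes = in_probs.shape[1]
    uniform = np.full_like(out_probs, 1.0 / n_classes)
    return cross_entropy(in_targets, in_probs) + lam * cross_entropy(uniform, out_probs)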

Data And Model

We will use the 102 Flower dataset as the in-distribution dataset and a subset of the OpenImages dataset as the out-of-distribution dataset. The paper referenced in the introduction shows that training on one set of out-of-distribution samples generalizes well to other out-of-distribution sets.

We use MobileNetV2 as our classification architecture and initialize it with ImageNet weights.

# Imports assume the tf.keras API
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import (
    Input, Dropout, GlobalMaxPooling2D, GlobalAveragePooling2D, Concatenate, Dense,
)
from tensorflow.keras.losses import categorical_crossentropy
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam


def get_model_classification(
    input_shape=(None, None, 3),
    weights="imagenet",
    n_classes=102,
):
    inputs = Input(input_shape)
    # MobileNetV2 backbone pre-trained on ImageNet, without its classification head
    base_model = MobileNetV2(
        include_top=False, input_shape=input_shape, weights=weights
    )
    x = base_model(inputs)
    x = Dropout(0.5)(x)
    # Combine max- and average-pooled features before the softmax head
    out1 = GlobalMaxPooling2D()(x)
    out2 = GlobalAveragePooling2D()(x)
    out = Concatenate(axis=-1)([out1, out2])
    out = Dropout(0.5)(out)
    out = Dense(n_classes, activation="softmax")(out)
    model = Model(inputs, out)
    model.compile(
        optimizer=Adam(0.0001), loss=categorical_crossentropy, metrics=["acc"]
    )
    return model

We will use a generator to load the images from disk batch by batch. In the baseline we only load in-distribution images, while in the outlier exposure model we load half of each batch from in-distribution images with their correct labels and the other half from out-of-distribution images with a uniform target:

target = [1 / n_label for _ in range(n_label)]
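A rough sketch of such a generator is shown below; the file lists, batch size, and the load_image helper are assumptions made for illustration, not the post's actual code.

import random
import numpy as np

def mixed_batch_generator(in_paths, in_labels, out_paths, load_image,
                          n_label=102, batch_size=32):
    # in_paths / in_labels: in-distribution image paths and their one-hot labels.
    # out_paths: out-of-distribution image paths trained against a uniform target.
    # load_image: hypothetical helper that reads and preprocesses a single image.
    uniform_target = [1 / n_label for _ in range(n_label)]
    half = batch_size // 2
    while True:
        in_idx = random.sample(range(len(in_paths)), half)
        ood_paths = random.sample(out_paths, half)
        images = [load_image(in_paths[i]) for i in in_idx] + [load_image(p) for p in ood_paths]
        targets = [in_labels[i] for i in in_idx] + [uniform_target for _ in ood_paths]
        yield np.array(images), np.array(targets)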

Results

Both training configurations reach slightly over 90% accuracy on in-distribution samples. We choose to predict "Don't Know" if the top softmax score is lower than 0.15, and thus abstain from making a class prediction.
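Concretely, the abstention rule is just a threshold on the top softmax score. Here is a minimal sketch; the 0.15 threshold comes from the post, while the function and its arguments are assumptions for illustration.

import numpy as np

def predict_with_abstention(model, images, class_names, threshold=0.15):
    # Return the predicted class name, or "Don't Know" when the top softmax
    # score falls below the threshold.
    probs = model.predict(images)
    top_idx = probs.argmax(axis=1)
    top_scores = probs.max(axis=1)
    return [
        class_names[i] if score >= threshold else "Don't Know"
        for i, score in zip(top_idx, top_scores)
    ]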

Now let us see how each model behaves!

Regular training:

You can run the web application with:

streamlit run main.py
