Know What You Don’t Know: Getting Reliable Confidence Scores When Unsure of a Prediction

Category: IT Technology · Published: 4 years ago


Softmax prediction scores are often used as confidence scores in a multi-class classification setting. In this post, we are going to show that softmax scores can be meaningless when doing regular empirical risk minimization by gradient descent. We are also going to apply the method presented in Deep Anomaly Detection with Outlier Exposure to mitigate this problem and add more meaning to the softmax score.

Discriminative classifiers (models that try to estimate P(y|x) from data) tend to be overconfident in their predictions, even if the input sample looks nothing like anything they have seen in the training phase. As a result, the output scores of such models cannot be used reliably as confidence scores, because the model is often confident where it should not be.

Example:

In this synthetic example, we have one big cluster for class zero and another one for class one, plus two smaller groups of outlier points that were not present in the training set.

Figure: Toy example

If we apply a regular classifier to this data, we get something like this:

We see that the classifier is overly confident everywhere; even the outlier samples are classified with a very high score. The confidence score is displayed as a heat map.

This is why it is not a good idea to use softmax scores directly as confidence scores. If a classifier is confident everywhere, including in regions where it saw no supporting evidence during training, then its confidence scores are probably wrong.
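To see why this happens, consider a minimal sketch with a hypothetical two-class linear model. The weights `W` and the sample points are made up for illustration and are not taken from the experiment above. Because the logits grow linearly with the distance from the decision boundary, the softmax score saturates toward 1 for points far away from any training data.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


# Hypothetical linear classifier: logits = x @ W (illustrative weights only).
W = np.array([[1.0, -1.0],
              [1.0, -1.0]])

points = np.array([
    [0.0, 0.0],    # on the decision boundary
    [1.0, 1.0],    # moderately far on the class-zero side
    [50.0, 50.0],  # very far from the boundary, like an unseen outlier
])

# Confidence = highest softmax probability for each point.
confidence = softmax(points @ W).max(axis=-1)
for point, score in zip(points, confidence):
    print(point, round(float(score), 4))
```

The farthest point receives the highest score even though, in this scenario, nothing like it would have been seen during training.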

However, if we use the approach presented in Deep Anomaly Detection with Outlier Exposure, we can achieve much more reasonable softmax scores:

This score map is much more reasonable and shows where the model is rightly confident and where it is not. The outlier region has a very low confidence of about 0.5 (equivalent to no confidence at all in a two-class setting).

Description of the Approach

The idea presented in Deep Anomaly Detection with Outlier Exposure is to use external data that is mostly different from your training/test data and force the model to predict the uniform distribution on this external data.

For example, if you are trying to build a classifier that predicts cat vs dog in images, you can get a bunch of bear and shark images and force the model to predict [0.5, 0.5] on those images.
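The objective can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the author's training code: the function names, the weighting factor `lam`, and the example probabilities are all hypothetical. In-distribution samples use the usual cross-entropy against their label, while outlier samples use cross-entropy against the uniform distribution.

```python
import numpy as np


def cross_entropy(probs, targets, eps=1e-12):
    """Mean cross-entropy between predicted probabilities and target distributions."""
    return float(-(targets * np.log(probs + eps)).sum(axis=-1).mean())


def outlier_exposure_loss(probs_in, labels_in, probs_out, lam=1.0):
    """Label cross-entropy on in-distribution data plus a uniform-target term on outliers.

    `lam` is an assumed weighting factor between the two terms.
    """
    n_classes = probs_in.shape[-1]
    one_hot = np.eye(n_classes)[labels_in]
    uniform = np.full_like(probs_out, 1.0 / n_classes)
    return cross_entropy(probs_in, one_hot) + lam * cross_entropy(probs_out, uniform)


# Made-up cat-vs-dog predictions for two in-distribution images.
probs_in = np.array([[0.9, 0.1], [0.2, 0.8]])
labels_in = np.array([0, 1])

# Two possible behaviours on a bear image (an outlier).
confident_on_outliers = np.array([[0.99, 0.01]])
uncertain_on_outliers = np.array([[0.5, 0.5]])

loss_confident = outlier_exposure_loss(probs_in, labels_in, confident_on_outliers)
loss_uncertain = outlier_exposure_loss(probs_in, labels_in, uncertain_on_outliers)
print(loss_confident > loss_uncertain)  # True
```

The outlier term is smallest when the prediction is exactly uniform, so minimizing it pushes the model toward [0.5, 0.5] on the external images.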

Data And Model

We will use the 102 Flower dataset as the in-distribution dataset and a subset of the OpenImage dataset as the out-of-distribution dataset. The paper referenced in the introduction shows that training on one set of out-of-distribution samples generalizes well to other out-of-distribution sets.

We use MobileNetV2 as our classification architecture and initialize it with ImageNet weights.

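# Imports assume TensorFlow's bundled Keras (tensorflow.keras).
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import (
    Concatenate, Dense, Dropout, GlobalAveragePooling2D, GlobalMaxPooling2D, Input,
)
from tensorflow.keras.losses import categorical_crossentropy
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam


def get_model_classification(
    input_shape=(None, None, 3),
    weights="imagenet",
    n_classes=102,
):
    inputs = Input(input_shape)
    # Pretrained convolutional backbone without the ImageNet classification head.
    base_model = MobileNetV2(
        include_top=False, input_shape=input_shape, weights=weights
    )
    x = base_model(inputs)
    x = Dropout(0.5)(x)
    # Combine max- and average-pooled features into one vector.
    out1 = GlobalMaxPooling2D()(x)
    out2 = GlobalAveragePooling2D()(x)
    out = Concatenate(axis=-1)([out1, out2])
    out = Dropout(0.5)(out)
    # Softmax head over the flower classes.
    out = Dense(n_classes, activation="softmax")(out)
    model = Model(inputs, out)
    model.compile(
        optimizer=Adam(0.0001), loss=categorical_crossentropy, metrics=["acc"]
    )
    return model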

We will use a generator to load the images from the hard drive batch by batch. In the baseline, we only load in-distribution images. In the outlier exposure model, we load half of each batch from in-distribution images with their correct labels and the other half from out-of-distribution images with a uniform target:

target = [1 / n_label for _ in range(n_label)]
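A mixed batch of this kind could be assembled as in the sketch below. The function `mixed_batch` and the random stand-in arrays are hypothetical; the original generator reads image files from disk, which is omitted here.

```python
import numpy as np


def mixed_batch(in_images, in_labels, out_images, n_label, batch_size, rng):
    """Build a batch: half in-distribution (one-hot labels), half outliers (uniform targets)."""
    half = batch_size // 2
    in_idx = rng.choice(len(in_images), size=half, replace=False)
    out_idx = rng.choice(len(out_images), size=half, replace=False)

    one_hot = np.eye(n_label)[in_labels[in_idx]]
    uniform = np.full((half, n_label), 1.0 / n_label)

    # In-distribution samples come first, outliers second.
    x = np.concatenate([in_images[in_idx], out_images[out_idx]], axis=0)
    y = np.concatenate([one_hot, uniform], axis=0)
    return x, y


# Random arrays standing in for decoded images (assumption for illustration).
rng = np.random.default_rng(0)
in_images = rng.random((10, 8, 8, 3))
in_labels = rng.integers(0, 102, size=10)
out_images = rng.random((10, 8, 8, 3))

x, y = mixed_batch(in_images, in_labels, out_images, n_label=102, batch_size=4, rng=rng)
print(x.shape, y.shape)
```

Because both halves share the same categorical cross-entropy loss, the model from the previous section can be trained on such batches without any change to its compile step.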

Results

Both training configurations reach slightly above 90% accuracy on in-distribution samples. We choose to predict “Don’t Know” if the highest softmax score is lower than 0.15, and thus abstain from making a class prediction.
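The abstention rule can be sketched as below. The helper `predict_or_abstain`, the placeholder class names, and the example probability vectors are assumptions for illustration; only the 0.15 threshold comes from the text.

```python
import numpy as np

DONT_KNOW = "Don't Know"


def predict_or_abstain(probs, class_names, threshold=0.15):
    """Return the top class name, or abstain when the top softmax score is below the threshold."""
    best = int(np.argmax(probs))
    if probs[best] < threshold:
        return DONT_KNOW
    return class_names[best]


# Placeholder names for the 102 classes.
class_names = [f"class_{i}" for i in range(102)]

# A near-uniform prediction, as targeted for outliers, and a peaked one.
uniform_like = np.full(102, 1.0 / 102)
peaked = np.full(102, 0.1 / 101)
peaked[7] = 0.9

print(predict_or_abstain(uniform_like, class_names))
print(predict_or_abstain(peaked, class_names))
```

With 102 classes, a perfectly uniform prediction has a top score of about 0.0098, well below the 0.15 threshold.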

Now let us see how each model behaves!

Regular training:

You can run the web application with:

streamlit run main.py
