The 3 Most Important Basic Classification Metrics


I call the three metrics we’ll explore basic metrics because each one consists of a single quadrant of the confusion matrix divided by that quadrant PLUS one other quadrant.

We’re focussing on binary classification cases — when there are two possible outcomes.

Let’s get to it!:rocket:

Recall (aka Sensitivity, True Positive Rate, Probability of Detection, Hit Rate, & more!)

The most common basic metric is often called recall or sensitivity. Its more descriptive name is the true positive rate (TPR). I'll refer to it as recall.

Recall is important to know when you really want to correctly predict the cases in the true class. For example, if you have a test for a dangerous form of cancer, you really want that test to do a good job detecting all of the cases where someone actually has the cancer. So you really care about recall.

The recall is calculated by dividing the true positives by the true positives PLUS the false negatives:

Recall = TP / (TP + FN)

In other words, out of all the actual true cases, what percentage did your model predict correctly?

Here are the results from our model’s predictions of whether a website visitor would purchase a shirt at Jeff’s Awesome Hawaiian Shirt store. :hibiscus::shirt:

                 Predicted Positive    Predicted Negative
Actual Positive          80  (TP)            20 (FN)
Actual Negative          50  (FP)            50 (TN)

Using our example confusion matrix, what is the recall?

80/(80 + 20) = 80%

The model correctly predicted four out of five sales. That sounds pretty good! :grinning: We could compare our model’s recall to another model’s recall to help us choose which model we want to use for our predictions.

The best possible recall is 1 and the worst possible is 0. The scikit-learn function name is recall_score.
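Here's a minimal, self-contained sketch of how that looks in code. The toy labels below are made up purely for illustration:

import numpy as np
from sklearn.metrics import recall_score

# Toy labels: 1 = purchased a shirt, 0 = did not
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0])  # 3 TP, 1 FN, 1 FP, 3 TN

print(recall_score(y_true, y_pred))  # 3 / (3 + 1) = 0.75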

For cases where recall is really important, there's something else we can do to correctly predict more of the true cases: we can change our decision threshold.


Decision Threshold

By default, the decision threshold for a scikit-learn classification model is set to .5. This means that if the model thinks there is a 50% or greater chance of an observation being a member of the positive class, then that observation is predicted to be a member of the positive class.

If we care a lot about the recall, we could lower our decision threshold to try to catch more of the actual positive cases. For example, maybe you want the model to predict true for every observation with a probability of 30% or higher.

scikit-learn's metric functions don't take a threshold argument directly, but you can apply a .3 threshold yourself to the predicted probabilities (assuming model is your fitted classifier):

y_predictions = (model.predict_proba(X_test)[:, 1] >= .3).astype(int)
recall_score(y_test, y_predictions)

This change would likely turn some false negatives into true positives. Yeah! :tada: However, the model would turn some true negatives into false positives, too. Boo! :cry:

After all, you could get a perfect 100% recall by predicting that every observation was positive. But that’s not usually a good plan.
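You can see that for yourself with a quick sketch (made-up labels again):

import numpy as np
from sklearn.metrics import recall_score

y_true = np.array([1, 1, 0, 0, 0, 0, 0, 0])
y_all_positive = np.ones_like(y_true)  # predict the positive class for everyone

print(recall_score(y_true, y_all_positive))  # 1.0: perfect recall, but not a useful model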

When the cost of false positives is high, you want to pay attention to them. You need a metric that will capture how well your model discriminates between true positives and false positives. You need to pay attention to precision.


Precision

Precision is the number of correct positive predictions divided by the total number of positive predictions. It answers the question

What percentage of the positive predictions were correct?

Precision = TP / (TP + FP)

I remember precision by focussing on the alliteration with the letter p.

Precision is all the True Positives divided by all the Predicted Positives.

Here’s the Hawaiian Shirt sale confusion matrix again:

                    Predicted Positive    Predicted Negative
Actual Positive            80 (TP)             20 (FN)
Actual Negative            50 (FP)             50 (TN)

What’s the precision score?

80/(80+50) = 61.5%

The scikit-learn metric is precision_score. The syntax is similar to recall's.

precision_score(y_test, predictions)

Again, the best value is 1 (100%) and the worst value is 0.
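To tie the hand calculation back to scikit-learn, here's a sketch that rebuilds the shirt-store confusion matrix (80 TP, 20 FN, 50 FP, 50 TN) as label arrays and checks both metrics. The np.repeat construction is just one convenient way to create labels with those counts:

import numpy as np
from sklearn.metrics import precision_score, recall_score

# Labels matching the confusion matrix above: 80 TP, 20 FN, 50 FP, 50 TN
y_true = np.repeat([1, 1, 0, 0], [80, 20, 50, 50])
y_pred = np.repeat([1, 0, 1, 0], [80, 20, 50, 50])

print(precision_score(y_true, y_pred))  # 80 / (80 + 50) = 0.615...
print(recall_score(y_true, y_pred))     # 80 / (80 + 20) = 0.8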

Precision is often discussed in terms of its relationship to recall. In fact, there’s a plot_precision_recall_curve function in scikit-learn that we can use to visualize the tradeoff between precision and recall.

Here’s the result of plotting the precision-recall curve for a logistic regression model trained on part of the Titanic dataset:

[Plot: precision-recall curve for the logistic regression model on the Titanic test data]

AP stands for average precision, which is the area under the precision-recall curve. Higher is better, and the maximum possible value is 1.

The code to make the plot is:

plot_precision_recall_curve(lr, X_test, y_test);

The plot shows what the precision and recall would be at different decision thresholds. Notice that the recall goes up as the precision goes down. :chart_with_downwards_trend:

If we set our decision threshold lower, we’ll move to the right along the curve. More observations will be classified as the positive class, and we will hopefully catch more of the true positive cases. Recall will go up. :grinning:

However, we will have more false positives, too. This will make the denominator for precision larger. The result will be lower precision. ☹️

How many false positives we are willing to tolerate depends on how large the cost of a false positive is relative to the benefit of catching another true positive. It’s a balancing act!
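If you'd rather look at the raw numbers behind the curve than read them off the plot, scikit-learn's precision_recall_curve function returns the precision and recall at every candidate threshold. Here's a sketch, assuming lr, X_test, and y_test are the same fitted model and holdout data used in the plotting call above:

from sklearn.metrics import average_precision_score, precision_recall_curve

y_scores = lr.predict_proba(X_test)[:, 1]  # predicted probability of the positive class
precision, recall, thresholds = precision_recall_curve(y_test, y_scores)

print(average_precision_score(y_test, y_scores))  # the AP value shown on the plot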

Sometimes we care about how well our model is predicting the actual negatives. Let’s look at a metric for that situation.


Specificity (True Negative Rate)

Specificity also goes by the name true negative rate (TNR). It answers the question:

How well did my model catch the negative cases?

Here’s the formula for specificity:

Specificity = TN / (TN + FP)

Notice that specificity is only concerned with the actual negative cases.

Here’s the Hawaiian shirt sale confusion matrix again.

                   Predicted Positive    Predicted Negative
Actual Positive            80 (TP)             20 (FN)
Actual Negative            50 (FP)             50 (TN)

What’s our model’s specificity?

50 / (50 + 50) = 50%

Specificity can range from 0 to 1, so 50% is not super great. The model didn’t do a good job correctly predicting when someone would NOT make a purchase.

Specificity is a nice metric when it’s important to correctly predict the actual negatives. For example, if the treatment for a disease is dangerous, you want a high specificity. :+1:

Specificity is commonly discussed in tandem with sensitivity. Remember that sensitivity goes by the names recall and true positive rate, too.

Scikit-learn does not have a built-in function to compute the specificity. You can create the four outcome variables from the confusion matrix and compute the specificity like this:

from sklearn.metrics import confusion_matrix

tn, fp, fn, tp = confusion_matrix(y_test, predictions).ravel()
tn / (tn + fp)
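Alternatively, because specificity is just the recall of the negative class, you can get the same number from recall_score by pointing pos_label at the negative label:

from sklearn.metrics import recall_score

# Recall of the negative class (0) is TN / (TN + FP), which is the specificity
recall_score(y_test, predictions, pos_label=0)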

Specificity is the final basic classification metric you need in your tool belt. :wrench:

Recap

You’ve learned about recall, precision, and specificity. Remember that recall is also referred to as sensitivity or the true positive rate. Specificity is also called the true negative rate.

With accuracy, recall, precision, and specificity under your belt, you’ll have the basic classification terms you need to know!

I hope you found this introduction to basic classification metrics to be helpful. If you did, please share it on your favorite social media so other folks can find it, too. :grinning:

In the final article in this series we’ll explore the three most important composite metrics. They are a bit more complicated, but they convey a lot of information in a single number. :rocket:

I write about Python, SQL, Docker, and other tech topics. If any of that’s of interest to you, sign up for my mailing list of data science resources and read more here. :+1:

