The 3 Most Important Basic Classification Metrics


I call the three metrics we’ll explore basic metrics because each one consists of a single quadrant of the confusion matrix divided by that quadrant PLUS one other quadrant.

We’re focussing on binary classification cases — when there are two possible outcomes.

Let’s get to it!:rocket:

Recall (aka Sensitivity, True Positive Rate, Probability of Detection, Hit Rate, & more!)

The most common basic metric is often called recall or sensitivity. Its more descriptive name is the true positive rate (TPR). I'll refer to it as recall.

Recall is important to know when you really want to correctly predict the cases in the true class. For example, if you have a test for a dangerous form of cancer, you really want that test to do a good job detecting all of the cases where someone actually has the cancer. So you really care about recall.

The recall is calculated by dividing the true positives by the true positives PLUS the false negatives:

Recall = TP / (TP + FN)

In other words, out of all the actual true cases, what percentage did your model predict correctly?

Here are the results from our model’s predictions of whether a website visitor would purchase a shirt at Jeff’s Awesome Hawaiian Shirt store. :hibiscus::shirt:

                 Predicted Positive    Predicted Negative
Actual Positive          80  (TP)            20 (FN)
Actual Negative          50  (FP)            50 (TN)

Using our example confusion matrix, what is the recall?

80/(80 + 20) = 80%

The model correctly predicted four out of five sales. That sounds pretty good! :grinning: We could compare our model’s recall to another model’s recall to help us choose which model we want to use for our predictions.

The best possible recall is 1 and the worst possible is 0. The scikit-learn function name is recall_score.
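Here's a minimal, self-contained sketch of how that looks in code. The toy labels below are made up purely for illustration:

import numpy as np
from sklearn.metrics import recall_score

# Toy labels: 1 = purchased a shirt, 0 = did not
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0])  # 3 TP, 1 FN, 1 FP, 3 TN

print(recall_score(y_true, y_pred))  # 3 / (3 + 1) = 0.75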

For cases where recall is really important, there's something else we can do to correctly predict more of the true cases: we can change our decision threshold.


Decision Threshold

By default, the decision threshold for a scikit-learn classification model is set to .5. This means that if the model thinks there is a 50% or greater chance of an observation being a member of the positive class, then that observation is predicted to be a member of the positive class.

If we care a lot about the recall, we could lower our decision threshold to try to catch more of the actual positive cases. For example, maybe you want the model to predict true for every observation with a probability of 30% or higher.

scikit-learn's metric functions don't take a threshold argument directly, but you can apply a .3 threshold yourself to the predicted probabilities (assuming model is your fitted classifier):

y_predictions = (model.predict_proba(X_test)[:, 1] >= .3).astype(int)
recall_score(y_test, y_predictions)

This change would likely turn some false negatives into true positives. Yeah! :tada: However, the model would turn some true negatives into false positives, too. Boo! :cry:

After all, you could get a perfect 100% recall by predicting that every observation was positive. But that’s not usually a good plan.
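You can see that for yourself with a quick sketch (made-up labels again):

import numpy as np
from sklearn.metrics import recall_score

y_true = np.array([1, 1, 0, 0, 0, 0, 0, 0])
y_all_positive = np.ones_like(y_true)  # predict the positive class for everyone

print(recall_score(y_true, y_all_positive))  # 1.0: perfect recall, but not a useful model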

When the cost of false positives is high, you want to pay attention to them. You need a metric that will capture how well your model discriminates between true positives and false positives. You need to pay attention to precision.


Precision

Precision is the number of correct positive predictions divided by the total number of positive predictions. It answers the question

What percentage of the positive predictions were correct?

Precision = TP / (TP + FP)

I remember precision by focussing on the alliteration with the letter p.

Precision is all the True Positives divided by all the Predicted Positives.

Here’s the Hawaiian Shirt sale confusion matrix again:

                    Predicted Positive    Predicted Negative
Actual Positive            80 (TP)             20 (FN)
Actual Negative            50 (FP)             50 (TN)

What’s the precision score?

80/(80+50) = 61.5%

The scikit-learn metric is precision_score. The syntax is similar to recall's.

precision_score(y_test, predictions)

Again, the best value is 1 (100%) and the worst value is 0.
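To tie the hand calculation back to scikit-learn, here's a sketch that rebuilds the shirt-store confusion matrix (80 TP, 20 FN, 50 FP, 50 TN) as label arrays and checks both metrics. The np.repeat construction is just one convenient way to create labels with those counts:

import numpy as np
from sklearn.metrics import precision_score, recall_score

# Labels matching the confusion matrix above: 80 TP, 20 FN, 50 FP, 50 TN
y_true = np.repeat([1, 1, 0, 0], [80, 20, 50, 50])
y_pred = np.repeat([1, 0, 1, 0], [80, 20, 50, 50])

print(precision_score(y_true, y_pred))  # 80 / (80 + 50) = 0.615...
print(recall_score(y_true, y_pred))     # 80 / (80 + 20) = 0.8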

Precision is often discussed in terms of its relationship to recall. In fact, there’s a plot_precision_recall_curve function in scikit-learn that we can use to visualize the tradeoff between precision and recall.

Here’s the result of plotting the precision-recall curve for a logistic regression model trained on part of the Titanic dataset:

[Plot: precision-recall curve for the logistic regression model on the Titanic test data]

AP stands for average precision, which is the area under the precision-recall curve. Higher is better, and the maximum possible value is 1.

The code to make the plot is:

plot_precision_recall_curve(lr, X_test, y_test);

The plot shows what the precision and recall would be at different decision thresholds. Notice that the recall goes up as the precision goes down. :chart_with_downwards_trend:

If we set our decision threshold lower, we’ll move to the right along the curve. More observations will be classified as the positive class, and we will hopefully catch more of the true positive cases. Recall will go up. :grinning:

However, we will have more false positives, too. This will make the denominator for precision larger. The result will be lower precision. ☹️

How many false positives we are willing to tolerate depends on how large the cost of a false positive is relative to the benefit of catching another true positive. It’s a balancing act!
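If you'd rather look at the raw numbers behind the curve than read them off the plot, scikit-learn's precision_recall_curve function returns the precision and recall at every candidate threshold. Here's a sketch, assuming lr, X_test, and y_test are the same fitted model and holdout data used in the plotting call above:

from sklearn.metrics import average_precision_score, precision_recall_curve

y_scores = lr.predict_proba(X_test)[:, 1]  # predicted probability of the positive class
precision, recall, thresholds = precision_recall_curve(y_test, y_scores)

print(average_precision_score(y_test, y_scores))  # the AP value shown on the plot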

Sometimes we care about how well our model is predicting the actual negatives. Let’s look at a metric for that situation.


Specificity (True Negative Rate)

Specificity also goes by the name true negative rate (TNR). It answers the question:

How well did my model catch the negative cases?

Here’s the formula for specificity:

Specificity = TN / (TN + FP)

Notice that specificity is only concerned with the actual negative cases.

Here’s the Hawaiian shirt sale confusion matrix again.

                   Predicted Positive    Predicted Negative
Actual Positive            80 (TP)             20 (FN)
Actual Negative            50 (FP)             50 (TN)

What’s our model’s specificity?

50 / (50 + 50) = 50%

Specificity can range from 0 to 1, so 50% is not super great. The model didn’t do a good job correctly predicting when someone would NOT make a purchase.

Specificity is a nice metric when it’s important to correctly predict the actual negatives. For example, if the treatment for a disease is dangerous, you want a high specificity. :+1:

Specificity is commonly discussed in tandem with sensitivity. Remember that sensitivity goes by the names recall and true positive rate, too.

Scikit-learn does not have a built-in function to compute the specificity. You can create the four outcome variables from the confusion matrix and compute the specificity like this:

from sklearn.metrics import confusion_matrix

tn, fp, fn, tp = confusion_matrix(y_test, predictions).ravel()
tn / (tn + fp)
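Alternatively, because specificity is just the recall of the negative class, you can get the same number from recall_score by pointing pos_label at the negative label:

from sklearn.metrics import recall_score

# Recall of the negative class (0) is TN / (TN + FP), which is the specificity
recall_score(y_test, predictions, pos_label=0)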

Specificity is the final basic classification metric you need in your tool belt. :wrench:

Recap

You’ve learned about recall, precision, and specificity. Remember that recall is also referred to as sensitivity or the true positive rate. Specificity is also called the true negative rate.

With accuracy, recall, precision, and specificity under your belt, you’ll have the basic classification terms you need to know!

I hope you found this introduction to basic classification metrics to be helpful. If you did, please share it on your favorite social media so other folks can find it, too. :grinning:

In the final article in this series we’ll explore the three most important composite metrics. They are a bit more complicated, but they convey a lot of information in a single number. :rocket:

I write about Python, SQL, Docker, and other tech topics. If any of that’s of interest to you, sign up for my mailing list of data science resources and read more here. :+1:

