I call the three metrics we’ll explore basic metrics because each one consists of a single quadrant of the confusion matrix divided by that quadrant PLUS one other quadrant.
We’re focussing on binary classification cases — when there are two possible outcomes.
Let’s get to it! :rocket:
Recall (aka Sensitivity, True Positive Rate, Probability of Detection, Hit Rate, & more!)
The most common basic metric is often called recall or sensitivity. Its more descriptive name is the true positive rate (TPR). I’ll refer to it as recall.
Recall is important to know when you really want to correctly predict the cases in the true class. For example, if you have a test for a dangerous form of cancer, you really want that test to do a good job detecting all of the cases where someone actually has the cancer. So you really care about recall.
The recall is calculated by dividing the true positives by the true positives PLUS the false negatives:
Recall = TP / (TP + FN)
In other words, out of all the actual true cases, what percentage did your model predict correctly?
Here are the results from our model’s predictions of whether a website visitor would purchase a shirt at Jeff’s Awesome Hawaiian Shirt store. :hibiscus::shirt:
|  | Predicted Positive | Predicted Negative |
| --- | --- | --- |
| Actual Positive | 80 (TP) | 20 (FN) |
| Actual Negative | 50 (FP) | 50 (TN) |
Using our example confusion matrix, what is the recall?
80/(80 + 20) = 80%
The model correctly predicted four out of five sales. That sounds pretty good! :grinning: We could compare our model’s recall to another model’s recall to help us choose which model we want to use for our predictions.
The best possible recall is 1 and the worst possible is 0. The scikit-learn function name is recall_score.
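To make that concrete, here’s a minimal sketch that rebuilds labels matching the toy confusion matrix above (the y_true and y_pred lists are made up for illustration) and checks the recall with recall_score:

```python
from sklearn.metrics import recall_score

# Made-up labels that reproduce the shirt-store confusion matrix:
# 80 TP and 20 FN for the 100 actual positives,
# 50 FP and 50 TN for the 100 actual negatives
y_true = [1] * 100 + [0] * 100
y_pred = [1] * 80 + [0] * 20 + [1] * 50 + [0] * 50

recall_score(y_true, y_pred)  # 0.8
```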
For cases where recall is really important, there’s something else we can do to correctly predict more of the true cases: we can change our decision threshold .
Decision Threshold
By default, the decision threshold for a scikit-learn classification model is set to .5. This means that if the model thinks there is a 50% or greater chance of an observation being a member of the positive class, then that observation is predicted to be a member of the positive class.
If we care a lot about the recall, we could lower our decision threshold to try to catch more of the actual positive cases. For example, maybe you want the model to predict true for every observation with a probability of 30% or higher.
In scikit-learn, recall_score doesn’t accept a threshold argument. Instead, you apply the threshold to your model’s predicted probabilities yourself and then score the resulting predictions. With a fitted classifier (called model here), a .3 threshold looks like this:
y_predictions = (model.predict_proba(X_test)[:, 1] >= .3).astype(int)
recall_score(y_test, y_predictions)
This change would likely turn some false negatives into true positives. Yeah! :tada: However, the model would turn some true negatives into false positives, too. Boo! :cry:
After all, you could get a perfect 100% recall by predicting that every observation was positive. But that’s not usually a good plan.
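Continuing the made-up shirt-store labels from the sketch above, you can see that degenerate case directly:

```python
# Predict "will buy" for every visitor: recall is a perfect 1.0,
# but all 100 actual negatives have become false positives
recall_score(y_true, [1] * 200)  # 1.0
```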
When the cost of false positives is high, you want to pay attention to them. You need a metric that will capture how well your model discriminates between true positives and false positives. You need to pay attention to precision.
Precision
Precision is the ratio of correct positive predictions to all positive predictions. It answers the question:
What percentage of the positive predictions were correct?
Precision = TP / (TP + FP)
I remember precision by focussing on the alliteration with the letter p.
Precision is all the True Positives divided by all the Predicted Positives.
Here’s the Hawaiian Shirt sale confusion matrix again:
|  | Predicted Positive | Predicted Negative |
| --- | --- | --- |
| Actual Positive | 80 (TP) | 20 (FN) |
| Actual Negative | 50 (FP) | 50 (TN) |
What’s the precision score?
80/(80+50) = 61.5%
The scikit-learn metric is precision_score. The syntax is similar to recall’s.
precision_score(y_test, predictions)
Again, the best value is 1 (100%) and the worst value is 0.
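If you reuse the made-up y_true and y_pred lists from the recall sketch, precision_score matches the hand calculation:

```python
from sklearn.metrics import precision_score

# Same made-up shirt-store labels as in the recall sketch above
precision_score(y_true, y_pred)  # ~0.615
```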
Precision is often discussed in terms of its relationship to recall. In fact, there’s a plot_precision_recall_curve function in scikit-learn that we can use to visualize the tradeoff between precision and recall.
Here’s the result of plotting the precision-recall curve for a logistic regression model on part of the Titanic dataset:
AP stands for average precision, which is the area under the precision-recall curve. Higher is better, and the max possible is 1.
The code to make the plot is:
plot_precision_recall_curve(lr, X_test, y_test);
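One caveat: plot_precision_recall_curve was deprecated in scikit-learn 1.0 and removed in 1.2. On newer versions, the equivalent plot (assuming the same lr, X_test, and y_test objects) comes from PrecisionRecallDisplay:

```python
from sklearn.metrics import PrecisionRecallDisplay

# Same plot on scikit-learn 1.0+, using the fitted model and test split from above
PrecisionRecallDisplay.from_estimator(lr, X_test, y_test);
```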
The plot shows what the precision and recall would be at different decision thresholds. Notice that the recall goes up as the precision goes down. :chart_with_downwards_trend:
If we set our decision threshold lower, we’ll move to the right along the curve. More observations will be classified as the positive class, and we will hopefully catch more of the true positive cases. Recall will go up. :grinning:
However, we will have more false positives, too. This will make the denominator for precision larger. The result will be lower precision. ☹️
How many false positives we are willing to tolerate depends on how large the cost of a false positive is relative to the value of catching another true positive. It’s a balancing act!
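If you want the numbers behind that tradeoff rather than the picture, scikit-learn’s precision_recall_curve returns the precision and recall at every candidate threshold. A quick sketch, assuming the same fitted lr model and test split as above:

```python
from sklearn.metrics import precision_recall_curve

# Predicted probability of the positive class for each test observation
probs = lr.predict_proba(X_test)[:, 1]

# Each value in thresholds has a matching precision/recall pair,
# so you can scan the arrays to pick the tradeoff you're willing to make
precision, recall, thresholds = precision_recall_curve(y_test, probs)
```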
Sometimes we care about how well our model is predicting the actual negatives. Let’s look at a metric for that situation.
Specificity (True Negative Rate)
Specificity also goes by the name true negative rate (TNR). It answers the question:
How well did my model catch the negative cases?
Here’s the formula for specificity:
Specificity = TN / (TN + FP)
Notice that specificity is only concerned with the actual negative cases.
Here’s the Hawaiian shirt sale confusion matrix again.
|  | Predicted Positive | Predicted Negative |
| --- | --- | --- |
| Actual Positive | 80 (TP) | 20 (FN) |
| Actual Negative | 50 (FP) | 50 (TN) |
What’s our model’s specificity?
50 / (50 + 50) = 50%
Specificity can range from 0 to 1, so 50% is not super great. The model didn’t do a good job correctly predicting when someone would NOT make a purchase.
Specificity is a nice metric when it’s important to correctly predict the actual negatives. For example, if the treatment for a disease is dangerous, you want a high specificity. :+1:
Specificity is commonly discussed in tandem with sensitivity. Remember that sensitivity goes by the names recall and true positive rate, too.
Scikit-learn does not have a built-in function to compute the specificity. You can create the four outcome variables from the confusion matrix and compute the specificity like this:
import numpy as np
from sklearn.metrics import confusion_matrix

tn, fp, fn, tp = confusion_matrix(y_test, predictions).ravel()
tn / (tn + fp)
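A handy shortcut, assuming your negative class is labeled 0: specificity is just the recall of the negative class, so you can also get it from recall_score by switching the positive label:

```python
from sklearn.metrics import recall_score

# Treat the 0 class as "positive" so recall becomes TN / (TN + FP), i.e. specificity
recall_score(y_test, predictions, pos_label=0)
```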
Specificity is the final basic classification metric you need in your tool belt. :wrench:
Recap
You’ve learned about recall, precision, and specificity. Remember that recall is also referred to as sensitivity or the true positive rate. Specificity is also called the true negative rate.
With accuracy, recall, precision, and specificity under your belt, you’ll have the basic classification terms you need to know!
I hope you found this introduction to basic classification metrics to be helpful. If you did, please share it on your favorite social media so other folks can find it, too. :grinning:
In the final article in this series we’ll explore the three most important composite metrics. They are a bit more complicated, but they convey a lot of information in a single number. :rocket:
I write about Python, SQL, Docker, and other tech topics. If any of that’s of interest to you, sign up for my mailing list of data science resources and read more here. :+1: