ROC Curve and AUC — Explained
What they mean and when they are useful
The ROC (receiver operating characteristic) curve and AUC (area under the curve) are performance measures that provide a comprehensive evaluation of classification models.
The ROC curve summarizes performance by combining confusion matrices at all threshold values. AUC turns the ROC curve into a single numeric representation of a binary classifier's performance: it is the area under the ROC curve and takes a value between 0 and 1. AUC indicates how well a model separates the positive and negative classes.
Before going into detail, let's first explain the confusion matrix and how different threshold values change its contents.
A confusion matrix is not a metric to evaluate a model, but it provides insight into the predictions. It goes deeper than classification accuracy by showing the correct and incorrect (i.e. true and false) predictions for each class. For a binary classification task, the confusion matrix is a 2×2 matrix; with three classes it is 3×3, and so on.
Let's assume class A is the positive class and class B is the negative class. The key terms of the confusion matrix are as follows:
- True positive (TP) : Predicting positive class as positive (ok)
- False positive (FP) : Predicting negative class as positive (not ok)
- False negative (FN) : Predicting positive class as negative (not ok)
- True negative (TN) : Predicting negative class as negative (ok)
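The four counts above can be computed directly from a pair of label lists. The function and data below are an illustrative sketch (not from the original article), assuming 1 marks the positive class:

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary classification task."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn

# Made-up labels for illustration: two correct positives, one of each error type
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1]
print(confusion_counts(y_true, y_pred))  # (2, 1, 1, 1)
```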
Algorithms like logistic regression return probabilities rather than discrete class labels. We set a threshold on the probability to separate the positive and negative classes. Depending on the threshold value, the predicted class of some observations may change.
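A small sketch of this effect, using made-up probabilities and two arbitrary cut-off values:

```python
# Hypothetical predicted probabilities from a model such as logistic regression
probs = [0.15, 0.45, 0.55, 0.90]

def classify(probs, threshold):
    """Label a sample positive (1) when its probability meets the threshold."""
    return [1 if p >= threshold else 0 for p in probs]

print(classify(probs, 0.4))  # [0, 1, 1, 1]
print(classify(probs, 0.6))  # [0, 0, 0, 1] -- the two middle samples flip class
```

Raising the threshold from 0.4 to 0.6 flips two predictions, which in turn changes every cell of the confusion matrix.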
Adjusting the threshold value changes the predictions and thus results in a different confusion matrix. When the elements of the confusion matrix change, precision and recall change as well.
Precision and recall metrics take the classification accuracy one step further and allow us to get a more specific understanding of model evaluation.
The focus of precision is the positive predictions: it indicates how many of the positive predictions are actually correct, i.e. precision = TP / (TP + FP).
The focus of recall is the actual positive class: it indicates how many of the actual positives the model predicts correctly, i.e. recall = TP / (TP + FN).
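The two formulas above can be checked with a quick sketch; the counts below are made up for illustration:

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives
p, r = precision_recall(tp=8, fp=2, fn=4)
print(p)  # 0.8  -- 8 of 10 positive predictions were correct
print(r)  # ~0.667 -- 8 of 12 actual positives were found
```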
Note: We cannot maximize both precision and recall at once because there is a trade-off between them: increasing precision decreases recall and vice versa. We can aim to maximize precision or recall depending on the task. For an email spam detection model, we try to maximize precision because we want to be correct when an email is flagged as spam; we do not want to label a normal email as spam (a false positive). On the other hand, for a tumor detection task, we need to maximize recall because we want to detect as many positive cases as possible.
The ROC curve provides a summary of a model's performance by combining confusion matrices at all threshold values.
The ROC curve has two axes, both of which take values between 0 and 1. The y-axis is the true positive rate (TPR), also known as sensitivity. It is the same as recall: the proportion of the positive class that is correctly predicted as positive. The x-axis is the false positive rate (FPR), which equals 1 − specificity. Specificity is the negative-class counterpart of sensitivity: the proportion of the negative class that is correctly predicted as negative.
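Each threshold gives one (FPR, TPR) point on the curve. A minimal sketch, using made-up confusion counts chosen so that lowering the threshold moves the point up and to the right:

```python
def tpr_fpr(tp, fp, fn, tn):
    """TPR = TP / (TP + FN) (sensitivity); FPR = FP / (FP + TN) = 1 - specificity."""
    return tp / (tp + fn), fp / (fp + tn)

# Hypothetical counts at three decreasing thresholds (10 positives, 10 negatives)
for tp, fp, fn, tn in [(5, 1, 5, 9), (8, 3, 2, 7), (10, 6, 0, 4)]:
    print(tpr_fpr(tp, fp, fn, tn))  # (0.5, 0.1), then (0.8, 0.3), then (1.0, 0.6)
```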
If the threshold is set to 0, the model predicts every sample as positive, so there are only true positives and false positives: both TPR and FPR are 1. If the threshold is set to 1, there are no positive predictions at all: TP and FP are 0, so TPR and FPR are both 0. Hence, neither 0 nor 1 is a good threshold choice.
We aim to increase the true positive rate (TPR) while keeping the false positive rate (FPR) low. As the ROC curve shows, as TPR increases, FPR also increases, so it comes down to deciding how many false positives we can tolerate.
The ROC curve gives us an overview of model performance at different threshold values. AUC is the area under the ROC curve between (0, 0) and (1, 1), which can be computed with integral calculus. It aggregates the model's performance over all threshold values. The best possible AUC is 1, indicating a perfect classifier; an AUC of 0 means every prediction is wrong.
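AUC also equals the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative one (ties counting half). This equivalent rank-based view avoids explicit integration; the sketch below uses made-up scores for illustration:

```python
def auc_score(y_true, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# One of the four positive-negative pairs is ranked incorrectly (0.35 < 0.4)
print(auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```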
Note: AUC does not depend on the classification threshold. Changing the threshold does not change AUC, because AUC aggregates over the entire ROC curve.
The figure above shows the ROC curves for classifiers A and B. A is clearly the better classifier: its AUC is higher, and for the same FPR values it achieves a higher TPR. Similarly, for the same TPR values, A has a lower FPR.
AUC is classification-threshold invariant, and for that very reason it is not the optimal evaluation metric for certain tasks. For instance, in email spam detection we do not want any false positives, while in tumor detection we cannot afford a false negative. In such cases we can optimize the model to our needs by adjusting the classification threshold, and since AUC is unaffected by the threshold, it is not a good metric choice; precision or recall should be used as the evaluation metric instead.