More Performance Evaluation Metrics You Should Know for Classification Problems

Category: IT Technology · Published: 5 years ago


The equations of 4 key classification metrics

Recall versus Precision

Precision is the ratio of True Positives to all the positives predicted by the model, i.e. TP / (TP + FP).

Low precision: the more False Positives the model predicts, the lower the precision.

Recall (Sensitivity) is the ratio of True Positives to all the actual positives in your dataset, i.e. TP / (TP + FN).

Low recall: the more False Negatives the model predicts, the lower the recall.
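To make the two ratios concrete, here is a minimal Python sketch that computes them from confusion-matrix counts. The function names and the counts (80 TP, 20 FP, 40 FN) are made up for illustration and do not come from any real dataset.

```python
def precision(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP): share of predicted positives that are correct."""
    predicted_positives = tp + fp
    # Return 0.0 by convention when the model predicts no positives at all.
    return tp / predicted_positives if predicted_positives else 0.0


def recall(tp: int, fn: int) -> float:
    """Recall = TP / (TP + FN): share of actual positives that the model finds."""
    actual_positives = tp + fn
    # Return 0.0 by convention when the dataset contains no positives.
    return tp / actual_positives if actual_positives else 0.0


# Hypothetical counts, chosen only for illustration.
tp, fp, fn = 80, 20, 40

p = precision(tp, fp)  # 80 / (80 + 20)
r = recall(tp, fn)     # 80 / (80 + 40)
print(f"precision={p:.2f}, recall={r:.2f}")  # prints "precision=0.80, recall=0.67"
```

Adding more False Positives only lowers precision, while adding more False Negatives only lowers recall, which is why the two metrics can move independently.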

The ideas of recall and precision can seem abstract. Let me illustrate the difference with three real cases.

Case 1: COVID-19 diagnosis
  • the result of TP will be that residents with COVID-19 are diagnosed with COVID-19.
  • the result of TN will be that healthy residents are predicted as healthy.
  • the result of FP will be that actually healthy residents are predicted as having COVID-19.
  • the result of FN will be that residents who actually have COVID-19 are predicted as healthy.

In case 1, which scenario do you think will have the highest cost?

Imagine that we predict residents with COVID-19 as healthy, so they are not asked to quarantine: the result would be a massive number of new COVID-19 infections. The cost of false negatives is much higher than the cost of false positives.

Case 2: spam email filtering
  • the result of TP will be that spam emails are placed in the spam folder.
  • the result of TN will be that important emails are received.
  • the result of FP will be that important emails are placed in the spam folder.
  • the result of FN will be that spam emails are received.

In case 2, which scenario do you think will have the highest cost?

Well, since missing important emails will clearly be more of a problem than receiving spam, we can say that in this case, FP will have a higher cost than FN.

Case 3: loan default prediction
  • the result of TP will be that bad loans are correctly predicted as bad loans.
  • the result of TN will be that good loans are correctly predicted as good loans.
  • the result of FP will be that (actual) good loans are incorrectly predicted as bad loans.
  • the result of FN will be that (actual) bad loans are incorrectly predicted as good loans.

In case 3, which scenario do you think will have the highest cost?

Banks would lose a large amount of money if actual bad loans were predicted as good loans, because those loans would not be repaid. On the other hand, banks would only miss out on additional revenue if actual good loans were predicted as bad loans. Therefore, the cost of False Negatives is much higher than the cost of False Positives.
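The cost argument in the three cases can be sketched as a simple weighted sum of errors. The snippet below is only an illustration for the loan case: the helper `total_error_cost`, the two hypothetical models, and the per-error costs (1,000 for a rejected good loan, 10,000 for an approved bad loan) are all assumptions, not figures from the article.

```python
def total_error_cost(fp: int, fn: int, cost_fp: float, cost_fn: float) -> float:
    """Total cost of a model's mistakes: each error type weighted by its unit cost."""
    return fp * cost_fp + fn * cost_fn


# Assumed unit costs for the loan example (hypothetical numbers).
COST_FP = 1_000    # good loan rejected: revenue the bank fails to earn
COST_FN = 10_000   # bad loan approved: money the bank fails to recover

# Two hypothetical models that make the same number of errors (30 each),
# but distribute them differently between FP and FN.
model_a = {"fp": 25, "fn": 5}
model_b = {"fp": 5, "fn": 25}

cost_a = total_error_cost(model_a["fp"], model_a["fn"], COST_FP, COST_FN)
cost_b = total_error_cost(model_b["fp"], model_b["fn"], COST_FP, COST_FN)

print(cost_a)  # 75000
print(cost_b)  # 255000
```

Both models have the same error count, yet under these assumed costs the one with more False Negatives is far more expensive. Swapping the two unit costs would model a case like spam filtering, where False Positives are the costlier error.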

Summary

In practice, the cost of a false negative differs from the cost of a false positive, and which one is higher depends on the specific case. We should therefore not only calculate accuracy but also evaluate our model using other metrics, such as Recall and Precision.
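A small sketch shows why accuracy alone can mislead. It assumes a made-up, imbalanced dataset of 100 samples with only 5 positives, and a trivial model that predicts "negative" for everything. The helper `confusion_counts` is written here for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


# Hypothetical imbalanced data: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
# A trivial model that always predicts the negative class.
y_pred = [0] * 100

tp, tn, fp, fn = confusion_counts(y_true, y_pred)

accuracy = (tp + tn) / len(y_true)
recall_value = tp / (tp + fn) if (tp + fn) else 0.0
precision_value = tp / (tp + fp) if (tp + fp) else 0.0

print(f"accuracy={accuracy:.2f}")       # prints "accuracy=0.95"
print(f"recall={recall_value:.2f}")     # prints "recall=0.00"
print(f"precision={precision_value:.2f}")  # prints "precision=0.00"
```

The model scores 95% accuracy while missing every positive case, which only recall reveals.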

