A Beginner’s Guide to Machine Learning Model Monitoring

Category: IT Technology · Published: 4 years ago

Metrics in Model Monitoring

There are several metrics that you can use to monitor an ML model. The metric(s) you choose depends on various factors:

  • Is it a regression or classification task?
  • What is the business objective (e.g., do you prioritize precision or recall)?
  • What is the distribution of the target variable?

Below are various metrics that are commonly used in model monitoring:

Type 1 Error

Also known as a false positive, a type 1 error is an outcome where the model incorrectly predicts the positive class. For example, a pregnancy test that comes back positive when you aren’t pregnant is a type 1 error.

Type 2 Error

Also known as a false negative, a type 2 error is an outcome where the model incorrectly predicts the negative class. An example of this is a test result that says you don’t have cancer when you actually do.
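As a quick sketch, both error types can be counted directly from a model’s predictions. The labels below are hypothetical, used only for illustration:

```python
# Hypothetical true labels and model predictions (1 = positive class).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 1]

# Type 1 error: predicted positive, actually negative (false positive).
type_1 = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
# Type 2 error: predicted negative, actually positive (false negative).
type_2 = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

print(type_1, type_2)  # 2 false positives, 1 false negative
```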

Accuracy

The accuracy of a model is simply equal to the fraction of predictions that the model got right and is represented by the following equation:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

where TP, TN, FP, and FN are the counts of true positives, true negatives, false positives, and false negatives.
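A minimal sketch of the computation, again with hypothetical labels:

```python
# Hypothetical true labels and model predictions (1 = positive class).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 1]

# Accuracy = correct predictions / total predictions.
correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = correct / len(y_true)
print(accuracy)  # 5 of 8 correct -> 0.625
```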

Precision

Precision attempts to answer “What proportion of positive identifications was actually correct?” and can be represented by the following equation:

Precision = TP / (TP + FP)
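In code, using hypothetical labels for illustration:

```python
# Hypothetical true labels and model predictions (1 = positive class).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 1]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives

precision = tp / (tp + fp)
print(precision)  # 3 / (3 + 2) = 0.6
```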

Recall

Recall attempts to answer “What proportion of actual positives was identified correctly?” and can be represented by the following equation:

Recall = TP / (TP + FN)
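A matching sketch with the same hypothetical labels:

```python
# Hypothetical true labels and model predictions (1 = positive class).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 1]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives

recall = tp / (tp + fn)
print(recall)  # 3 / (3 + 1) = 0.75
```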

F1 score

The F1 score is a measure of a test’s accuracy: it is the harmonic mean of precision and recall. It can have a maximum score of 1 (perfect precision and recall) and a minimum of 0. Overall, it is a measure of the preciseness and robustness of your model and can be represented with the following equation:

F1 = 2 × (Precision × Recall) / (Precision + Recall)
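As a sketch, starting from hypothetical precision and recall values (e.g., computed on a validation set):

```python
# Hypothetical precision and recall values for illustration.
precision = 0.6
recall = 0.75

# F1 is the harmonic mean of precision and recall.
f1 = 2 * (precision * recall) / (precision + recall)
print(round(f1, 4))  # ~0.6667
```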

R-Squared

R-squared is a measure of the proportion of variance in the dependent variable that is explained by the independent variables. In simpler terms, while the coefficients estimate trends, R-squared tells you how closely the data cluster around the line of best fit.

For example, if the R² is 0.80, then 80% of the variation can be explained by the model’s inputs.

If the R² is 1.0 or 100%, that means that all movements of the dependent variable can be entirely explained by the movements of the independent variables.
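A minimal sketch of computing R² by hand, using made-up regression values:

```python
# Hypothetical actual and predicted values from a regression model.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.8, 5.3, 7.1, 8.8]

mean_y = sum(y_true) / len(y_true)
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares

r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 3))  # ~0.991
```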

Adjusted R-Squared

Every additional independent variable added to a model always increases the R² value; therefore, a model with several independent variables may seem to be a better fit even if it isn’t. This is where adjusted R² comes in: it penalizes each additional independent variable and only increases when a new variable improves the model more than would be expected by chance.
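The standard formula is Adjusted R² = 1 − (1 − R²)(n − 1) / (n − k − 1), where n is the number of observations and k the number of independent variables. A sketch with made-up values:

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 given R^2, n observations, and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical example: R^2 = 0.80 with 100 observations and 5 predictors.
print(round(adjusted_r_squared(0.80, 100, 5), 4))  # ~0.7894
```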

Mean Absolute Error (MAE)

The absolute error is the difference between the predicted values and the actual values, so the mean absolute error is the average of those absolute differences:

MAE = (1/n) Σ |yᵢ − ŷᵢ|

where yᵢ is the actual value and ŷᵢ the predicted value.
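In code, using the same made-up regression values as above:

```python
# Hypothetical actual and predicted values from a regression model.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.8, 5.3, 7.1, 8.8]

# MAE = mean of |actual - predicted|.
mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
print(round(mae, 2))  # ~0.2
```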

Mean Squared Error (MSE)

The mean squared error or MSE is similar to the MAE, except you take the average of the squared differences between the predicted values and the actual values.

Because the differences are squared, larger errors are weighted more heavily, so MSE should be preferred over MAE when you want to penalize large errors. The equation for MSE is:

MSE = (1/n) Σ (yᵢ − ŷᵢ)²

where yᵢ is the actual value and ŷᵢ the predicted value.
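A sketch of the computation, with the same hypothetical regression values:

```python
# Hypothetical actual and predicted values from a regression model.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.8, 5.3, 7.1, 8.8]

# MSE = mean of (actual - predicted)^2; squaring penalizes large errors more.
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
print(round(mse, 3))  # ~0.045
```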

Overall, the metric(s) that you choose to monitor ultimately depends on the task at hand, and the business context that you’re working in.

For example, it’s common knowledge in the data science world that accuracy metrics are irrelevant when it comes to fraud detection models, because the percentage of fraudulent transactions is usually less than 1%. Even if a fraud detection model achieves 99% accuracy by classifying all transactions as non-fraudulent, that accuracy tells us nothing about whether the model is effective.

Another example is that the severity of a false negative classification when it comes to cancer screening tests is much worse than a false positive classification. Saying that a patient with cancer doesn’t have cancer can ultimately lead to his or her death. This is much worse than saying that a patient has cancer, conducting further tests, only to realize that the patient does not have cancer. (It’s always better to be safe than sorry!)

