A Beginner’s Guide to Machine Learning Model Monitoring

Metrics in Model Monitoring

There are several metrics that you can use to monitor an ML model. The metric(s) you choose depends on various factors:

  • Is it a regression or classification task?
  • What is the business objective (e.g., precision vs. recall)?
  • What is the distribution of the target variable?

Below are various metrics that are commonly used in model monitoring:

Type 1 Error

Also known as a false positive, it is an outcome where the model incorrectly predicts the positive class. For example, a pregnancy test that comes back positive when you aren’t actually pregnant is a type 1 error.

Type 2 Error

Also known as a false negative, it is an outcome where the model incorrectly predicts the negative class. An example of this is a test result that says you don’t have cancer when you actually do.
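Both error types can be read off a confusion matrix. Below is a minimal sketch using scikit-learn, on small hypothetical label arrays where 1 is the positive class:

```python
from sklearn.metrics import confusion_matrix

# hypothetical labels: 1 = positive class, 0 = negative class
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 0]

# for binary labels, confusion_matrix returns [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("Type 1 errors (false positives):", fp)  # 1
print("Type 2 errors (false negatives):", fn)  # 1
```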

Accuracy

The accuracy of a model is simply equal to the fraction of predictions that the model got right and is represented by the following equation:

Accuracy = number of correct predictions / total number of predictions = (TP + TN) / (TP + TN + FP + FN)
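A minimal sketch of the same calculation with scikit-learn’s accuracy_score, using the same hypothetical labels as above:

```python
from sklearn.metrics import accuracy_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 0]

# 6 of the 8 predictions match the true labels
print(accuracy_score(y_true, y_pred))  # 0.75
```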

Precision

Precision attempts to answer “What proportion of positive identifications was actually correct?” and can be represented by the following equation:

Precision = TP / (TP + FP)
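A minimal sketch with scikit-learn’s precision_score on the same hypothetical labels:

```python
from sklearn.metrics import precision_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 0]

# 4 positive predictions, 3 of them correct: TP / (TP + FP) = 3/4
print(precision_score(y_true, y_pred))  # 0.75
```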

Recall

Recall attempts to answer “What proportion of actual positives was identified correctly?” and can be represented by the following equation:

Recall = TP / (TP + FN)
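A minimal sketch with scikit-learn’s recall_score:

```python
from sklearn.metrics import recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 0]

# 4 actual positives, 3 of them found: TP / (TP + FN) = 3/4
print(recall_score(y_true, y_pred))  # 0.75
```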

F1 score

The F1 score is a measure of a test’s accuracy: it is the harmonic mean of precision and recall. It can have a maximum score of 1 (perfect precision and recall) and a minimum of 0. Overall, it is a measure of the preciseness and robustness of your model and can be represented with the following equation:

F1 = 2 × (Precision × Recall) / (Precision + Recall)
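A minimal sketch with scikit-learn’s f1_score:

```python
from sklearn.metrics import f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 0]

# harmonic mean of precision (0.75) and recall (0.75)
print(f1_score(y_true, y_pred))  # 0.75
```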

R-Squared

R-squared (R²) is a measurement that tells you what proportion of the variance in the dependent variable is explained by the variance in the independent variables. In simpler terms, while the coefficients estimate trends, R² measures how closely the data scatter around the fitted regression line.

For example, if the R² is 0.80, then 80% of the variation can be explained by the model’s inputs.

If the R² is 1.0 or 100%, that means that all movements of the dependent variable can be entirely explained by the movements of the independent variables.
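A minimal sketch with scikit-learn’s r2_score, using hypothetical actual and predicted values:

```python
from sklearn.metrics import r2_score

# hypothetical actual and predicted values from a regression model
y_true = [3.0, 5.0, 7.5, 10.0]
y_pred = [2.8, 5.3, 7.0, 10.4]

# R^2 = 1 - (sum of squared residuals / total sum of squares)
print(r2_score(y_true, y_pred))  # ≈ 0.98
```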

Adjusted R-Squared

Every additional independent variable added to a model always increases the R² value, so a model with several independent variables may seem to be a better fit even if it isn’t. This is where adjusted R² comes in. Adjusted R² penalizes each additional independent variable and only increases when a new variable improves the model more than would be expected by chance.
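scikit-learn doesn’t ship an adjusted R² function, but the standard formula 1 - (1 - R²)(n - 1)/(n - p - 1) is simple to write yourself. The helper below (adjusted_r2 is a hypothetical name) assumes n samples and p independent variables:

```python
from sklearn.metrics import r2_score

def adjusted_r2(y_true, y_pred, n_features):
    """Hypothetical helper: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    r2 = r2_score(y_true, y_pred)
    n = len(y_true)
    return 1 - (1 - r2) * (n - 1) / (n - n_features - 1)

# hypothetical values, assuming the model used 2 independent variables
y_true = [3.0, 5.0, 7.5, 10.0, 12.0]
y_pred = [2.8, 5.3, 7.0, 10.4, 11.6]
print(adjusted_r2(y_true, y_pred, n_features=2))
```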

Mean Absolute Error (MAE)

The absolute error is the absolute value of the difference between a predicted value and the actual value. The mean absolute error is therefore the average of these absolute errors:

MAE = (1/n) × Σ |actual - predicted|
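A minimal sketch with scikit-learn’s mean_absolute_error on hypothetical values:

```python
from sklearn.metrics import mean_absolute_error

# hypothetical actual and predicted values
y_true = [3.0, 5.0, 7.5, 10.0]
y_pred = [2.8, 5.3, 7.0, 10.4]

# average of |actual - predicted| = (0.2 + 0.3 + 0.5 + 0.4) / 4
print(mean_absolute_error(y_true, y_pred))  # ≈ 0.35
```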

Mean Squared Error (MSE)

The mean squared error, or MSE, is similar to the MAE, except that you take the average of the squared differences between the predicted values and the actual values:

MSE = (1/n) × Σ (actual - predicted)²

Because the differences are squared, larger errors are weighted more heavily, so MSE should be used over MAE when you want to penalize large errors. Below is the code.
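A minimal sketch with scikit-learn’s mean_squared_error (hypothetical values, not necessarily the author’s original snippet):

```python
from sklearn.metrics import mean_squared_error

# hypothetical actual and predicted values
y_true = [3.0, 5.0, 7.5, 10.0]
y_pred = [2.8, 5.3, 7.0, 10.4]

# average of (actual - predicted)^2 = (0.04 + 0.09 + 0.25 + 0.16) / 4
print(mean_squared_error(y_true, y_pred))  # ≈ 0.135
```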

Overall, the metric(s) that you choose to monitor ultimately depends on the task at hand and the business context that you’re working in.

For example, it’s well known in the data science world that accuracy is a poor metric for fraud detection models because the percentage of fraudulent transactions is usually less than 1%. A fraud detection model can achieve 99% accuracy simply by classifying every transaction as non-fraudulent, so accuracy alone doesn’t tell us whether the model is actually effective.
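As a concrete (hypothetical) illustration, a “model” that labels every transaction as non-fraudulent scores roughly 99% accuracy on data with about 1% fraud while catching none of it:

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# hypothetical dataset: roughly 1% of transactions are fraudulent (label 1)
rng = np.random.default_rng(0)
y_true = (rng.random(10_000) < 0.01).astype(int)

# a useless "model" that predicts non-fraudulent for every transaction
y_pred = np.zeros_like(y_true)

print(accuracy_score(y_true, y_pred))  # ~0.99
print(recall_score(y_true, y_pred))    # 0.0, it catches no fraud at all
```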

Another example: the severity of a false negative in a cancer screening test is much worse than that of a false positive. Telling a patient with cancer that they don’t have cancer can ultimately lead to their death. That is far worse than telling a patient they may have cancer, conducting further tests, and then finding that they don’t. (It’s always better to be safe than sorry!)

