A Beginner’s Guide to Machine Learning Model Monitoring

Metrics in Model Monitoring

There are several metrics that you can use to monitor an ML model. The metric(s) you choose depends on various factors:

  • Is it a regression or classification task?
  • What is the business objective (e.g., optimizing for precision vs. recall)?
  • What is the distribution of the target variable?

Below are various metrics that are commonly used in model monitoring:

Type 1 Error

Also known as a false positive, a type 1 error is an outcome where the model incorrectly predicts the positive class. For example, a pregnancy test that comes back positive when you aren’t pregnant is a type 1 error.

Type 2 Error

Also known as a false negative, a type 2 error is an outcome where the model incorrectly predicts the negative class. An example of this is a test result that says you don’t have cancer when you actually do.
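Both error types can be read directly off a confusion matrix. A minimal sketch using scikit-learn (the labels below are made up for illustration):

```python
from sklearn.metrics import confusion_matrix

# 1 = positive class (e.g. "has the condition"), 0 = negative class
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_pred = [0, 1, 1, 0, 0, 1, 0, 1]

# ravel() flattens the 2x2 confusion matrix into (tn, fp, fn, tp)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print(f"Type 1 errors (false positives): {fp}")  # 1
print(f"Type 2 errors (false negatives): {fn}")  # 1
```

The same four counts (TP, TN, FP, FN) are the building blocks for the accuracy, precision, and recall formulas below.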

Accuracy

The accuracy of a model is simply equal to the fraction of predictions that a model got right and is represented by the following equation:
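The equation itself did not survive extraction; the standard definition, in terms of true/false positives and negatives, is:

```latex
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
```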

Precision

Precision attempts to answer “What proportion of positive identifications was actually correct?” and can be represented by the following equation:
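The original equation is missing; the standard definition is:

```latex
\text{Precision} = \frac{TP}{TP + FP}
```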

Recall

Recall attempts to answer “What proportion of actual positives was identified correctly?” and can be represented by the following equation:
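The original equation is missing; the standard definition is:

```latex
\text{Recall} = \frac{TP}{TP + FN}
```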

F1 score

The F1 score is a measure of a test’s accuracy — it is the harmonic mean of precision and recall. It can have a maximum score of 1 (perfect precision and recall) and a minimum of 0. Overall, it is a measure of the preciseness and robustness of your model and can be represented with the following equation:
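The original equation is missing; as the harmonic mean of precision and recall, it is:

```latex
F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
```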

R-Squared

R-squared is a measurement that tells you what proportion of the variance in the dependent variable is explained by the independent variables. In simpler terms, while the coefficients estimate trends, R-squared indicates how tightly the data scatter around the line of best fit.

For example, if the R² is 0.80, then 80% of the variation can be explained by the model’s inputs.

If the R² is 1.0 or 100%, that means that all movements of the dependent variable can be entirely explained by the movements of the independent variables.
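In equation form (reconstructed here, since the original did not survive), R² compares the residual sum of squares to the total sum of squares:

```latex
R^2 = 1 - \frac{\sum_{i}\left(y_i - \hat{y}_i\right)^2}{\sum_{i}\left(y_i - \bar{y}\right)^2}
```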

Adjusted R-Squared

Every additional independent variable added to a model can only increase the R² value, never decrease it; therefore, a model with several independent variables may seem to be a better fit even if it isn’t. This is where adjusted R² comes in. The adjusted R² penalizes each additional independent variable and only increases if a given variable improves the model more than would be expected by chance.
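The standard formula, where n is the number of observations and p the number of independent variables:

```latex
\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
```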

Mean Absolute Error (MAE)

The absolute error is the difference between the predicted values and the actual values. Thus, the mean absolute error is the average of the absolute error.
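Written out (the original equation is missing):

```latex
\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|
```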

Mean Squared Error (MSE)

The mean squared error or MSE is similar to the MAE, except you take the average of the squared differences between the predicted values and the actual values.

Because the differences are squared, larger errors are weighted more highly, and so this should be used over the MAE when you want to minimize large errors. Below is the equation for MSE, as well as the code.
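The promised equation is MSE = (1/n) Σ (yᵢ − ŷᵢ)², and the code below is a minimal NumPy reconstruction of it (the sample values are illustrative):

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """MSE = (1/n) * sum((y_true - y_pred)^2)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean((y_true - y_pred) ** 2)

y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

# Errors are 0.5, 0, -2, -1; squaring makes the -2 dominate the average.
print(mean_squared_error(y_true, y_pred))  # → 1.3125
```

Swapping the squared term for an absolute value turns this into the MAE from the previous section.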

Overall, the metric(s) that you choose to monitor ultimately depend on the task at hand and the business context that you’re working in.

For example, it’s common knowledge in the data science world that accuracy is a poor metric for fraud detection models because the percentage of fraudulent transactions is usually less than 1%. A fraud detection model can achieve 99% accuracy simply by classifying every transaction as non-fraudulent, which tells us nothing about whether the model is actually effective.

Another example: in cancer screening, a false negative classification is much more severe than a false positive. Telling a patient with cancer that they don’t have cancer can ultimately lead to their death. That is much worse than telling a patient they have cancer, conducting further tests, only to realize that the patient does not have cancer. (It’s always better to be safe than sorry!)

