Quick Bias/Variance Trade-Off


The Bias/Variance trade-off easily explained

Mar 9 · 7 min read

This post will explain one of the most common issues in Machine Learning: The Bias/Variance Trade-off. We will see what it is, why it’s important to take it into account when building a Machine Learning model, and we will explain it intuitively and with zero math.

Image from Unsplash

What is the Bias/Variance trade-off?

As stated above, the bias/variance trade-off is one of the most common issues that has to be addressed when building an application that will use a supervised Machine Learning model to make predictions.

It is a problem that has to do with the error of our models and also with their flexibility. The overall problem is denoted as a trade-off because generally it is not possible to improve the bias and variance of our models at the same time: usually when one goes down, the other goes up, and vice versa.

Why is the Bias/Variance trade-off important?

We want our Machine Learning models to be as accurate as possible when put into production. This means it is important to reduce the error they might make, an error which has three main terms:

Total Error = Bias² + Variance + Irreducible Error

This formula means the following: the overall error of our model can be divided into three terms: the error due to variance, the error due to bias, and the irreducible error. This last one is the error that can't be reduced by playing with our models or data; it is generally due to noise in the data, or arises because the model's performance can't be increased any further (i.e. it has reached the same performance as the top human experts on a specific task).

Knowing this, it is obvious that we have two ways of reducing the overall error: as we can’t reduce the irreducible error we must reduce the errors that come from variance or bias.
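To make the decomposition concrete, here is a minimal sketch (assuming Python with NumPy; the quadratic toy data and the choice of a degree-1 polynomial are made-up examples) that estimates the bias and variance terms by retraining the same model on many different training samples:

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    # The "reality" we are trying to approximate: a quadratic function.
    return 0.5 * x ** 2

def sample_training_set(n=30, noise_std=1.0):
    # Each call simulates collecting a fresh, noisy training set.
    x = rng.uniform(-3, 3, n)
    return x, true_f(x) + rng.normal(0, noise_std, n)

x_test = np.linspace(-3, 3, 50)
degree = 1  # a straight line is too simple for quadratic data -> high bias

# Train the same model class on many resampled training sets and
# collect its predictions on a fixed test grid.
predictions = []
for _ in range(200):
    x_tr, y_tr = sample_training_set()
    coefs = np.polyfit(x_tr, y_tr, degree)
    predictions.append(np.polyval(coefs, x_test))
predictions = np.array(predictions)

avg_pred = predictions.mean(axis=0)
bias_sq = np.mean((avg_pred - true_f(x_test)) ** 2)  # error due to bias (squared)
variance = np.mean(predictions.var(axis=0))          # error due to variance
print(f"bias^2 ~ {bias_sq:.2f}, variance ~ {variance:.2f}, irreducible ~ {1.0:.2f}")
```

Re-running the sketch with a higher degree (say 9) typically shows the opposite picture: the bias term shrinks while the variance term grows, which is exactly the trade-off described above.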

Another way to see these errors is the following: the difference between the human-level performance on some task (the error made by the humans who labelled the data) and the error our model makes on the training data is the error due to the bias of our model. The difference between our training error and the error on our test data is the error our model makes due to variance.

The difference between human error and training error is the bias error, and the difference between training error and test error is the variance error. Self-made figure.
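In practice, this gives a quick diagnostic: compare the three error levels directly. The numbers below are made up purely to illustrate the arithmetic:

```python
# Hypothetical, made-up error rates for some classifier.
human_error = 0.02   # error of the human labellers
train_error = 0.10   # error on the training data
test_error = 0.25    # error on held-out test data

bias_error = train_error - human_error      # 0.08 -> how much the model underfits
variance_error = test_error - train_error   # 0.15 -> how badly it generalises
print(bias_error, variance_error)
```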

Let's see where these errors come from, how we can reduce them, and also speak about their trade-off.

The Bias/Variance trade-off explained intuitively

Alright, we’ve briefly described the bias/variance trade-off; let's see what each of these terms means and then describe the problem in depth:

  • Bias: The bias of our model has to do with the assumptions it makes about the data, and how well it fits that data when it is trained. A model with high bias doesn't fit the training data well, has limited flexibility, or is too simple for the data that we have, generally resulting in a high training error.

The bias tells us how well our model approximates reality.

  • Variance: The variance of our model has to do with how much its results vary depending on the sample of data used for its training. A model with high variance can fit the specific data it was trained on very well, so it has problems generalising to unseen data, resulting in a high test error.

The variance tells us how sensitive our model is to the training data.

The following figure is an image that is normally used for illustrating what variance and bias are:

Illustration of the Bias/Variance bullseye. Icon from Flaticon.

The explanation of this figure is the following: each dartboard gives us an idea of how well our model performs; the red crosses represent predictions, which are better the closer they are to the bullseye (the centre of the board).

  • When we have high variance and high bias our predictions are very spread out and not close to the centre. Our model does not make good predictions on any data samples.
  • When we have high variance and low bias our predictions are spread out but around the centre of the board, so some of them hit the bullseye but others don’t. On some data, our model predicts well but on other data samples it doesn’t.
  • High bias and low variance means that our predictions are close together, but not near the centre of the board. Generally our model does not predict well, although it makes similar predictions on different samples.
  • Low variance and low bias means that our predictions are close together and centred: this is the best scenario, where our model predicts well for all kinds of data.

Let's now look at an example from a real-world application, to finish off acquiring an intuition of the problem.

Imagine we are building an application for recognising cats in images. If we train a model with high bias, it will predict cat images very badly, independently of the samples of cat data we train it with.

Our first model, with high bias, would predict this dog as a cat. Original image from Unsplash.

A model with high variance would predict well for the specific cat species (for example) it was trained with, but it would make errors when facing images of cats that are not very close to the images it has previously been trained with. It would generalise badly to new cats.

Our second model, with high variance, would predict some species of cats very well (top) but others badly (bottom). Original images from Unsplash.

Lastly, a model with low variance and low bias would predict well independently of the data samples it was trained with; in our example, it would generalise well enough to tell whether an animal is a cat or a dog, without being tricked by different cat species.

Our last model, with low variance, classifies different kinds of cats well. Original image from Unsplash.

Examples of models with high bias/variance

Now that we know what bias and variance are, how they relate, and the intuition behind them, let's see some examples of models with high variance/bias.

As we said, a model with high bias does not fit the training data well. Linear models, for example, can suffer from this problem, as they assume a linear relationship between the features and the target variable that does not always exist.

If you are not familiar with Linear regression, you can learn about it here:

In the following image we can see a linear regression model fit to data that clearly has no linear correlation: as a result our model will have a high bias and not perform very well.

A linear model fit to non-linear data. Self-made Image.
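If you want to reproduce this situation yourself, a minimal sketch (assuming scikit-learn is available; the quadratic toy data is made up) could look like this:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Clearly non-linear data: y grows quadratically with x.
x = rng.uniform(-3, 3, 100).reshape(-1, 1)
y = 0.5 * x.ravel() ** 2 + rng.normal(0, 0.3, 100)

model = LinearRegression().fit(x, y)
# A low R^2 even on the data the model was trained on is the signature of high bias.
print("training R^2:", model.score(x, y))
```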

Models with high variance are, for example, Decision Trees: they create branches and splits that are specific to the samples of the training data. Moreover, if we let a decision tree grow forever, it will grow as many leaf nodes as there are data samples, creating a specific path for each data point. This means that when it finds a new sample whose feature values do not exactly match any of the samples in the training data, it will not classify it very well.

You can find a simple explanation of Decision Trees in the following article:

This is the main reason why Decision Trees always have some sort of stop condition (number of leaf nodes, minimum samples per leaf node, maximum depth…). By doing this, we make our tree generalise better to new data points.
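A quick way to see both the high variance and the effect of a stop condition is to compare an unconstrained tree with a depth-limited one on the same split (a sketch assuming scikit-learn; the synthetic dataset is an arbitrary example):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# An unconstrained tree keeps splitting until the training data is (almost) perfectly
# memorised: high training score, noticeably lower test score -> high variance.
deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
# A stop condition (maximum depth) trades a little training accuracy for better generalisation.
shallow = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)

for name, tree in [("unconstrained", deep), ("max_depth=4", shallow)]:
    print(name, "train:", tree.score(X_tr, y_tr), "test:", tree.score(X_te, y_te))
```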

How to fix the bias/variance problem

As we have seen, fitting the training data too well results in high variance but low bias. Not fitting the training data well results in high bias and low variance. How can we fix the problem of having high bias or high variance?

If we have a high bias (high training error), we can do the following to try to fix the problem:

  • Change the optimisation algorithm of our model.
  • Do better hyper-parameter tuning (run a coarse grid search and then a more specific one around the best results from the first one; see the sketch after this list).
  • Switch the model type.
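As a sketch of the coarse-then-fine grid search mentioned in the second bullet (assuming scikit-learn; the model and the parameter ranges are arbitrary examples):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, random_state=0)
model = RandomForestClassifier(random_state=0)

# 1) Coarse search over a wide, sparse grid.
coarse = GridSearchCV(model, {"n_estimators": [10, 100, 500], "max_depth": [2, 8, 32]}, cv=3)
coarse.fit(X, y)
print("coarse best:", coarse.best_params_)

# 2) Finer search around the best coarse result (here written out by hand
#    around n_estimators=100, max_depth=8 for readability).
fine = GridSearchCV(model, {"n_estimators": [50, 100, 200], "max_depth": [6, 8, 10]}, cv=3)
fine.fit(X, y)
print("fine best:", fine.best_params_, "score:", round(fine.best_score_, 3))
```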

If we have high variance (high test error), we can try some of the following solutions:

  • Regularise our algorithm using L1 or L2 regularisation, dropout, tree pruning, etc. (a small example follows this list).
  • Get more data to train on, or try data augmentation techniques.
  • Try a different model type.
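To illustrate the first bullet, here is a small sketch of L2 regularisation (assuming scikit-learn; the toy data, with many features and few samples, is made up so that the unregularised model overfits):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Few samples, many features: an easy setup for an overfitting (high-variance) model.
X = rng.normal(size=(60, 40))
y = X[:, 0] + 0.1 * rng.normal(size=60)  # only the first feature actually matters
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

plain = LinearRegression().fit(X_tr, y_tr)
ridge = Ridge(alpha=10.0).fit(X_tr, y_tr)  # L2 regularisation shrinks the coefficients

print("plain train/test R^2:", plain.score(X_tr, y_tr), plain.score(X_te, y_te))
print("ridge train/test R^2:", ridge.score(X_tr, y_tr), ridge.score(X_te, y_te))
```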

Conclusion and Other resources

We have seen what the bias/variance trade-off is, how it relates to our Machine Learning models, various examples, and how to tackle it. If you want to dive any deeper into it, check out the following resources:

That is all, I hope you liked the post. Feel free to follow me on Twitter at @jaimezorno. Also, you can take a look at my posts on Data Science and Machine Learning here. Have a good read!

