A Gentle Introduction to Degrees of Freedom in Machine Learning

Category: IT Technology · Published: 4 years ago

Degrees of freedom is an important concept from statistics and engineering.

It is often employed to summarize the number of values used in the calculation of a statistic, such as a sample statistic or a test statistic in a statistical hypothesis test.

In machine learning, the degrees of freedom may refer to the number of parameters in the model, such as the number of coefficients in a linear regression model or the number of weights in a deep learning neural network.

The concern is that if a machine learning model has more degrees of freedom (model parameters) than there are examples in the training dataset, the model is expected to overfit the training dataset. This is the common understanding from statistics. This expectation can be overcome through the use of regularization techniques, such as regularized linear regression and the suite of regularization methods available for deep learning neural network models.

In this post, you will discover degrees of freedom in statistics and machine learning.

After reading this post, you will know:

Degrees of freedom generally represents the number of points of control of a system.
In statistics, degrees of freedom is the number of observations used to calculate a statistic.
In machine learning, degrees of freedom is the number of parameters of a model.

Let’s get started.

Photo by daveynin, some rights reserved.

Overview

This tutorial is divided into three parts; they are:

Degrees of Freedom
Degrees of Freedom in Statistics
Degrees of Freedom in Machine Learning
    1. Degrees of Freedom for a Linear Regression Model
    2. Degrees of Freedom for Linear Regression Error
    3. Total Degrees of Freedom for Linear Regression
    4. Negative Degrees of Freedom
    5. Degrees of Freedom and Overfitting

Degrees of Freedom

Degrees of freedom represent the number of points of control of a system, model, or calculation.

Each independent parameter that can change is a separate dimension in a d-dimensional space that defines the range of values that may influence the system; a specific set of observed or specified values is a single point in that space.

Mathematically, the degrees of freedom is often represented using the Greek letter nu (ν), which looks like a lower-case “v”.

It may also be abbreviated as “d.o.f.,” “dof,” “d.f.,” or simply “df.”

Degrees of freedom is a term from statistics and engineering and may be used in machine learning.

Degrees of Freedom in Statistics

In statistics, the degrees of freedom is the number of values used in the calculation of a statistic that are free to vary.

Degrees of freedom: Roughly, the minimum amount of data needed to calculate a statistic. More practically, it is a number, or numbers, used to approximate the number of observations in the data set for the purpose of determining statistical significance.

— Page 60, Statistics in Plain English, 3rd Edition, 2010.

It is calculated as the number of independent values used in the calculation of the statistic minus the number of statistics calculated.

degrees of freedom = number of independent values – number of statistics

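For example, we may have 50 independent observations in a sample and wish to calculate the sample variance, which measures spread around the sample mean. The mean must first be estimated from those same 50 values, and because the deviations from the mean always sum to zero, only 49 of them are free to vary. All 50 values are used in the calculation and one statistic (the mean) is estimated along the way, so the number of degrees of freedom in this case is calculated as: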

degrees of freedom = number of independent values – number of statistics
degrees of freedom = 50 – 1
degrees of freedom = 49
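The n − 1 above can be checked numerically. Below is a minimal sketch using only the Python standard library and a randomly generated, made-up sample of 50 values; `degrees_of_freedom` is a hypothetical helper named only for this example. It shows that once the mean is estimated, the last deviation from the mean is fully determined by the other 49, and that `statistics.variance` (the sample variance) divides by that same 49.

```python
import random
import statistics


def degrees_of_freedom(n_values, n_statistics):
    """Hypothetical helper: independent values minus the statistics estimated from them."""
    return n_values - n_statistics


random.seed(42)  # fixed seed so the made-up sample is reproducible
sample = [random.gauss(50.0, 5.0) for _ in range(50)]

# Estimate one statistic, the mean, from the 50 values.
mean = statistics.mean(sample)
deviations = [x - mean for x in sample]

# Deviations from the mean sum to zero, so the last one is fixed by the other 49.
implied_last = -sum(deviations[:-1])
print(f"last deviation {deviations[-1]:.6f}, implied by the other 49: {implied_last:.6f}")

dof = degrees_of_freedom(len(sample), 1)
print(f"degrees of freedom: {dof}")  # 49

# The sample variance divides the squared deviations by the degrees of freedom, not by n.
manual_variance = sum(d * d for d in deviations) / dof
print(f"manual {manual_variance:.6f} vs statistics.variance {statistics.variance(sample):.6f}")
```

The two variance figures agree up to floating-point rounding, which is the practical meaning of "one degree of freedom is used up by estimating the mean".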

Degrees of freedom is often an important consideration in probability distributions and statistical hypothesis tests; for example, it is a parameter of the Student's t and chi-squared distributions. It used to be common to have tables of critical values for statistical tests calculated for different common degrees of freedom (before computing critical values directly with software was easy and common).

So far, so good, but what about a model fit from data, such as in machine learning?

Degrees of Freedom in Machine Learning

In predictive modeling, the degrees of freedom often refers to the number of parameters in the model that are estimated from data.

This can include both the coefficients of the model and the data used to calculate the model's error.

The clearest case for understanding this is a linear regression model.

Degrees of Freedom for a Linear Regression Model

Consider a linear regression model for a dataset that has two input variables.

We will require one coefficient in the model for each of the input variables, i.e. the model will have two parameters.

This model looks as follows, where x1 and x2 are the input variables, beta1 and beta2 are the model parameters, and the intercept is omitted for simplicity.

yhat = x1 * beta1 + x2 * beta2

This linear regression model has two degrees of freedom because there are two parameters in the model that must be estimated from a training dataset. Adding one more column to the data (one more input variable) would add one more degree of freedom for the model.

model degrees of freedom = number of parameters estimated from data
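As a minimal sketch, assuming made-up data with 100 rows and a hypothetical helper `fit_two_input_regression` that solves the least-squares normal equations directly (rather than calling any particular library), the two-input model above estimates exactly one coefficient per input column, so counting the fitted coefficients gives its model degrees of freedom.

```python
import random


def fit_two_input_regression(x1, x2, y):
    """Least-squares fit of yhat = x1 * beta1 + x2 * beta2 (no intercept) via the 2x2 normal equations."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * t for a, t in zip(x1, y))
    s2y = sum(b * t for b, t in zip(x2, y))
    det = s11 * s22 - s12 * s12  # non-zero unless the two inputs are collinear
    # Cramer's rule for [[s11, s12], [s12, s22]] @ [beta1, beta2] = [s1y, s2y]
    beta1 = (s1y * s22 - s12 * s2y) / det
    beta2 = (s11 * s2y - s12 * s1y) / det
    return [beta1, beta2]


random.seed(0)
n = 100
x1 = [random.uniform(-1, 1) for _ in range(n)]
x2 = [random.uniform(-1, 1) for _ in range(n)]
# Made-up "true" coefficients 3.0 and -2.0 plus a little noise.
y = [3.0 * a - 2.0 * b + random.gauss(0, 0.1) for a, b in zip(x1, x2)]

coefficients = fit_two_input_regression(x1, x2, y)
model_dof = len(coefficients)  # one estimated parameter per input variable
print(coefficients)
print(f"model degrees of freedom: {model_dof}")  # 2
```

Adding a third input column would mean solving a 3x3 system and returning three coefficients, which matches the statement that each extra column adds one degree of freedom.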

It is common to describe the complexity of a model fit from data based on the number of parameters that were fit.

For example, the complexity of a linear regression model with two parameters is equal to its degrees of freedom, which in this case is 2. We often prefer lower-complexity models over higher-complexity models, because simpler models often generalize better.

The degrees of freedom are an accounting of how many parameters are estimated by the model and, by extension, a measure of complexity for linear regression models.

— Page 71, Applied Predictive Modeling, 2013.

It’s not over yet.

Degrees of Freedom for Linear Regression Error

The number of training examples matters and impacts the overall degrees of freedom for the regression model.

Consider that the coefficients of the linear regression model are fit using a training dataset with 100 rows or examples.

The model is fit by minimizing the error between the model predictions and the expected output values. The total error of the model has one degree of freedom for each example in the training dataset minus the number of parameters estimated from the data.

In this case, the model error has 100 minus 2 parameters from the model, or 98 degrees of freedom.

model error degrees of freedom = number of observations – number of parameters
model error degrees of freedom = 100 – 2
model error degrees of freedom = 98

It is good practice to report the error of a linear model, such as linear regression, along with the degrees of freedom of that error.

At the very least, the number of observations in the training data can be included so that the model error degrees of freedom can be determined.
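A minimal sketch of reporting an error together with its degrees of freedom, assuming made-up data and, for brevity, an intercept-plus-slope line (`fit_line` is a hypothetical helper) in place of the two-input model above. Both models estimate two parameters, so with 100 rows the error has 98 degrees of freedom. Dividing the sum of squared errors by those 98, rather than by 100, gives the residual standard error that many statistics packages report alongside its degrees of freedom.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least squares for yhat = b0 + b1 * x (two estimated parameters)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


random.seed(1)
x = [random.uniform(0, 10) for _ in range(100)]
y = [1.5 + 0.8 * a + random.gauss(0, 1.0) for a in x]  # made-up data, noise sd = 1.0

b0, b1 = fit_line(x, y)
residuals = [t - (b0 + b1 * a) for a, t in zip(x, y)]
sse = sum(r * r for r in residuals)

n_observations = len(x)
n_parameters = 2
error_dof = n_observations - n_parameters  # 100 - 2 = 98

# Residual standard error: divide by the error degrees of freedom, not by n.
residual_standard_error = math.sqrt(sse / error_dof)
print(f"residual standard error = {residual_standard_error:.3f} on {error_dof} degrees of freedom")
```

Reporting "on 98 degrees of freedom" lets a reader recover how many observations and parameters were behind the error estimate.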

Total Degrees of Freedom for Linear Regression

The total degrees of freedom for the linear regression model is the sum of the model degrees of freedom and the model error degrees of freedom.

linear regression degrees of freedom = model degrees of freedom + model error degrees of freedom
linear regression degrees of freedom = 2 + 98
linear regression degrees of freedom = 100

Generally, the total degrees of freedom is equal to the number of rows of training data used to fit the model.

Consider a dataset with 100 rows of data as before, but now we have 70 input variables.

This means that the model has 70 coefficients or parameters fit from the data. The model error would therefore have 100 – 70, or 30, degrees of freedom.

The total degrees of freedom for the model is still equal to the number of rows: 70 + 30 = 100.
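The bookkeeping for both examples can be sketched as a small function; `dof_breakdown` is a hypothetical name used only here, and it simply applies the model and error formulas above.

```python
def dof_breakdown(n_rows, n_parameters):
    """Split a linear regression's total degrees of freedom into model and error parts."""
    model_dof = n_parameters            # one per estimated coefficient
    error_dof = n_rows - n_parameters   # what is left over for estimating the error
    total_dof = model_dof + error_dof   # always equals n_rows
    return model_dof, error_dof, total_dof


for n_rows, n_parameters in [(100, 2), (100, 70)]:
    model_dof, error_dof, total_dof = dof_breakdown(n_rows, n_parameters)
    print(f"{n_rows} rows, {n_parameters} parameters -> "
          f"model {model_dof} + error {error_dof} = total {total_dof}")
# 100 rows, 2 parameters -> model 2 + error 98 = total 100
# 100 rows, 70 parameters -> model 70 + error 30 = total 100
```

However the parameters are split, the total stays at the number of rows; only the share left for the error changes.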

Negative Degrees of Freedom

What happens when we have more columns than rows of data?

For example, we may have 100 rows of data and 10,000 variables, such as gene markers for 100 patients.

A linear regression model would therefore have 10,000 parameters, meaning the model would have 10,000 degrees of freedom.

We can calculate the model error degrees of freedom as follows:

model error degrees of freedom = number of observations – number of parameters
model error degrees of freedom = 100 – 10,000
model error degrees of freedom = -9,900

Uh oh.

And we can calculate the total degrees of freedom as follows:

linear regression degrees of freedom = model degrees of freedom + model error degrees of freedom
linear regression degrees of freedom = 10,000 + -9,900
linear regression degrees of freedom = 100

The model has 100 total degrees of freedom, but the model error has negative degrees of freedom.

The arithmetic is valid, but a negative value is a warning sign rather than a usable count.

It suggests that we have more statistics than we have values that can change. In this case, we have more parameters in the model than we have rows of data or observations to train the model.

This is the so-called “p >> n” problem: having many more predictors p than samples n.
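A scaled-down sketch of the p >> n case, assuming 10 rows and 25 columns of random made-up data instead of 100 × 10,000 so that it runs instantly in pure Python. The naive formula gives a negative error degrees of freedom, but a design matrix can have rank at most equal to its number of rows, so at most 10 of the 25 coefficients can be pinned down independently and nothing is left over to estimate the error. Taking the error degrees of freedom as n − rank(X) is the convention used by, for example, R's `lm`; the `matrix_rank` helper below is a hypothetical stand-in for a library routine such as `numpy.linalg.matrix_rank`.

```python
import random


def matrix_rank(rows, tol=1e-10):
    """Rank of a matrix (list of row lists) via Gaussian elimination with partial pivoting."""
    m = [list(r) for r in rows]  # work on a copy
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        # Pick the remaining row with the largest entry in this column as the pivot.
        pivot = max(range(rank, n_rows), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < tol:
            continue  # this column adds nothing new
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, n_rows):
            factor = m[i][col] / m[rank][col]
            for j in range(col, n_cols):
                m[i][j] -= factor * m[rank][j]
        rank += 1
        if rank == n_rows:
            break  # rank can never exceed the number of rows
    return rank


random.seed(7)
n_observations, n_parameters = 10, 25  # scaled-down p >> n case
X = [[random.gauss(0, 1) for _ in range(n_parameters)] for _ in range(n_observations)]

naive_error_dof = n_observations - n_parameters  # 10 - 25 = -15
rank = matrix_rank(X)                            # at most n_observations
print(f"naive error degrees of freedom: {naive_error_dof}")
print(f"rank of the design matrix: {rank}")
print(f"error degrees of freedom based on rank: {n_observations - rank}")
```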

Degrees of Freedom and Overfitting

The problem is that when we have more parameters than observations, there is a risk of overfitting the training dataset.

This is intuitive if we think of each coefficient in the model as a point of control. If we have more points of control in the model than we have observations, we can, in theory, configure the model to predict the training dataset correctly and exactly. Learning the details of the training dataset at the expense of performing well on new data is the definition of overfitting.
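A small sketch of this "points of control" argument, with made-up data: 8 training rows of random inputs, targets that are pure random noise, and a linear model with 8 coefficients (no intercept). Solving the square system directly (the `solve` helper is a plain Gaussian elimination written for this sketch, not a library call) reproduces the training targets essentially exactly even though there is no relationship to learn, while the same coefficients have no reason to do well on freshly generated test data. With more coefficients than rows, the training fit would be just as exact but no longer unique.

```python
import random


def solve(a, b):
    """Solve a @ x = b for square a by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def predict(rows, coefficients):
    return [sum(v * c for v, c in zip(row, coefficients)) for row in rows]


def mse(y, yhat):
    return sum((t - p) ** 2 for t, p in zip(y, yhat)) / len(y)


random.seed(3)
n = 8  # as many coefficients as training rows
X_train = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n)]
y_train = [random.gauss(0, 1) for _ in range(n)]  # pure noise: nothing real to learn
X_test = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n)]
y_test = [random.gauss(0, 1) for _ in range(n)]

coefficients = solve(X_train, y_train)  # one coefficient per column, fitted exactly
train_mse = mse(y_train, predict(X_train, coefficients))
test_mse = mse(y_test, predict(X_test, coefficients))
print(f"training MSE: {train_mse:.2e}")  # effectively zero
print(f"test MSE:     {test_mse:.2f}")
```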

This is the general concern that statisticians have about deep learning neural network models.

That is, deep learning models often have many more parameters (model weights, sometimes billions of them) than training samples and, based on our understanding of linear models, would be expected to overfit.

Nevertheless, through careful selection of model architectures and regularization techniques, they can be prevented from overfitting and maintain low generalization error.

Further, in deep models, the effective degrees of freedom may be decoupled from the number of parameters in the model.

We showed that for simple classification models, degrees of freedom is equal to the number of parameters in the model. In deep networks, the degrees of freedom is generally much less than the number of parameters in the model, and deeper networks tend to have less degrees of freedom.

— Degrees of Freedom in Deep Neural Networks, 2016.
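The deep-network results quoted above are not easy to reproduce in a few lines, but the underlying idea, an effective degrees of freedom that falls below the raw parameter count, can be sketched for a regularized linear model instead. This swaps in ridge regression as a stand-in; it is not the method of the 2016 paper. For ridge regression with penalty λ, a standard definition of effective degrees of freedom is the trace of the hat matrix, tr[X(XᵀX + λI)⁻¹Xᵀ], which by the cyclic property of the trace equals tr[(XᵀX + λI)⁻¹XᵀX]. The data (50 rows, 5 columns) is made up, and `gram`, `solve`, and `ridge_effective_dof` are hypothetical helpers written for this sketch.

```python
import random


def gram(X):
    """Return X^T X for a matrix stored as a list of rows."""
    p = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]


def solve(a, b):
    """Solve a @ x = b for square a by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ridge_effective_dof(X, lam):
    """Trace of the ridge hat matrix, computed as trace((X^T X + lam I)^-1 X^T X)."""
    g = gram(X)
    p = len(g)
    a = [[g[i][j] + (lam if i == j else 0.0) for j in range(p)] for i in range(p)]
    trace = 0.0
    for k in range(p):
        column_k = solve(a, [g[i][k] for i in range(p)])  # k-th column of a^-1 g
        trace += column_k[k]  # keep only its diagonal entry
    return trace


random.seed(5)
X = [[random.gauss(0, 1) for _ in range(5)] for _ in range(50)]  # 50 rows, 5 coefficients

lambdas = [0.0, 1.0, 10.0, 100.0, 1000.0]
effective = [ridge_effective_dof(X, lam) for lam in lambdas]
for lam, dof in zip(lambdas, effective):
    print(f"lambda = {lam:>7.1f}: effective degrees of freedom = {dof:.3f}")
```

With λ = 0 the model uses all 5 degrees of freedom; as the penalty grows, the effective degrees of freedom shrink toward zero even though the model still has 5 coefficients.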

As such, there is a growing trend among statisticians and machine learning practitioners to move away from using degrees of freedom both as a proxy for model complexity and as a predictor of overfitting.

To most applied statisticians, a fitting procedure’s degrees of freedom is synonymous with its model complexity, or its capacity for overfitting to data. […] We argue that, on the contrary, model complexity and degrees of freedom may correspond very poorly.

— Effective Degrees Of Freedom: A Flawed Metaphor, 2013.

Summary

In this post, you discovered degrees of freedom in statistics and machine learning.

Specifically, you learned:

Degrees of freedom generally represents the number of points of control of a system.
In statistics, degrees of freedom is the number of observations used to calculate a statistic.
In machine learning, degrees of freedom is the number of parameters of a model.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.