Diving Deeper into Linear Regression

栏目: IT技术 · 发布时间: 6年前

Diving Deeper into Linear Regression

Photo by Artem Verbo on Unsplash

W hen I say “linear regression”, most of the people start thinking about the good old Ordinary Least Square (OLS) regression. If you are not familiar with the term, these equations might help…

Diving Deeper into Linear Regression
β_1, β_2: weights; β_0: bias; J(β): cost function

Did you also think about OLS? If yes then you are on the right track. But there’s more to linear regression than just OLS! First, let us look at OLS a bit more closely.

OLS

The name of this technique came from the cost function. Here, we take the sum of squared errors (the difference between ground truths and predictions) and try to minimize this. By minimizing the cost function we achieve the optimal value of the vector β (contains bias and weights). In the below plot, the contour (concentric ellipses) of the cost function is shown. After the minimization, we get β as the point at the center.

Diving Deeper into Linear Regression

OLS

At first, it seems like OLS is enough for any regression problem. But as we increase the number of features and the complexity of data OLS tends to overfit the training data. The concept of overfitting is vast and deserves a separate article (you can find plenty of them) so I’m going to give you a brief. Overfitting means the model has learned the training data so well that it fails to generalize. In other words, the model has learned even the small scale (insignificant) variations in the train data so it fails to produce good predictions on unseen (validation and test) data. To tackle the problem of overfitting we can use many techniques. Adding a regularization (penalty) term to our cost function is one such technique. But what term should we use? We generally use one of the following two methods.

Ridge

In this case, we add the sum of squares of weights to our least square cost function. So now it looks something like this…

m: 1+dimension of β; λ: regularization parameter

But how does this term prevent overfitting? Adding this term is equivalent to adding an extra constraint on the possible values of β. Because to achieve the minimum cost, the sum of β²_j’s must not exceed a certain value (say r). This technique prevents the model from assigning very large weights to some features over the others thus tackling overfitting. Mathematically,

Diving Deeper into Linear Regression

In other words, β should lie inside(or on) the circle with radius √r centered at the origin. Here’s the visualization…

Diving Deeper into Linear Regression

Ridge

Notice that because of the constraint (red circle), the final value of β is closer to the origin than it was in the OLS.

Lasso

The only difference between Ridge and Lasso is the regularization term. Here, we add the sum of absolute values of the weights to our least square cost function. So the cost function becomes…

In this case, the constraint can be written as…

Diving Deeper into Linear Regression

Now we can visualize the constraint as a square instead of a circle.

Diving Deeper into Linear Regression

Lasso

It is worth noting that, if the contours hit a corner of the square then one feature is completely neglected (weight becomes 0). For higher-dimensional feature space, we can use this trick to reduce the number of features.

Note: In the regularization term we are not using bias (β_0) because only the very large weights (β_i’s for i>0) corresponding to the features contribute to the overfitting. Bias term is just an intercept hence does not have much to do with the overfitting.

Phew…that was a lot about regularization. The common thing among the above methods was: they all have residuals/errors (ground truth-prediction) in their cost function. These errors are parallel to the y-axis. We could also consider errors along the x-axis and proceed similarly. See the plot below.

Diving Deeper into Linear Regression

y-errors and x-errors

What if we use a different kind of error?

Major axis (Orthogonal) regression

In this case, we consider errors in both directions (x-axis and y-axis). The sum of the square of perpendicular distances between the observed data points and the predicted line is to be minimized. Let’s visualize this by taking only one feature.

Diving Deeper into Linear Regression

(X_i, Y_i): the foot of the perpendicular drawn from (x_i, y_i) on the best fit line

Let our model be

Then the regression coefficients can be obtained by minimizing

Diving Deeper into Linear Regression

under the constraints

Reduced Major axis regression

This is very similar to the above method with a slight change. Here, we minimize the sum of areas of the rectangle formed by (X_i, Y_i) and (x_i, y_i).

Diving Deeper into Linear Regression

Reduced Major Axis

The total area extended by n data points is,

Diving Deeper into Linear Regression

The constraints here are the same as orthogonal regression.

When should you use orthogonal regression?

One should go for orthogonal and reduced major axis regressions when the uncerttainties are present in study (y) and explanatory (x) variables both.

One interesting thing in orthogonal regression is, it produces symmetrical fit w.r.t y-errors and x-errors. But in OLS we don’t get the symmetry for we minimize either y-errors or x-errors, not both.

Still curious? Watch a video that I made recently…

I hope you enjoyed the reading. Until next time…Happy learning!


以上就是本文的全部内容,希望本文的内容对大家的学习或者工作能带来一定的帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

捉虫记

捉虫记

施迎 / 清华大学出版社 / 2010-6 / 56.00元

《捉虫记:大容量Web应用性能测试与LoadRunner实战》主要讲解大容量Web性能测试的特点和方法,以及使用业内应用非常广泛的工具——Load Runner 9进行性能测试的具体技术与技巧。《捉虫记:大容量Web应用性能测试与LoadRunner实战》共17章,分为5篇。第1篇介绍软件测试的定义、方法和过程等内容:第2篇介绍Web应用、Web性能测试的分类、基本硬件知识、Web应用服务器选型、......一起来看看 《捉虫记》 这本书的介绍吧!

JSON 在线解析
JSON 在线解析

在线 JSON 格式化工具

图片转BASE64编码
图片转BASE64编码

在线图片转Base64编码工具

html转js在线工具
html转js在线工具

html转js在线工具