Algorithms From Scratch: Linear Regression

栏目: IT技术 · 发布时间: 4年前

内容简介：Link to Github Repo…

Detailing and Building a Linear Regression model from scratch

Algorithms From Scratch: Linear Regression

Jul 8 ·9min read

Linear Regression is a popular linear Machine Learning algorithm for regression-based problems. It is usually one of the first algorithms that is learnt when first learning Machine Learning, due to its simplicity and how it builds into other algorithms like Logistic Regression and Neural Networks. In this story we are going to implement it from scratch so that we can build our intuition about what is going on within a Linear Regression model.

Link to Github Repo…

kurtispykes/ml-from-scratch

Permalink Dismiss GitHub is home to over 50 million developers working together to host and review code, manage…

github.com

Note: There are many frameworks with highly optimized code such as Scikit-Learn, Tensorflow and PyTorch, therefore it is usually unnecessary to build your own algorithm from scratch. However, it serves a good purpose to our intuition of what is happening within the model when we build it from scratch and this helps when we attempt to improve our models performance.

A linear model is an algorithm that makes a prediction by simply computing a weighted sum of the input features plus a bias term (also referred to as the intercept term). Taking that into perspective, what we are doing when we use a linear regression model is we hope to explain the relationship between a dependent variable (i.e house price) and one or more independent variables (i.e. location, bedrooms, area, etc).

Figure 1: Multivariate Linear Regression

When we train a model we are attempting to set the parameters to get a line that best fits the training data. Therefore, when we train a Linear Regression model we are trying to find the value of theta that best minimizes the cost function . The most common cost function of a regression models is the RMSE , however, it is much easier to minimize the MSE as it leads to the same result.

Creating the Model

If you have never written a Machine Learning algorithm from scratch, I greatly encourage you to do so. John Sullivan wrote a very useful story title 6 Steps To Write Any Machine Learning Algorithm From Scratch: Perceptron Case Study w hich is the best advice I have managed to find on the internet about writing algorithms from scratch.

Chunking the Algorithm

Randomly initialize parameters for the hypothesis function
Calculate the Partial Derivative (Read more about this here )
Update parameters
Repeat 2–3 for n number of iterations (Until cost function is minimized otherwise)
Inference

Implementation

For this section, I will be leveraging 3 Python packages. NumPy for Linear algebra, Scikit-Learn which is a popular Machine Learning framework and Matplotlib to visualize our data.

import numpy as np 
import matplotlib.pyplot as plt 
from sklearn.datasets import make_regression 
from sklearn.linear_model import LinearRegression 
from sklearn.model_selection import train_test_split 
from sklearn.metrics import mean_squared_error

First, we need a dataset. To do this I will sklearn.datasets.make_regression which allows you to generate a random regression problem — see Documentation . Next, I will split my data into training and test sets with sklearn.model_selection.train_test_split — Documentation .

# creating the data set
X, y = make_regression(n_samples=100, n_features=1, n_targets=1, noise=20, random_state=24)

# splitting training and test
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75, random_state=24)

Lets use matplotlib.pyplot to see how our data looks — Documentation .

# visualize 
plt.scatter(X, y)
plt.show()

Now we can begin our implementation of Linear Regression. The first step of our chunk is to randomly initialize parameters for our hypothesis function.

def param_init(X): 
    """
    Initialize parameters for linear regression model
    __________________ 
    Input(s)
    X: Training data
    __________________
    Output(s)
    params: Dictionary containing coefficients
    """
    params = {} # initialize dictionary 
    _, n_features = X.shape # shape of training data

    # initializing coefficents to 0 
    params["W"] = np.zeros(n_features)
    params["b"] = 0
    return params

Excellent. Next we want to calculate the partial derivatives and update our parameters. We do this using a very important Machine Learning algorithm called Gradient Descent . Ergo, we can implement steps 2–4 with Gradient Descent.

def gradient_descent(X, y, params, alpha, n_iter): 
    """
    Gradient descent to minimize cost function
    __________________ 
    Input(s)
    X: Training data
    y: Labels
    params: Dictionary contatining random coefficients
    alpha: Model learning rate
    __________________
    Output(s)
    params: Dictionary containing optimized coefficients
    """
    W = params["W"] 
    b = params["b"]
    m = X.shape[0] # number of training instances 

    for _ in range(n_iter): 
        # prediction with random weights
        y_pred = np.dot(X, W) + b
        # taking the partial derivative of coefficients
        dW = (2/m) * np.dot(X.T, (y_pred - y)) 
        db = (2/m) * np.sum(y_pred -  y)
        # updates to coefficients
        W -= alpha * dW
        b -= alpha * db 
    
    params["W"] = W
    params["b"] = b
    return params

Using these functions we can train our Linear Regression model on our training data to obtain model parameters that we will require for inference.

def train(X, y, alpha=0.01, n_iter=1000):
    """
    Train Linear Regression model with Gradient decent
    __________________ 
    Input(s)
    X: Training data
    y: Labels
    alpha: Model learning rate
    n_iter: Number of iterations 
    __________________
    Output(s)
    params: Dictionary containing optimized coefficients
    """ 
    init_params = param_init(X)
    params = gradient_descent(X, y, init_params, alpha, n_iter)
    return params

Now, when we run this function we would obtain the optimized weights from our training data that we will use for inference on our test data. Next, we need to create a predict function for our inference that uses our stored weights.

def predict(X_test, params):
    """
    Train Linear Regression model with Gradient decent
    __________________ 
    Input(s)
    X: Unseen data
    params: Dictionary contianing optimized weights from training
    __________________
    Output(s)
    y_preds: Predictions of model
    """  
    y_preds = np.dot(X_test, params["W"]) + params["b"]
    return y_preds

Great! Let’s run these functions and plot them to see what happens…

params = train(X_train, y_train) # train model
y_preds = predict(X_test, params) # inferenceplt.scatter(X_test, y_test)
plt.plot(X_test, y_preds, color="red")
plt.title("Predictions Dummy Regression Data")
plt.xlabel("X axis")
plt.ylabel("Y axis")

plt.show()

Our line of best fit seems to be pretty decent. To be entirely sure of our implementation we are fortunate enough to have many Machine Learning libraries, with optimized code, to compare our implementation to. To compare, I am simply going to check the RMSE of our implementation vs Scikit-learn.

Note: It may also be worthwhile to plot their implementation on the same plot and if their line of best fit covers yours then you are on the right tracks.

lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)
sklearn_y_preds = lin_reg.predict(X_test)print(f"My implementation: {np.sqrt(mean_squared_error(y_test, y_preds))}\nSklearn implementation: {np.sqrt(mean_squared_error(y_test, y_preds))}")>>>> My implementation: 20.986105292320207
Sklearn implementation: 20.986105292320207

It’s a match!

For this implementation we used procedural programming which can become quite complex and cumbersome to maintain and does not maximize the potential of Python which is an Object oriented Programming (OOP) language. On that note, here is an OOP implementation of our Linear Regression model.

class LinReg(): 
    """
    Custom made Linear Regression class
    """
    def __init__(self, alpha=0.01, n_iter= 1000): 
        self.alpha = alpha
        self.n_iter = n_iter 
        self.params = {}
    
    def param_init(self, X_train): 
        """
        Initialize parameters for linear regression model
        __________________ 
        Input(s)
        X: Training data
        """
        _, n_features = self.X.shape # shape of training data

        # initializing coefficents to 0 
        self.params["W"] = np.zeros(n_features)
        self.params["b"] = 0
        return self
        

    def gradient_descent(self, X_train, y_train): 
        """
        Gradient descent to minimize cost function
        __________________ 
        Input(s)
        X: Training data
        y: Labels
        params: Dictionary contatining random coefficients
        alpha: Model learning rate
        __________________
        Output(s)
        params: Dictionary containing optimized coefficients
        """
        W = self.params["W"] 
        b = self.params["b"] 
        m = X_train.shape[0]

        for _ in range(self.n_iter): 
            # prediction with random weights
            y_pred = np.dot(X_train, W) + b
            # taking the partial derivative of coefficients
            dW = (2/m) * np.dot(X_train.T, (y_pred - y_train)) 
            db = (2/m) * np.sum(y_pred -  y_train)
            # updates to coefficients
            W -= self.alpha * dW
            b -= self.alpha * db 
        
        self.params["W"] = W
        self.params["b"] = b
        return self

    def train(self, X_train, y_train):
        """
        Train Linear Regression model with Gradient decent
        __________________ 
        Input(s)
        X: Training data
        y: Labels
        alpha: Model learning rate
        n_iter: Number of iterations 
        __________________
        Output(s)
        params: Dictionary containing optimized coefficients
        """ 
        self.params = param_init(X_train)
        gradient_descent(X_train, y_train, self.params , self.alpha, self.n_iter)
        return self 

    def predict(self, X_test):
        """
        Train Linear Regression model with Gradient decent
        __________________ 
        Input(s)
        X: Unseen data
        params: Dictionary contianing optimized weights from training
        __________________
        Output(s)
        y_preds: Predictions of model
        """  
        y_preds = np.dot(X_test, self.params["W"]) + self.params["b"]
        return y_preds

Let’s call this class and make it make a prediction on our test data…

linreg = LinReg()
linreg.train(X_train, y_train)
linreg.predict(X_test)>>>> 
array([   4.73888182,  -90.06369632,   80.39799712,   66.76983607,
        -49.97207144,   93.77905208,   34.30778991,  -38.2209702 ,
         78.03331698,   53.81416352,  102.96993005,  151.71946744,
         95.52801857,  104.82707085,   98.0492089 ,   45.05150211,
         -7.29917923,  -78.41675446,  -27.14118529,  -98.52923336,
        170.75840972, -106.22126739,   24.86194847,  -21.39127805,
         50.24074837])

As a sanity check we will test if the predictions are the same as our procedural implementation (since we know this is similar to the scikit-learn implementation already).

linreg.predict(X_test) == y_preds>>>> 
array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True])

And there you have it!

Note: The problem we used as an example is simple linear regression problem (Univariate Linear Regression), y = WX + b, where W is the weight and b is the bias.

Assumptions

Before deciding that we want to use Linear Regression it is important to understand the assumptions that the model makes of our data so that we can perform the necessary feature engineering to align with our model.

For more on feature engineering, see my previous story on this topic…

Feature Engineering for Numerical Data

Tips ’n’ Tricks for engineering numerical values

towardsdatascience.com

Linear Relationship— Linear regression only captures linear relationships, ergo assumes that there is a linear relationship between the features and the target that we want to predict. We can check the validity of this by plotting scatter graphs of the features in relation to the target variable.

Multicollinearity — Refers to a situation in which two or more explanatory variables in a multiple regression model are highly related (Source: Wikipedia). When we implement Linear Regression we are assuming that there is little to no multicollinearity since when present it weakens the statistical power of the regression model. To identify whether multicollinearity is present in our data we can use a correlation matrix to see which features are highly correlated and remove one of the highly correlated features.

Homoscedasticity — In statistics, a sequence (or a vector) of random variables is homoscedastic if all its random variables have the same finite variance (Source: Wikipedia). Linear Regression assumes that the noise is the same (may also be referred to as error term or random disturbance) across all observations and does not depend on the values of the independent variables. We may use a scatter plot of the residual values against the predicted values to check of homoscedasticity. We should expect no patterns in the distribution because if there is a pattern then the data is heteroscedastic.

Multivariate Normality— We assume the variables we use for Linear Regression follow the Normal Distribution . We can check this using a Histogram or a QQ-plot and use a non-linear transformation (i.e. log-transformation) to change the distribution of the data to fit our assumption.

Autocorrelation —The correlation of a signal with a delayed copy of itself as a function of delay — Informally, it is the similarity between observations as a function of the time lag between them (Source: Wikipedia). To put it simply Autocorrelation occurs when the residuals are not independent from each other. We may use a scatter plot to visually check for autocorrelation or the Durbin-Watsons’s test which test the null hypothesis that the residuals are not linearly autocorrelated.

Pros

Easy to implement
Easy to interpret the output coefficients
Less complex

Cons

Sensitive to Outliers
Assumes linear relationship between independent and dependent variables
Assumes independence between features

For more about Linear Regression you can read the Wikipedia page.

Linear regression

In statistics, linear regression is a linear approach to modeling the relationship between a scalar response (or…

en.wikipedia.org

Wrap Up

Linear Regression is one of the easiest Machine Learning models and most likely the first you would learn or should learn. It is a very useful tool to analyze the relationship between variables, but due to the many assumptions of the model and simplification of real-world problems, it is not recommended for most practical applications.

Thank you for taking the time to go through this story. If there is anything that I have missed or you’d like me to clarify, leave a response in the comments. Additionally, If you’d like to get in contact with me, I am most accessible on LinkedIn.

Kurtis Pykes - AI Writer - Towards Data Science | LinkedIn

View Kurtis Pykes' profile on LinkedIn, the world's largest professional community. Kurtis has 1 job listed on their…

www.linkedin.com

以上就是本文的全部内容，希望对大家的学习有所帮助，也希望大家多多支持码农网

查看所有标签

猜你喜欢:

Algorithms From Scratch: Linear Regression

本站部分资源来源于网络，本站转载出于传递更多信息之目的，版权归原作者或者来源机构所有，如转载稿涉及版权问题，请联系我们。

码农书籍

机器消灭秘密

安迪•格林伯格 (Andy Greenberg) / 王崧、王涛、唐禾 / 重庆出版社 / 2017-8-10 / 49.8

《机器消灭秘密》一书中，格林伯格深入研究并生动再现了那些拥有全能技术的网络安全魔术师，他们将任何企图染指个人隐私的所谓国家机密的保密性打得粉碎。这本精心组织的著作是对此题材感兴趣的读者的必读之书，即便现在你可能不感兴趣，将来也极有可能希望了解这些内容，因为任何人都会不可避免地置身其中。无论你是初涉电脑屏幕之后的虚拟战场的新生，还是经验丰富的维基解密观察家，本书都是不可多得的上乘之作，你总会在其中发......一起来看看《机器消灭秘密》这本书的介绍吧!

码农工具