
Linear Regression in Python using gradient descent

Asked on Data Science on November 13, 2021

I am trying to implement a simple multivariate linear regression model without using any built-in machine learning libraries. So far, my gradient descent implementation reaches a training root mean squared error (RMSE) of about $2.93$, while the model from the normal (closed-form) equation reaches a training RMSE of about $2.3$. I am looking for ways to improve my implementation of the gradient descent algorithm, shown below.

My gradient descent method looks like this:
$\theta := \theta - \frac{\alpha}{2N} X^\top (X\theta - Y)$, where $\theta$ is the vector of model parameters, $N$ is the number of training examples, $X$ is the input matrix, $Y$ is the vector of targets, and $\alpha$ is the step size.

def gradientDescent(self):
    errors = []
    for i in range(self.iters):
        # theta = theta - (alpha/N) * X^T (X theta - Y)
        self.theta = self.theta - (self.alpha / len(self.X)) * np.sum(self.X * (self.X @ self.theta.T - self.Y), axis=0)
        # Track the cost at each iteration so convergence can be inspected
        errors.append(self.error_function())
    return errors
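
As a side note, the broadcast-and-sum expression used above is equivalent to the usual matrix form $X^\top (X\theta - Y)$; a small sketch on made-up shapes (illustrative only, not part of the original class) confirms the equivalence:

import numpy as np

# Check that the broadcast-and-sum form equals X^T (X theta - Y)
rng = np.random.default_rng(0)
N, d = 5, 3
X = rng.normal(size=(N, d))
Y = rng.normal(size=(N, 1))
theta = rng.normal(size=(1, d))      # row vector, as in the class above

residual = X @ theta.T - Y                        # shape (N, 1)
grad_sum_form = np.sum(X * residual, axis=0)      # shape (d,)
grad_matrix_form = (X.T @ residual).ravel()       # shape (d,)

print(np.allclose(grad_sum_form, grad_matrix_form))  # True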

I set $\alpha = 0.1$ and the number of iterations to 1000. Gradient descent converges at around 700-800 iterations (verified by monitoring the error values).

My error function is like:

def error_function(self):
    # Cost: (1/2N) * sum((X theta - Y)^2)
    error_values = np.power((self.X @ self.theta.T) - self.Y, 2)
    return np.sum(error_values) / (2 * len(self.X))
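
Note that this cost is half the mean squared error, so the RMSE that is comparable to the closed-form solution is $\sqrt{2 \cdot \text{cost}}$. A tiny illustrative snippet, assuming model is a trained instance of the class above (a hypothetical name):

import numpy as np

# Hypothetical: `model` is a trained instance of the class above.
# error_function() returns (1/2N) * sum((X theta - Y)^2), i.e. half the MSE,
# so the comparable RMSE is sqrt(2 * cost).
rmse = np.sqrt(2 * model.error_function())
print(rmse)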

I was expecting the training errors from gradient descent and the normal equation to be similar, but the difference is fairly large. So I would like to know whether I am doing anything wrong.

P.S. I have not normalized the data yet. Normalizing the features leads to a much lower RMSE (about $0.22$).
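
For reference, a minimal standardization sketch (an illustrative way to scale the features before gradient descent; the arrays below are synthetic stand-ins for the actual training and test data):

import numpy as np

# Standardize each feature column to zero mean and unit variance;
# the training statistics must be reused on the test data.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=10.0, scale=5.0, size=(100, 3))   # stand-in data
X_test = rng.normal(loc=10.0, scale=5.0, size=(20, 3))

mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)
X_train_scaled = (X_train - mu) / sigma
X_test_scaled = (X_test - mu) / sigma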

One Answer

That could be due to many different reasons. The most likely one is that gradient descent has not fully converged to the minimum of the cost function; to address this, you can try a different learning rate or a different initialization for the coefficients.
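
As a rough illustration of the first suggestion, you can sweep a few learning rates on synthetic data and compare the final cost (everything below is illustrative and not tied to the asker's class):

import numpy as np
from sklearn.datasets import make_regression

# Synthetic data, only for illustrating the learning-rate sweep
X, y = make_regression(n_samples=100, n_features=1, noise=30, random_state=0)
Xb = np.concatenate((np.ones(X.shape), X), axis=1)   # add an intercept column

def final_mse(lr, n_iters=100):
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        grad = -2 / len(y) * Xb.T @ (y - Xb @ w)     # gradient of the MSE
        w -= lr * grad
    return np.mean((y - Xb @ w) ** 2)

for lr in (0.001, 0.01, 0.1, 0.5):
    print(lr, final_mse(lr))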

There might also be a problem in your code for updating the weights or computing the gradient.
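
If you suspect the gradient, a quick numerical gradient check helps: compare the analytic gradient with a central finite-difference approximation of the cost. A minimal sketch under the question's $\frac{1}{2N}$ cost convention (all names below are illustrative):

import numpy as np

# Numerical gradient check for the cost (1/2N) * sum((X theta - Y)^2)
rng = np.random.default_rng(0)
N, d = 20, 3
X = rng.normal(size=(N, d))
Y = rng.normal(size=(N,))
theta = rng.normal(size=(d,))

def cost(t):
    return np.sum((X @ t - Y) ** 2) / (2 * N)

analytic = (X.T @ (X @ theta - Y)) / N            # gradient of the cost above

eps = 1e-6
numeric = np.array([
    (cost(theta + eps * e) - cost(theta - eps * e)) / (2 * eps)
    for e in np.eye(d)
])

print(np.allclose(analytic, numeric, atol=1e-6))  # True if the gradient is right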

However, I used both methods for a simple linear regression and got the same results as follows:

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_regression


# generate regression dataset
X, y = make_regression(n_samples=100, n_features=1, noise=30)



def cost_MSE(y_true, y_pred):
    '''
    Cost function
    '''
    # Shape of the dataset
    n = y_true.shape[0]
    
    # Error 
    error = y_true - y_pred
    # Cost
    mse = np.dot(error, error) / n
    return mse


def cost_derivative(X, y_true, y_pred):
    '''
    Compute the derivative of the loss function
    '''
    # Shape of the dataset
    n = y_true.shape[0]
    
    # Error 
    error = y_true - y_pred
    
    # Derivative
    der = -2 / n * np.dot(X, error)

    return der


# Let's run an example

X_new = np.concatenate((np.ones(X.shape), X), axis = 1)
learning_rate = 0.1
X_new_T = X_new.T
n_iters = 100
mse = []

# Initialize the weight vector
alpha = np.array([0, np.random.rand()])

for _ in range(n_iters):
    
    # Compute the predicted y
    y_pred = np.dot(X_new, alpha)
    
    # Compute the MSE
    mse.append(cost_MSE(y, y_pred))
    
    # Compute the derivative
    der = cost_derivative(X_new_T, y, y_pred)
    
    # Update the weight
    alpha  -= learning_rate * der
alpha

For gradient descent, the coefficients were:

array([-3.36575322, 28.06370831])

Here is the code for the closed-form solution:

np.dot(np.linalg.inv(np.dot(X_new_T,X_new)), np.dot(X_new_T, y))
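
As a side note, explicitly inverting $X^\top X$ can be numerically fragile; np.linalg.solve or np.linalg.lstsq give the same coefficients more stably (reusing the variables defined above):

# Equivalent to the expression above, but avoids forming an explicit inverse
np.linalg.solve(np.dot(X_new_T, X_new), np.dot(X_new_T, y))
# or: np.linalg.lstsq(X_new, y, rcond=None)[0]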

And the coefficients for the closed-form solution:

array([-3.36575322, 28.06370831])

As the coefficients are equal, the RMSE, MSE, and R² are also equal.
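
To confirm this numerically, the RMSE can be computed directly from either set of coefficients (continuing with the variables above):

# RMSE for the gradient-descent fit; the closed-form coefficients give the same value
y_pred = np.dot(X_new, alpha)
rmse = np.sqrt(np.mean((y - y_pred) ** 2))
print(rmse)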

Answered by nimar on November 13, 2021
