How to choose initial theta in simple linear regression?

Question

I have the sales of items from January 2013 to October 2015. I just want to predict the total sales for the next month. Just for the sake of learning, I would like to transform it into a multiple regression model coded from scratch, without any libraries. So far, I’ve been able to get the betas but I don’t know how to get the prediction for the next month.

Here is the historical data for sales monthly from January 2013 to October 2015, ts:

date_block_num
0     131479.0
1     128090.0
2     147142.0
3     107190.0
4     106970.0
5     125381.0
6     116966.0
7     125291.0
8     133332.0
9     127541.0
10    130009.0
11    183342.0
12    116899.0
13    109687.0
14    115297.0
15     96556.0
16     97790.0
17     97429.0
18     91280.0
19    102721.0
20     99208.0
21    107422.0
22    117845.0
23    168755.0
24    110971.0
25     84198.0
26     82014.0
27     77827.0
28     72295.0
29     64114.0
30     63187.0
31     66079.0
32     72843.0
33     71056.0

I tried to do a simple linear regression:

$$y_t = alpha + beta x_t +varepsilon$$

I first tried to estimate $alpha$ and $beta$ and then use predict(alpha,beta,34). So I did:

import random

def predict(alpha, beta, x_i):
  return alpha+ beta * x_i

def error(alpha, beta, x_i, y_i):
  """the error from predicting beta * x_i + alpha
  when the actual value is y_i"""
  return y_i - predict(alpha, beta, x_i)

def sum_of_squarred_errors(alpha, beta, x, y):
  return sum(errors(alpha, beta, x_i, y_i)**2
             for x_i, y_i in zip(x,y))
  
def correlation(x,y):
  stdev_x = standard_deviation(x)
  stdev_y = standard_deviation(y)
  if stdev_x > 0 and stdev_y >0:
    return covariance(x,y)/ stdev_x/ stdev_y
  else:
    return 0
  
def least_squares_fit(x,y):
  """given training values for x and y
  find the least-squares error for alpha and beta"""
  beta = correlation(x,y) * standard_deviation(y)/ standard_deviation(x)
  alpha = mean(y) - beta * mean(x)
  return alpha, beta

def total_sum_squares(y):
  """the total squared variation of y_i's from their mean"""
  return sum(v ** 2 for v in de_mean(y))

def r_squared(alpha, beta, x, y):
  """the fraction of variation of y in captured by the model, which equals
  1 - the fraction of variation in y not catpured by the model"""
  return 1.0 - (sum_squared_errors(alpha, beta, x, y)/
                total_sum_of_squares(y))
  
  r_squared(alpha, beta, num_friends_good, daily_minutes_good)

def squared_error(x_i, y_i, theta):
  alpha, beta = theta
  return error(alpha, beta, x_i, y_i) ** 2

def squared_error_gradient(x_i, y_i, theta):
  alpha, beta = theta
  return [-2 * error(alpha, beta, x_i, y_i),
          -2 * error(alpha, beta, x_i, y_i) * x_i]

def in_random_order(data):
  """generator that returns the elements if data in random order"""
  indexes = [i for i, _ in enumerate(data)] # create a list of indexes
  random.shuffle(indexes) # suffle them
  for i in indexes:
    yield data[i]

def minimize_stochastic(target_fn, gradient_fn, x,y, theta_0, alpha_0=0.01):
  print("x: ", x, "ny: ",y.tolist())
  data = zip(x,y)
  theta = theta_0  #initial guess
  alpha = alpha_0  # initial step size
  min_theta, min_value = None, float('inf') # the minimum so far
  iterations_with_no_improvment = 0

  # if we ever go 100 iterations with no improvment, stop
  while iterations_with_no_improvment < 100:
    value = sum(target_fn(x_i, y_i, theta) for x_i, y_i in data)
    # print("value: ", value)

    if value < min_value:
      # if we've found a new minimum, remember it
      # and go back to the original step size
      min_theta, min_value = theta, value
      iterations_with_no_improvment = 0
      alpha = alpha_0
    else:
      # otherwise we're not improving, so try shrinking the step size
      iterations_with_no_improvment +=1
      alpha *=0.9

    # and take a gradient step for each of the data points
    # print("data: ", [x for x in data])
    # print("data: ", data)
    for x_i, y_i in in_random_order(data):
      gradient_i = gradient_fn(x_i, y_i, theta)
      theta = vector_substract(theta, scalar_multiply(alpha_gradient_i))
  return min_theta

# choose random value to start
random.seed(0)
theta = [random.random(), random.random()]

alpha, beta = minimize_stochastic(squared_error,
                                  squared_error_gradient, ts.index.values,
                                  ts.values,
                                  theta,
                                  0.001)

print("alpha: ", alpha, "beta: ", beta)

But got super low alphas and betas:

alpha:  0.8444218515250481 beta:  0.7579544029403025

So the total sales for 34 (November 2015) are: 26.614871551495334 which looks impossible compared to 33 (October 2015): 71056.0

So did I messed up with the linear regression algorithm? My guess is that my random values to start with are maybe too low:

theta = [random.random(), random.random()]

Yet, they should increase anyway until there is no input anymore, isn’t it?

So how to chose initial thetas for a simple linear regression?

implementation linear regression

How to choose initial theta in simple linear regression?

Add your own answers!

Ask a Question