TransWikia.com

Omnibus and R square improvements for OLS model

Data Science Asked on February 8, 2021

I am checking whether anyone in this community can help with this problem, which I also posted on Cross Validated.

The detailed question is below:

                             OLS Regression Results                            
===============================================================================
Dep. Variable:     Losses in Thousands   R-squared:                       0.305
Model:                             OLS   Adj. R-squared:                  0.304
Method:                  Least Squares   F-statistic:                     1171.
Date:                 Fri, 20 Dec 2019   Prob (F-statistic):               0.00
Time:                         11:12:52   Log-Likelihood:                -72503.
No. Observations:                10703   AIC:                         1.450e+05
Df Residuals:                    10698   BIC:                         1.451e+05
Df Model:                            4                                         
Covariance Type:             nonrobust                                         
======================================================================================
                         coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------
const                539.6565      7.950     67.884      0.000     524.074     555.239
Age                   -6.1490      0.112    -54.971      0.000      -6.368      -5.930
Number of Vehicles    -1.7906      2.151     -0.832      0.405      -6.007       2.426
M                     97.2349      4.094     23.750      0.000      89.210     105.260
Single               136.7923      4.094     33.410      0.000     128.767     144.818
==============================================================================
Omnibus:                     7898.559   Durbin-Watson:                   2.010
Prob(Omnibus):                  0.000   Jarque-Bera (JB):           403312.043
Skew:                           3.029   Prob(JB):                         0.00
Kurtosis:                      32.456   Cond. No.                         187.
==============================================================================

Shown above are the results of an OLS model I ran in Python.

Here is my understanding of a few of the diagnostics:

  • Omnibus: a value close to zero indicates normally distributed
    errors.

  • Prob(Omnibus): the value should be close to 1 for normally
    distributed errors.

  • Skew: same as above, it should be close to zero.

  • Condition Number: indicates multicollinearity, so it should be a
    relatively small number, something below 30. In these results it is
    well above 30 (187), but with the correlation function I couldn't
    see any remaining correlation (I found one correlated pair, but I
    dropped one of the two variables, so nothing is left now).
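
A minimal sketch of why a condition number above 30 can appear even when no predictors are correlated. It assumes the reported Cond. No. is the ordinary 2-norm condition number of the unscaled design matrix (an assumption about the library that is worth checking), and it uses hypothetical synthetic ages rather than the actual data. With only a constant and one uncentered Age-like column, the eigenvalues of X'X have a closed form, so nothing beyond the standard library is needed:

```python
import math
import random


def cond_number_const_plus_x(x):
    """2-norm condition number of the design matrix [1, x].

    Computed as sqrt(lambda_max / lambda_min) of the 2x2 matrix X'X,
    whose eigenvalues are available in closed form.
    """
    a = float(len(x))                 # sum of 1 * 1
    b = float(sum(x))                 # sum of 1 * x
    c = float(sum(v * v for v in x))  # sum of x * x
    mid = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    lam_max = mid + radius
    lam_min = (a * c - b * b) / lam_max  # determinant divided by lam_max
    return math.sqrt(lam_max / lam_min)


def standardize(x):
    """Center to mean 0 and scale to (population) standard deviation 1."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return [(v - mean) / sd for v in x]


# Hypothetical data: ages drawn uniformly from 16..70, same n as the summary.
rng = random.Random(0)
ages = [rng.randint(16, 70) for _ in range(10703)]

cond_raw = cond_number_const_plus_x(ages)
cond_std = cond_number_const_plus_x(standardize(ages))

print(f"constant + raw Age:          {cond_raw:.1f}")
print(f"constant + standardized Age: {cond_std:.1f}")
```

If this assumption holds, a figure such as 187 may reflect the scale of Age relative to the constant rather than collinearity, which would be consistent with the correlation matrix showing nothing left to drop.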

Results after a logarithmic transformation of the y variable:

                             OLS Regression Results
===============================================================================
Dep. Variable:     Losses in Thousands   R-squared:                       0.326
Model:                             OLS   Adj. R-squared:                  0.326
Method:                  Least Squares   F-statistic:                     1295.
Date:                 Fri, 20 Dec 2019   Prob (F-statistic):               0.00
Time:                         14:34:13   Log-Likelihood:                -9712.2
No. Observations:                10703   AIC:                         1.943e+04
Df Residuals:                    10698   BIC:                         1.947e+04
Df Model:                            4
Covariance Type:             nonrobust
======================================================================================
                         coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------
const                  6.3490      0.023    281.983      0.000       6.305       6.393
Age                   -0.0203      0.000    -64.137      0.000      -0.021      -0.020
Number of Vehicles     0.0007      0.006      0.118      0.906      -0.011       0.013
M                      0.2137      0.012     18.429      0.000       0.191       0.236
Single                 0.3159      0.012     27.240      0.000       0.293       0.339
==============================================================================
Omnibus:                     1231.182   Durbin-Watson:                   1.998
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             1943.765
Skew:                          -0.825   Prob(JB):                         0.00
Kurtosis:                       4.279   Cond. No.                         187.
==============================================================================
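
As a rough cross-check of the two summaries, the Jarque-Bera figure can be recomputed from the printed skew and kurtosis using the standard formula JB = n/6 * (S^2 + (K - 3)^2 / 4). This is only a sketch: the inputs are the rounded values shown in the tables, so small differences from the reported statistics are expected.

```python
def jarque_bera(n, skew, kurtosis):
    """Jarque-Bera statistic from sample size, skewness and (non-excess) kurtosis."""
    return n / 6.0 * (skew ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


n_obs = 10703  # No. Observations in both summaries

# Values copied from the two OLS summaries (rounded as printed).
jb_raw = jarque_bera(n_obs, skew=3.029, kurtosis=32.456)
jb_log = jarque_bera(n_obs, skew=-0.825, kurtosis=4.279)

print(f"JB, raw y: {jb_raw:.1f} (summary reports 403312.043)")
print(f"JB, log y: {jb_log:.1f} (summary reports 1943.765)")
```

The log transformation reduces the statistic by roughly two orders of magnitude, but with Prob(JB) still printed as 0.00 in both summaries the residuals remain far from normal in either fit.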

Correlation Matrix:

                             Ac_No        Age   Yr Exp   No Veh     Loss   Loss_l
Ac_No                     1.000000   0.008291   0.008437  -0.003056  -0.000794  -0.001057
Age                       0.008291   1.000000   0.997161   0.008366  -0.442962  -0.509823
Years of Experience       0.008437   0.997161   1.000000   0.008545  -0.442115  -0.511495
Number of Vehicles       -0.003056   0.008366   0.008545   1.000000  -0.011553  -0.004839
Losses in Thousands      -0.000794  -0.442962  -0.442115  -0.011553   1.000000   0.849515
Losses in Thousands_log  -0.001057  -0.509823  -0.511495  -0.004839   0.849515   1.000000
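
The correlation check described earlier (finding a highly correlated pair such as Age and Years of Experience and dropping one of the two) can be sketched as follows. The data, the 0.9 threshold, and the helper names `pearson` and `high_corr_pairs` are all hypothetical and only illustrate the idea; they are not the actual dataset or the code used for the matrix above.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def high_corr_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs


# Hypothetical data: experience tracks age closely, vehicles are independent.
rng = random.Random(0)
age = [rng.randint(16, 70) for _ in range(1000)]
columns = {
    "Age": age,
    "Years of Experience": [max(0, a - 16 - rng.randint(0, 2)) for a in age],
    "Number of Vehicles": [rng.randint(1, 4) for _ in age],
}

flagged = high_corr_pairs(columns)
for a, b, r in flagged:
    print(f"{a} vs {b}: r = {r:.3f}")
```

Only one variable from each flagged pair needs to be kept in the regression, which matches dropping Years of Experience and keeping Age.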

Describe():

                Age  Number of Vehicles             M        Single
count  10703.000000        10703.000000  10703.000000  10703.000000
mean      42.519761            2.497804      0.492292      0.490984
std       18.298802            0.951530      0.499964      0.499942
min       16.000000            1.000000      0.000000      0.000000
25%       24.000000            2.000000      0.000000      0.000000
50%       42.000000            2.000000      0.000000      0.000000
75%       61.000000            3.000000      1.000000      1.000000
max       70.000000            4.000000      1.000000      1.000000

R-squared is also very poor in this case (0.33), though there was a slight improvement with the log transformation (from 0.31 to 0.33).

What else can I do to get a good model and to bring "Omnibus" and the other diagnostics within acceptable limits?
