TransWikia.com

Linear Regression Model Validation with Transformed Data

Data Science Asked by Taku Charles-Noel Endo on March 8, 2021

I built a linear model with a log10 transformation applied to the dependent variable, and I am having trouble manually calculating the R2 for both the training and test datasets. The model looks like this:

Model <- lm(log10(Total_LT) ~ ThreeComb + Ship_Qtr, data = Train_Data)

Additionally, here is the summary of the model:

Residuals:
     Min       1Q   Median       3Q      Max 
-0.47904 -0.09681 -0.00449  0.09272  0.63265 

Coefficients:
                                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)                         1.178008   0.007786 151.302  < 2e-16 ***
ThreeCombAIR Site A Product C       0.221098   0.042209   5.238 1.85e-07 ***
ThreeCombAIR Site B Product B       0.467222   0.050400   9.270  < 2e-16 ***
ThreeCombAIR Site C Product B      -0.020639   0.013471  -1.532 0.125716    
ThreeCombFASTBOAT Site A Product A  0.357324   0.015775  22.652  < 2e-16 ***
ThreeCombFASTBOAT Site A Product C  0.397101   0.015291  25.970  < 2e-16 ***
ThreeCombGROUND Site D Product B   -0.084635   0.010842  -7.806 1.08e-14 ***
ThreeCombOCEAN Site A Product A     0.470911   0.014879  31.648  < 2e-16 ***
ThreeCombOCEAN Site A Product B     0.582689   0.025467  22.880  < 2e-16 ***
ThreeCombOCEAN Site A Product C     0.474703   0.061184   7.759 1.56e-14 ***
ThreeCombOCEAN Site B Product B     0.414655   0.016140  25.691  < 2e-16 ***
Ship_QtrQ2                         -0.039806   0.009264  -4.297 1.84e-05 ***
Ship_QtrQ4                         -0.040277   0.012147  -3.316 0.000935 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1489 on 1535 degrees of freedom
Multiple R-squared:  0.6803,    Adjusted R-squared:  0.6778 
F-statistic: 272.2 on 12 and 1535 DF,  p-value: < 2.2e-16

Now I am trying to test this model by calculating R-squared manually, like this:

Train_Data$Residual <- Model$residuals
Test_R2 <- 1 - (sum((Test_Data$Residual)^2)/ sum((Test_Data$Total_LT - mean(Test_Data$Total_LT))^2))

Here is the output that I get for my R2:

[1] 0.9999015
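
One likely reason for a value this close to 1 is a scale mismatch: the residuals stored in the model are on the log10 scale, while the denominator uses Total_LT on its original scale, so the ratio becomes tiny. The following is a minimal sketch on simulated data, not the data from the question; the data frame `train`, the variables `y` and `g`, the coefficients, and the seed are all made up for illustration. It suggests that keeping both sums of squares on the log10 scale reproduces the R-squared reported by `summary()`.

```r
# Simulated stand-in for Train_Data (hypothetical; not the original data)
set.seed(1)
train <- data.frame(g = factor(sample(c("a", "b", "c"), 200, replace = TRUE)))
train$y <- 10^(1.2 + 0.3 * (train$g == "b") - 0.1 * (train$g == "c") +
               rnorm(200, sd = 0.15))

fit <- lm(log10(y) ~ g, data = train)

# Residual and total sums of squares, both on the log10 scale
log_y  <- log10(train$y)
rss    <- sum(residuals(fit)^2)
tss    <- sum((log_y - mean(log_y))^2)
r2_log <- 1 - rss / tss

# Mixed scales, as in the question: log-scale residuals, raw-scale total
r2_mixed <- 1 - rss / sum((train$y - mean(train$y))^2)

print(c(manual = r2_log, summary = summary(fit)$r.squared, mixed = r2_mixed))
```

In this sketch `manual` and `summary` agree, while `mixed` is inflated toward 1 simply because the raw-scale total sum of squares is much larger than the log-scale residual sum of squares.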

To validate my model, I also calculated the R2 for the test dataset like this:

Test_Data$Predicted <- predict(Model, newdata = Test_Data) 
Test_Data$Residual <- Test_Data$Total_LT - Test_Data$Predicted
Test_R2 <- 1 - (sum((Test_Data$Residual)^2)/ sum((Test_Data$Total_LT - mean(Test_Data$Total_LT))^2))

And I get this R2 for the test dataset:

[1] -1.964802
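
A negative value here is also consistent with a scale mismatch: `predict()` on a model fitted to log10(Total_LT) returns predictions of log10(Total_LT), so subtracting them from the raw Total_LT produces very large residuals. The sketch below uses simulated data again; `make_data`, `r2`, `train`, `test`, `y`, and `g` are hypothetical names introduced only for illustration. It compares like with like, either on the log10 scale or after back-transforming the predictions with `10^`.

```r
# Simulated train/test split (hypothetical; not the original data)
set.seed(2)
make_data <- function(n) {
  g <- factor(sample(c("a", "b", "c"), n, replace = TRUE),
              levels = c("a", "b", "c"))
  y <- 10^(1.2 + 0.3 * (g == "b") - 0.1 * (g == "c") + rnorm(n, sd = 0.15))
  data.frame(g = g, y = y)
}
train <- make_data(300)
test  <- make_data(100)

fit <- lm(log10(y) ~ g, data = train)

# R-squared from actual and predicted values on the same scale
r2 <- function(actual, predicted) {
  1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)
}

pred_log <- predict(fit, newdata = test)        # predictions of log10(y)

r2_test_log   <- r2(log10(test$y), pred_log)    # both on the log10 scale
r2_test_raw   <- r2(test$y, 10^pred_log)        # both on the original scale
r2_test_mixed <- r2(test$y, pred_log)           # mismatch, as in the question

print(c(log = r2_test_log, raw = r2_test_raw, mixed = r2_test_mixed))
```

The log-scale figure is the one comparable to the R-squared in the model summary, although an out-of-sample value will generally not match the training value exactly. The back-transformed figure answers a different question (fit on the original scale), and `10^` of a log-scale prediction estimates something closer to the median than the mean of the original variable.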

I think this was caused by the log10 transformation that I applied in my model. What can I do to get an R2 for both the training and test data that is close to 0.68, as reported in the model summary?

By the way, I tried the same thing without log10 transformation, and got a very good R2.
