How to calculate out of sample R squared?

Cross Validated Asked by crazydriver on January 13, 2021

I know this probably has been discussed somewhere else, but I have not been able to find an explicit answer. I am trying to use the formula $R^2 = 1 - SSR/SST$ to calculate the out-of-sample $R^2$ of a linear regression model, where $SSR$ is the sum of squared residuals and $SST$ is the total sum of squares. For the training set, it is clear that

$$ SST = \sum (y - \bar{y}_{train})^2 $$

What about the testing set? Should I keep using $\bar{y}_{train}$ for out-of-sample $y$, or use $\bar{y}_{test}$ instead?

I found that if I use $\bar{y}_{test}$, the resulting $R^2$ can sometimes be negative. This is consistent with the description of sklearn's r2_score() function, which uses $\bar{y}_{test}$ (as does the score() function of their linear_model estimators for testing samples). They state that "a constant model that always predicts the expected value of y, disregarding the input features, would get a R^2 score of 0.0."
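To make the two candidate baselines concrete, here is a minimal sketch in plain Python. The data values and the helper name `r2_with_baseline` are made up for illustration and do not come from any library; the test-mean variant follows the convention the question attributes to r2_score().

```python
from statistics import fmean


def r2_with_baseline(y_true, y_pred, baseline):
    """Return 1 - SSR/SST, with SST measured around the constant `baseline`."""
    ssr = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    sst = sum((y - baseline) ** 2 for y in y_true)
    return 1.0 - ssr / sst


# Hypothetical numbers, chosen only to illustrate the difference.
y_train = [1.0, 2.0, 3.0, 4.0]
y_test = [5.0, 6.0, 7.0]
y_pred = [4.0, 6.0, 8.0]  # predictions of some fitted model on the test set

r2_test_mean = r2_with_baseline(y_test, y_pred, fmean(y_test))
r2_train_mean = r2_with_baseline(y_test, y_pred, fmean(y_train))
print(r2_test_mean, r2_train_mean)
```

In this toy example the two conventions give clearly different values, because the test mean is far from the training mean.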

However, in other places people have used $\bar{y}_{train}$, like here and here (the second answer by dmi3kno). So I was wondering which makes more sense. Any comment will be greatly appreciated!

2 Answers

First of all, it must be said that for prediction evaluation, that is, out of sample, the usual $R^2$ is not adequate. This is because the usual $R^2$ is computed from residuals, which are in-sample quantities.

We can define: $R^2 = 1 - RSS/TSS$

RSS = residual sum of squares

TSS = total sum of squares

The main problem here is that residuals are not a good proxy for forecast errors, because with residuals the same data are used for both model estimation and the assessment of prediction accuracy. If residuals (RSS) are used, the prediction accuracy is overstated, and overfitting probably occurs. Even TSS is not adequate, as we will see later. That said, in the past the mistaken use of the standard $R^2$ for forecast evaluation was quite common.

The out-of-sample $R^2$ ($R_{oos}^2$) keeps the idea of the usual $R^2$, but RSS is replaced by the out-of-sample MSE of the model under analysis ($MSE_m$), and TSS is replaced by the out-of-sample MSE of a benchmark model ($MSE_{bmk}$).

$R_{oos}^2 = 1 - MSE_m/MSE_{bmk}$

One notable difference between $R^2$ and $R_{oos}^2$ is that

$0 \leq R^2 \leq 1$ (if the constant term is included)

while $-\infty \leq R_{oos}^2 \leq 1$

If $R_{oos}^2 < 0$, $R_{oos}^2 = 0$, or $R_{oos}^2 > 0$, the competing model performs worse than, the same as, or better than the benchmark, respectively. If $R_{oos}^2 = 1$, the competing model predicts the (new) data perfectly.
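The sign interpretation can be checked with a small sketch. The function names `mse` and `oos_r2` and all numbers below are illustrative assumptions, not part of any library; the benchmark here is simply a constant forecast.

```python
def mse(y_true, forecasts):
    """Mean squared forecast error over the out-of-sample observations."""
    return sum((y - f) ** 2 for y, f in zip(y_true, forecasts)) / len(y_true)


def oos_r2(y_true, model_forecasts, benchmark_forecasts):
    """R_oos^2 = 1 - MSE_m / MSE_bmk, both MSEs computed on the same new data."""
    return 1.0 - mse(y_true, model_forecasts) / mse(y_true, benchmark_forecasts)


y_new = [1.0, 2.0, 3.0]
benchmark = [2.0, 2.0, 2.0]  # hypothetical constant benchmark forecast

better = oos_r2(y_new, [1.0, 2.0, 3.0], benchmark)  # perfect forecasts
equal = oos_r2(y_new, benchmark, benchmark)         # model equals benchmark
worse = oos_r2(y_new, [3.0, 2.0, 1.0], benchmark)   # worse than benchmark
print(better, equal, worse)
```

The third case shows why there is no lower bound: a model whose forecast errors are larger than the benchmark's pushes the ratio above one.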

Here we have to keep in mind that even for the benchmark model we have to consider the out-of-sample performance. Therefore the variance of the out-of-sample data underestimates $MSE_{bmk}$.
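One way to see this, assuming for illustration that the benchmark is the constant forecast $\bar{y}_{train}$ and that the test set has $n$ observations:

$$ MSE_{bmk} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y}_{train})^2 = \frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y}_{test})^2 + (\bar{y}_{test} - \bar{y}_{train})^2 $$

The cross term vanishes because $\sum_{i=1}^{n} (y_i - \bar{y}_{test}) = 0$. The first term on the right is the (population-style, $1/n$) variance of the out-of-sample data and the second term is non-negative, so under this assumption the variance can only fall short of $MSE_{bmk}$, with equality when $\bar{y}_{test} = \bar{y}_{train}$.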

To my knowledge, this measure was first proposed in: Predicting excess stock returns out of sample: Can anything beat the historical average? - Campbell and Thompson (2008) - Review of Financial Studies. In that paper the benchmark (bmk) forecast is based on the prevailing mean given the information available at the time of the forecast.
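A prevailing-mean benchmark can be sketched as follows. This is only an illustration of the idea described above, not the paper's exact procedure; the series, the model forecasts, and the helper name `prevailing_mean_forecasts` are all hypothetical.

```python
def prevailing_mean_forecasts(y, first_forecast_index):
    """Forecast y[t] by the mean of y[0:t], for every t >= first_forecast_index.

    Only observations available before time t enter each forecast.
    `first_forecast_index` must be at least 1.
    """
    return [sum(y[:t]) / t for t in range(first_forecast_index, len(y))]


y = [1.0, 3.0, 2.0, 4.0, 5.0]  # hypothetical series
start = 2                      # first period that is forecast

bench = prevailing_mean_forecasts(y, start)
realized = y[start:]
model = [2.5, 3.5, 4.5]        # hypothetical forecasts from a competing model

sse_model = sum((a - f) ** 2 for a, f in zip(realized, model))
sse_bench = sum((a - f) ** 2 for a, f in zip(realized, bench))
# The 1/n factors of the two MSEs cancel, so the ratio of sums suffices.
r2_oos = 1.0 - sse_model / sse_bench
print(bench, r2_oos)
```

Because each benchmark forecast uses only past observations, the benchmark is itself evaluated out of sample, in line with the point made earlier in this answer.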

Answered by markowitz on January 13, 2021

You are correct.

The OSR$^2$ residuals are based on the testing data, but the baseline should still be the training data. With that said, your SST is $SST = \sum (y - \bar{y}_{train})^2$; notice that this uses the same baseline, $\bar{y}_{train}$, as the in-sample $R^2$.

Answered by user152317 on January 13, 2021
