# How to calculate out of sample R squared?

Cross Validated Asked by crazydriver on January 13, 2021

I know this probably has been discussed somewhere else, but I have not been able to find an explicit answer. I am trying to use the formula $R^2 = 1 - SSR/SST$ to calculate out-of-sample $R^2$ of a linear regression model, where $SSR$ is the sum of squared residuals and $SST$ is the total sum of squares. For the training set, it is clear that

$$SST = \sum (y - \bar{y}_{train})^2$$

What about the testing set? Should I keep using $\bar{y}_{train}$ for out-of-sample $y$, or use $\bar{y}_{test}$ instead?

I found that if I use $\bar{y}_{test}$, the resulting $R^2$ can be negative sometimes. This is consistent with the description of sklearn’s r2_score() function, where they used $\bar{y}_{test}$ (which is also used by their linear_model’s score() function for testing samples). They state that “a constant model that always predicts the expected value of y, disregarding the input features, would get a R^2 score of 0.0.”

However, in other places people have used $\bar{y}_{train}$, like here and here (the second answer by dmi3kno). So I was wondering which makes more sense. Any comment will be greatly appreciated!

First of all, it must be said that for evaluating predictions, that is, out of sample, the usual $R^2$ is not adequate. This is because the usual $R^2$ is computed from residuals, which are in-sample quantities.

We can define: $$R^2 = 1 - RSS/TSS$$

RSS = residual sum of squares

TSS = total sum of squares

The main problem here is that residuals are not a good proxy for forecast errors, because with residuals the same data are used both to estimate the model and to assess its predictive accuracy. If residuals (RSS) are used, the predictive accuracy is overstated, probably as a result of overfitting. Even TSS is not adequate, as we will see later. That said, using the standard $R^2$ for forecast evaluation was a fairly common mistake in the past.

The out-of-sample $R^2$ ($R_{oos}^2$) keeps the idea of the usual $R^2$, but RSS is replaced by the out-of-sample MSE of the model under analysis ($MSE_m$), and TSS is replaced by the out-of-sample MSE of a benchmark model ($MSE_{bmk}$).

$$R_{oos}^2 = 1 - MSE_m/MSE_{bmk}$$

One notable difference between $R^2$ and $R_{oos}^2$ is that

$$0 \leq R^2 \leq 1$$ (if the constant term is included)

while $$-\infty < R_{oos}^2 \leq 1$$

If $R_{oos}^2$ is less than, equal to, or greater than 0, the competing model performs worse than, as well as, or better than the benchmark, respectively. If $R_{oos}^2 = 1$, the competing model predicts the (new) data perfectly.
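The definition and its three cases can be illustrated with a minimal sketch. It assumes the benchmark is a constant forecast equal to the training-set mean, which is one possible choice of benchmark; the function name `r2_oos` and the toy numbers are made up for illustration.

```python
def r2_oos(y_test, y_pred, y_train):
    """Out-of-sample R^2 = 1 - MSE_m / MSE_bmk.

    Assumption: the benchmark is a constant forecast equal to the
    mean of the training targets.
    """
    n = len(y_test)
    train_mean = sum(y_train) / len(y_train)
    # Out-of-sample MSE of the model under analysis.
    mse_model = sum((y - p) ** 2 for y, p in zip(y_test, y_pred)) / n
    # Out-of-sample MSE of the benchmark (training-mean forecast).
    mse_bmk = sum((y - train_mean) ** 2 for y in y_test) / n
    return 1.0 - mse_model / mse_bmk


y_train = [1.0, 2.0, 3.0]   # training mean is 2.0
y_test = [2.0, 4.0]

perfect = r2_oos(y_test, [2.0, 4.0], y_train)      # forecasts equal the data
same_as_bmk = r2_oos(y_test, [2.0, 2.0], y_train)  # forecasts equal the benchmark
worse = r2_oos(y_test, [0.0, 0.0], y_train)        # forecasts worse than the benchmark

print(perfect, same_as_bmk, worse)  # 1.0 0.0 -4.0
```

The last case shows that the measure has no lower bound: the worse the forecasts, the more negative the value.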

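Here we have to keep in mind that even for the benchmark model we have to consider the out-of-sample performance. Therefore the variance of the out-of-sample data, which measures deviations from the test-set mean, underestimates $MSE_{bmk}$ whenever the benchmark forecast is formed from earlier data.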
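To see why, take the benchmark forecast to be the training mean $\bar{y}_{train}$ (an assumption made for this illustration) and let $n_{test}$ be the number of test observations. The cross term vanishes because deviations from $\bar{y}_{test}$ sum to zero over the test set, so

$$\sum_{i \in test} (y_i - \bar{y}_{train})^2 = \sum_{i \in test} (y_i - \bar{y}_{test})^2 + n_{test} (\bar{y}_{test} - \bar{y}_{train})^2$$

Dividing by $n_{test}$, $MSE_{bmk}$ equals the variance of the test data (computed with divisor $n_{test}$) plus $(\bar{y}_{test} - \bar{y}_{train})^2$, so it is never smaller than that variance. Using $\bar{y}_{test}$ in the denominator therefore gives a value of $R_{oos}^2$ that is never larger than the one obtained with $\bar{y}_{train}$.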

To my knowledge, this measure was first proposed in: Predicting excess stock returns out of sample: Can anything beat the historical average? - Campbell and Thompson (2008) - Review of Financial Studies. There, the benchmark forecast is the prevailing mean given the information available at the time of the forecast.
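A prevailing-mean benchmark can be sketched as follows. This is one reading of the idea, assuming the benchmark forecast for period $t$ is the average of all observations before $t$; the function name, its signature, and the toy numbers are hypothetical and not taken from the paper.

```python
def r2_oos_prevailing_mean(y, preds, start):
    """R^2_oos against a prevailing-mean benchmark.

    Assumptions: y is ordered in time, preds[t] is the model's forecast
    of y[t] made with information up to t-1, and evaluation runs over
    t = start, ..., len(y) - 1.
    """
    if start < 1:
        raise ValueError("start must be >= 1 so the first benchmark mean is defined")
    sse_model = 0.0
    sse_bmk = 0.0
    for t in range(start, len(y)):
        # Benchmark forecast: mean of everything observed before time t.
        bmk_forecast = sum(y[:t]) / t
        sse_model += (y[t] - preds[t]) ** 2
        sse_bmk += (y[t] - bmk_forecast) ** 2
    # Both sums run over the same periods, so the ratio of sums of
    # squared errors equals the ratio of MSEs.
    return 1.0 - sse_model / sse_bmk


y = [1.0, 3.0, 2.0, 4.0]
preds = [None, None, 2.0, 3.0]  # forecasts exist only for the evaluation periods
example = r2_oos_prevailing_mean(y, preds, start=2)
print(example)  # 0.75
```

Here the benchmark is re-estimated at every step, so it never uses information from the period being forecast.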

Answered by markowitz on January 13, 2021

You are correct.

The OSR$^2$ residuals are based on the testing data, but the baseline should still be the training data. With that said, your SST is $SST = \sum (y - \bar{y}_{train})^2$; notice that this is the same as for $R^2$.

Answered by user152317 on January 13, 2021
