# Error propagation in combined linear models

Cross Validated Asked by Chochot on January 5, 2022

I have a set of observed values ($y_{1Obs}$) and 3 predictive variables (n = 27). I use multiple linear regression to create a linear model:

$Z_1=\alpha_0 + \alpha_1 W_1 + \alpha_2 X_1 + \alpha_3 Y_1$

Where $Z_1$ is my response variable, $\alpha_0$ is the intercept, $\alpha_1, \alpha_2, \alpha_3$ are regression coefficients and $W_1, X_1, Y_1$ are predictive variables.

I also have a second set of observed values ($y_{2Obs}$) and 3 different predictive variables (n = 27). I again use multiple linear regression to create a second linear model of the same form:

$Z_2=\beta_0 + \beta_1 W_2 + \beta_2 X_2 + \beta_3 Y_2$

Where $Z_2$ is my response variable, $\beta_0$ is the intercept, $\beta_1, \beta_2, \beta_3$ are regression coefficients and $W_2, X_2, Y_2$ are predictive variables.

With 27 predicted values from each model I then calculate the predicted change between the two:

$\Delta Z_{Pred}=Z_2 - Z_1$

While my observed change comes from the two sets of observed values used to create the two linear models:

$\Delta Z_{Obs}=y_{2Obs} - y_{1Obs}$

My question is: what formula do I use to calculate the RMSE of my predictions ($\Delta Z_{Pred}$) that will propagate the errors from predictions of $Z_1$ and $Z_2$?

I’ve tried using the following formula though I’m not convinced this is correct:

$RMSE_{\Delta Z_{Pred}} = \sqrt{{RMSE_1}^2 + {RMSE_2}^2}$

Where $RMSE_1$ and $RMSE_2$ are the RMSEs from the first two models shown above.

• The formula $RMSE_{\Delta Z_{Pred}} = \sqrt{{RMSE_1}^2 + {RMSE_2}^2}$ may work for independent sets of responses and predictors, but it is likely to be wrong if the two sets are correlated. For example, if your two sets of predictors and responses are identical, then $RMSE_{\Delta Z_{Pred}}=0$, while $\sqrt{{RMSE_1}^2 + {RMSE_2}^2} > 0$.
• If you are interested in $\Delta Z_{Pred}$, I suggest regressing it on the six predictors. Although this is not what you are asking, it might serve your goal.
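
The two points above can be checked numerically. The following is a minimal sketch, not part of the original answer, and it rests on several assumptions: the data are synthetic, the helper names `fit_predict` and `rmse` are hypothetical, RMSE is computed in-sample with a divisor of $n$, and the second suggestion is read as regressing the observed change $\Delta Z_{Obs}$ on all six predictors. Writing $e_1 = y_{1Obs} - Z_1$ and $e_2 = y_{2Obs} - Z_2$, the identity ${RMSE_{\Delta Z_{Pred}}}^2 = {RMSE_1}^2 + {RMSE_2}^2 - 2\,\overline{e_1 e_2}$ holds exactly, so the quadrature formula is only right when the cross term $\overline{e_1 e_2}$ is zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 27  # sample size from the question


def fit_predict(X, y):
    """Ordinary least squares with an intercept; returns the fitted values."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coef


def rmse(obs, pred):
    """In-sample root mean squared error (divisor n, an assumption)."""
    return float(np.sqrt(np.mean((obs - pred) ** 2)))


# Synthetic stand-ins for the two data sets (hypothetical, for illustration).
X1 = rng.normal(size=(n, 3))  # plays the role of W_1, X_1, Y_1
X2 = rng.normal(size=(n, 3))  # plays the role of W_2, X_2, Y_2
shared = rng.normal(size=n)   # noise common to both responses -> correlated errors
y1 = 1.0 + X1 @ np.array([0.5, -1.0, 2.0]) + shared + 0.3 * rng.normal(size=n)
y2 = 2.0 + X2 @ np.array([1.5, 0.5, -1.0]) + shared + 0.3 * rng.normal(size=n)

z1 = fit_predict(X1, y1)  # Z_1
z2 = fit_predict(X2, y2)  # Z_2

rmse1, rmse2 = rmse(y1, z1), rmse(y2, z2)

# Formula from the question: valid only if the two error series are uncorrelated.
quadrature = float(np.hypot(rmse1, rmse2))

# Direct RMSE of the predicted change against the observed change.
direct = rmse(y2 - y1, z2 - z1)

# Exact identity including the cross term between the two residual series.
e1, e2 = y1 - z1, y2 - z2
with_cross_term = float(np.sqrt(rmse1**2 + rmse2**2 - 2 * np.mean(e1 * e2)))

# Alternative: regress the observed change on all six predictors at once.
joint_fit = fit_predict(np.column_stack([X1, X2]), y2 - y1)
rmse_joint = rmse(y2 - y1, joint_fit)

print(f"quadrature      = {quadrature:.4f}")
print(f"direct          = {direct:.4f}")
print(f"with cross term = {with_cross_term:.4f}")
print(f"joint six-pred. = {rmse_joint:.4f}")
```

Because the observed change is available for all 27 cases, the direct computation needs no propagation formula at all. The joint regression can never have a larger in-sample RMSE than the difference of the two separate fits, since $Z_2 - Z_1$ is itself a linear function of the six predictors plus an intercept.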

Answered by Pere on January 5, 2022
