Change Score or Regressor Variable Method: Should I regress $Y_1$ on $X$ and $Y_0$, or $(Y_1-Y_0)$ on $X$?

Question

I have data on investment preferences from one year before Covid and from during the Covid lockdown.
Simple t-tests show some changes. I want to assess whether these changes are particularly strong for specific demographics (e.g., older individuals ($X_1$), individuals with lower income ($X_2$), etc.).
Should I include the initial level of my dependent variable in the regressions? Basically, if I want to use OLS regressions to investigate which independent variables correlate with the change in my dependent variable, which model is preferable?
Model 1 (apparently called the Change Score Method):
$(Y_2-Y_1) = \beta_1 X_1 + \beta_2 X_2$
Model 2 (apparently called the Regressor Variable Method):
$Y_2 = \beta_1 X_1 + \beta_2 X_2 + \beta_3 Y_1$
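Moving $Y_1$ to the right-hand side of Model 1 shows how the two models are related: Model 1 is the special case of Model 2 in which the coefficient on $Y_1$ is fixed at 1 instead of being estimated.

$$
Y_2 - Y_1 = \beta_1 X_1 + \beta_2 X_2
\quad\Longleftrightarrow\quad
Y_2 = \beta_1 X_1 + \beta_2 X_2 + 1 \cdot Y_1
$$

Model 2 instead lets the data choose $\beta_3$.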
Thank you so much for your help! Any references would also be much appreciated.

rnso · Answer

Both methods have been used; see here, for example. Which one to use depends on the question you want to answer. If you mainly want to talk about "change", you can use:
(Y2-Y1) ~ X1 + X2            # (1)
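A minimal sketch of fitting model (1) by ordinary least squares, written in plain Python so it runs without a statistics library. Everything in it is hypothetical: the simulated data, the effect sizes, and the hand-written helper ols() (normal equations solved by Gauss-Jordan elimination). R's formula (Y2-Y1) ~ X1 + X2 includes an intercept by default, so the design matrix gets a column of ones.

```python
import random

def ols(rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda row: abs(a[row][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [x - f * z for x, z in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]

random.seed(1)
n = 2000
X1 = [random.gauss(0, 1) for _ in range(n)]    # hypothetical standardized age
X2 = [random.randint(0, 1) for _ in range(n)]  # hypothetical low-income indicator
Y1 = [random.gauss(5, 1) for _ in range(n)]    # pre-Covid score
# Hypothetical truth: change = 0.5 + 0.3*X1 - 0.4*X2 + noise
Y2 = [y1 + 0.5 + 0.3 * x1 - 0.4 * x2 + random.gauss(0, 0.5)
      for y1, x1, x2 in zip(Y1, X1, X2)]

# Model (1): (Y2 - Y1) ~ X1 + X2, with an explicit intercept column
change = [y2 - y1 for y1, y2 in zip(Y1, Y2)]
design = [[1.0, x1, x2] for x1, x2 in zip(X1, X2)]
b0, b1, b2 = ols(design, change)
print(f"intercept={b0:.2f}  X1={b1:.2f}  X2={b2:.2f}")
```

In practice you would use lm() in R or a statistics package; the hand-rolled solver only keeps the example self-contained.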

The baseline (Y1) should not be added to the above equation, as it will always be correlated with the difference (Y2-Y1); see the comments below by @EdM and here.
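One mechanism behind this correlation is regression to the mean. The sketch below (hypothetical simulated data, plain Python) gives every subject a stable true level and measures it twice with independent noise, so there is no real change at all; the baseline is nevertheless negatively correlated with the difference. With equal trait and noise variances the theoretical correlation is $-\sigma_e^2/\sqrt{(\sigma_T^2+\sigma_e^2)\cdot 2\sigma_e^2} = -0.5$.

```python
import random

def corr(x, y):
    """Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

random.seed(2)
n = 5000
true_level = [random.gauss(0, 1) for _ in range(n)]  # hypothetical stable trait per subject
Y1 = [t + random.gauss(0, 1) for t in true_level]    # noisy measurement before
Y2 = [t + random.gauss(0, 1) for t in true_level]    # noisy measurement after, no real change
change = [y2 - y1 for y1, y2 in zip(Y1, Y2)]

r = corr(Y1, change)
print(f"corr(Y1, Y2 - Y1) = {r:.2f}")  # near the theoretical -0.5
```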
On the other hand, if you want to discuss the factors affecting the "final value", you can use:
Y2 ~ X1 + X2 + Y1            # (2)
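A sketch of model (2), again on hypothetical simulated data with a hand-written least-squares helper. It also shows what happens if Y1 is added to model (1): (Y2 - Y1) ~ X1 + X2 + Y1 has exactly the same regressors as Y2 ~ X1 + X2 + Y1, so the two fits give identical intercept and X coefficients, and the Y1 coefficient differs by exactly 1. A change-score model with baseline adjustment is therefore model (2) under another name.

```python
import random

def ols(rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda row: abs(a[row][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [x - f * z for x, z in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]

random.seed(3)
n = 2000
X1 = [random.gauss(0, 1) for _ in range(n)]
X2 = [random.randint(0, 1) for _ in range(n)]
Y1 = [random.gauss(5, 1) for _ in range(n)]
# Hypothetical truth for the final value
Y2 = [1.0 + 0.3 * x1 - 0.4 * x2 + 0.6 * y1 + random.gauss(0, 0.5)
      for x1, x2, y1 in zip(X1, X2, Y1)]

design = [[1.0, x1, x2, y1] for x1, x2, y1 in zip(X1, X2, Y1)]
model2 = ols(design, Y2)                          # Y2 ~ X1 + X2 + Y1
change = [y2 - y1 for y1, y2 in zip(Y1, Y2)]
change_with_base = ols(design, change)            # (Y2 - Y1) ~ X1 + X2 + Y1

print("Y2 ~ X1 + X2 + Y1:     ", [round(b, 3) for b in model2])
print("(Y2-Y1) ~ X1 + X2 + Y1:", [round(b, 3) for b in change_with_base])
# Same intercept and X coefficients; the Y1 coefficient is smaller by exactly 1
```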

However, since repeated measurements (Y1 and Y2 at two time points) were made on the same subjects, a mixed model is also often used (including interactions, as commented by @dbwilson below):
Y ~ X1 + X2 + time + X1*time + X2*time + (1|subject)

The following shorter formula is equivalent to the one above, because in R formula syntax X1*time expands to X1 + time + X1:time:
Y ~ X1*time + X2*time + (1|subject)            # (3)
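The mixed model needs the data in long format, with one row per subject and time point. The sketch below (plain Python, hypothetical column names and values) reshapes wide data and lists the fixed-effect columns implied by X1*time + X2*time when time is coded 0 (before Covid) and 1 (lockdown); subject is the grouping factor for the random intercept (1|subject). Fitting the model itself would be done with a mixed-model routine, for example lmer() from the lme4 package in R or mixedlm in Python's statsmodels; that step is not shown.

```python
# Hypothetical wide data: one row per subject, Y1 = pre-Covid, Y2 = lockdown
wide = [
    {"subject": 1, "X1": 34, "X2": 0, "Y1": 3.0, "Y2": 3.5},
    {"subject": 2, "X1": 61, "X2": 1, "Y1": 4.0, "Y2": 2.5},
]

def to_long(rows):
    """One row per subject and time point (time = 0 before Covid, 1 during lockdown)."""
    out = []
    for r in rows:
        for time, col in ((0, "Y1"), (1, "Y2")):
            out.append({"subject": r["subject"], "time": time,
                        "X1": r["X1"], "X2": r["X2"], "Y": r[col]})
    return out

def fixed_effects_row(r):
    """Fixed-effect columns implied by X1*time + X2*time (labels are illustrative)."""
    return {
        "(Intercept)": 1,
        "X1": r["X1"],
        "X2": r["X2"],
        "time": r["time"],
        "X1:time": r["X1"] * r["time"],  # interaction term from X1*time
        "X2:time": r["X2"] * r["time"],  # interaction term from X2*time
    }

long_rows = to_long(wide)
for r in long_rows:
    # "subject" identifies the group for the random intercept (1|subject)
    print(r["subject"], r["Y"], fixed_effects_row(r))
```

With this 0/1 coding, the X1:time and X2:time coefficients describe how the before-versus-lockdown difference varies with X1 and X2.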

Another method is commonly used, especially in the biomedical literature: percent change, i.e.
(100*(Y2-Y1)/Y1) ~ X1 + X2            # (4)
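A sketch of model (4): compute each subject's percent change, then regress it on X1 and X2. The data, effect sizes, and the helpers ols() and percent_change() are hypothetical. The guard in percent_change() reflects that the quantity is undefined when the baseline is 0.

```python
import random

def ols(rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda row: abs(a[row][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [x - f * z for x, z in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]

def percent_change(y1, y2):
    """100 * (Y2 - Y1) / Y1; undefined when the baseline is 0."""
    if y1 == 0:
        raise ValueError("percent change is undefined for a zero baseline")
    return 100 * (y2 - y1) / y1

random.seed(4)
n = 2000
X1 = [random.gauss(0, 1) for _ in range(n)]
X2 = [random.randint(0, 1) for _ in range(n)]
Y1 = [random.gauss(50, 5) for _ in range(n)]
# Hypothetical truth: +2% per unit of X1, -5% when X2 = 1
Y2 = [y1 * (1 + 0.02 * x1 - 0.05 * x2 + random.gauss(0, 0.02))
      for y1, x1, x2 in zip(Y1, X1, X2)]

pct = [percent_change(a, b) for a, b in zip(Y1, Y2)]
design = [[1.0, x1, x2] for x1, x2 in zip(X1, X2)]
b0, b1, b2 = ols(design, pct)  # model (4): percent change ~ X1 + X2
print(f"intercept={b0:.2f}  X1={b1:.2f}  X2={b2:.2f}")
```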

Y1 should not be kept as a predictor in this last method either, as there will be a strong correlation between the baseline and the percent change.
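This correlation can be reproduced with hypothetical subjects who have a stable true level, are measured twice with independent noise, and experience no real change. The percent change is still negatively correlated with the baseline, partly because Y1 enters the percent change twice (in the numerator and in the denominator). Plain Python, simulated data only.

```python
import random

def corr(x, y):
    """Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

random.seed(5)
n = 5000
true_level = [random.gauss(50, 5) for _ in range(n)]  # hypothetical stable level
Y1 = [t + random.gauss(0, 5) for t in true_level]     # baseline measurement
Y2 = [t + random.gauss(0, 5) for t in true_level]     # follow-up, no real change
pct = [100 * (y2 - y1) / y1 for y1, y2 in zip(Y1, Y2)]

r = corr(Y1, pct)
print(f"corr(Y1, percent change) = {r:.2f}")
```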
I think this last method (percent change) is the easiest to understand.
See here for more information on this topic.
