
Difference between Repeated measures ANOVA, ANCOVA and Linear mixed effects model

Asked on Cross Validated on December 5, 2021

What is the best way to analyze these data:

Subjects are divided into two groups ("Group": Treatment A and B).

"Weight" is recorded before and 3 months after treatment.

Outcome variable: Percent reduction in weight

The main question is: is there any difference between the 2 treatments in terms of percent reduction in weight?

Which of the following is appropriate for this (or will they give the same result)?

  1. Repeated measures ANOVA (with "Weight" as outcome, ["Group", "Time"] as within-factors and adjusting for "subject"). But can we use "Percent reduction in weight" here?

  2. ANCOVA (with "Percent reduction in weight" as outcome, "Group" as between-factor and "baseline weight" as a covariate)

  3. Linear mixed effects method with "Weight" as outcome, [group, time, group*time] as fixed effects and [subject] as random effect. Again, can we use "Percent reduction in weight" here?

  4. Linear model with interaction: "Percent reduction in weight" ~ "Group" * "Baseline weight"

Edit: As asked in the comments, here is the added information about N: there are 100 subjects in each group, allocated using randomization.

One Answer

First there is the question of whether it is OK to use percent change as the outcome. In a regression model with baseline as a regressor this is a very bad idea, because the outcome is mathematically coupled to the regressor, which will induce correlation (i.e. statistically significant associations) where none is actually present (or mask actual change). This is easy to show with a simulation.

We simulate 2 groups of 100 each, where in the first instance there is no change from baseline in either group:

set.seed(15)
N <- 200
x1 <- rnorm(N, 50, 10)              # baseline weight
trt <- c(rep(0, N/2), rep(1, N/2))  # allocate to 2 groups
x2 <- rnorm(N, 50, 10)              # follow-up weight: no change from baseline

So we expect to find nothing of any interest:

summary(lm(x2 ~ x1 * trt))

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 45.75024    5.37505   8.512 4.43e-15 ***
x1           0.06776    0.10342   0.655    0.513    
trt          3.25135    7.12887   0.456    0.649    
x1:trt      -0.01689    0.13942  -0.121    0.904

as expected. But now we create a percent change variable and use it as the outcome:

pct.change <- 100*(x2 - x1)/x1
summary(lm(pct.change ~ x1 * trt))

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  97.5339    12.7814   7.631 9.93e-13 ***
x1           -1.9096     0.2459  -7.765 4.44e-13 ***
trt          45.1394    16.9519   2.663  0.00839 ** 
x1:trt       -0.7662     0.3315  -2.311  0.02188 *  

Everything is significant! So we would interpret this as: the expected percent change in weight for a subject in the control group with zero baseline weight is 97; the expected change in the percent change in weight for a subject in the control group for each additional unit of baseline weight is -1.91; the expected difference in the percent change in weight between the control and treatment groups for a subject with zero baseline weight is 45; and the expected difference in the percent change in weight between the treatment and control groups for each additional unit of baseline weight is -0.77. All completely spurious! Note also that with a "percent change" variable we have to use language like "expected change in the percent change", which does not help with understanding.
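To see the coupling directly (this quick check is not in the original answer, but uses only the simulated objects above), note that the percent-change variable is correlated with baseline by construction, even though x2 was generated independently of x1:

# correlation induced purely by subtracting and dividing by x1
cor.test(x1, pct.change)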

Now let's introduce an actual treatment effect of 10:

x3 <- x1 + rnorm(N, 0, 1) + trt*10   # follow-up weight with a true treatment effect of 10
summary(lm(x3 ~ x1 * trt))

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.95933    0.54404  -1.763   0.0794 .  
x1           1.01921    0.01047  97.365   <2e-16 ***
trt         10.78643    0.72156  14.949   <2e-16 ***
x1:trt      -0.01126    0.01411  -0.798   0.4260    

...all good.

Now again, we create a percent change variable and use it as the outcome:

pct.change.trt <- 100*(x3 - x1)/x1
summary(lm(pct.change.trt ~ x1 * trt))

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -1.77928    1.23337  -1.443    0.151    
x1           0.03439    0.02373   1.449    0.149    
trt         49.11734    1.63580  30.027   <2e-16 ***
x1:trt      -0.54947    0.03199 -17.175   <2e-16 ***

...more spurious results.
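By contrast (a sketch that is not in the original answer), the raw, unscaled change score does not suffer from this coupling; a simple model of the change on treatment should recover something close to the true effect of 10 without dragging baseline into the picture:

# raw change score on the second simulated data set (true effect = 10)
summary(lm(I(x3 - x1) ~ trt))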

As to the specific models:

Repeated measures ANOVA (with "Weight" as outcome, ["Group", "Time"] as within-factors and adjusting for "subject").

This is one option that could work.
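For reference, a minimal sketch of what that could look like in R, assuming the data are in long format with hypothetical columns wt, group, time and Subject (these names are illustrative, not from the original answer), and with Subject and time coded as factors:

# classical repeated measures ANOVA: Subject is the error stratum
# for the within-subject factor time
summary(aov(wt ~ group * time + Error(Subject/time), data = mydata))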

ANCOVA (with "Percent reduction in weight" as outcome, "Group" as between-factor and "baseline weight" as a covariate)

Besides the mathematical coupling problem, this would not account for the repeated measures.

Linear mixed effects method with "Weight" as outcome, [group, time, group*time] as fixed effects and [subject] as random effect. Again, can we use "Percent reduction in weight" here?

This would be my preferred option, but again, not with percent reduction. This should be equivalent to repeated measures ANOVA. For example with your data:

lmer(wt ~ group*time + age + gender + (1 | Subject), data = mydata)            # lme4
lme(wt ~ group*time + age + gender, random = ~ 1 | Subject, data = mydata)     # nlme

You may want to add random slopes by placing one or more of the fixed effects that vary within subjects (only time in this case) to the left of the |, if justified by theory and the study design, and supported by the data. Personally, I always start from a model with only random intercepts.
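As a hypothetical sketch (not part of the original answer) of that random-slope variant, using the same assumed column names:

# random intercept and random slope for time (lme4 syntax); with only two
# time points the slope variance is usually not identifiable, so check for
# singular fits and compare against the random-intercept model before keeping it
lmer(wt ~ group*time + age + gender + (1 + time | Subject), data = mydata)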

Linear model with interaction: "Percent reduction in weight" ~ "Group" * "Baseline weight"

This should be avoided due to the mathematical coupling problem. Even if baseline were removed as a regressor, this would then just be an ANOVA model, and while the repeated measures are handled by the percent-change variable, the residuals may not be close to normal, so inference may be affected.

Answered by Robert Long on December 5, 2021
