Reading multilevel model syntax in intuitive ways in R (lme4)

Question

Below, I have 3 lme4 longitudinal mixed models. Throughout, y is the response variable, group is a binary indicator for "control" vs. "treatment", therapist and subjects are clustering indicators, and time is the measurement time indicator (e.g., 0, 1, 2, 3).

Question: Is there an intuitive way to understand the data/design structure each model describes, or at least what lme4 understands each model syntax to mean?

# FIRST:
lmer(y ~ time * group + 
                (time | therapist:subjects) +
                (time * group || therapist), 
                 data = data)

# SECOND:
lmer(y ~ time * group + 
                (time | therapist:subjects) +
                (time | therapist) +
                (0 + group + time:group | therapist), 
                 data = data)

# THIRD:
lmer(y ~ time * group + 
                (1 | therapist:subjects) +  
                (0 + time | therapist:subjects) +
                (0 + time:group | therapist) + 
                (0 + group | therapist),
                 data = data)

Robert Long · Answer

Intuitively, the understanding can start with grouping variables or grouping terms. These are the terms that appear on the right side of the | or || in the random parts of the formula. They are often factor variables for which there are repeated measurements, or combinations (interactions) of a factor with another random factor (or indeed a fixed factor). Each should be a single term, either a variable or a combination/interaction, and not multiple terms (i.e., you can have (1|A) or (1|A:B) but not (1|A+B), which would be invalid). In all cases, they can be interpreted as: there is some random variation in the data at the "level" of the variable or combination of variables. We usually want to fit random intercepts for these, which will account for any non-independence due to this variation.
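A minimal base R sketch of how a combination such as therapist:subjects behaves as a single grouping factor. The toy data frame d and its labels are made up for illustration, and subject labels are deliberately reused across therapists; interaction() is used here only as a stand-in for the factor that the : operator produces inside a random-effects term.

```r
# Hypothetical toy data: 2 therapists, each seeing subjects labelled 1 to 3
d <- expand.grid(time = 0:3,
                 subjects = factor(1:3),
                 therapist = factor(c("A", "B")))

# On its own, subjects has only 3 levels, so subject 1 of therapist A and
# subject 1 of therapist B would be treated as the same person
n_subject_labels <- nlevels(d$subjects)

# The combination has one level per actual person
ts <- interaction(d$therapist, d$subjects, drop = TRUE)
n_people <- nlevels(ts)

print(c(labels = n_subject_labels, people = n_people))
```

With these made-up labels the combination has 6 levels against 3 for subjects alone, which is why the grouping term is written as therapist:subjects when subject ids are not unique across therapists.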
We can think about the number of "levels" in the model by considering the number of unique grouping terms. With a single grouping term we have a 2-level model; with 2 terms, a 3-level model. Some care is needed here because these "levels" might not correspond to levels in a multilevel model. In the analysis of experiments with a factorial design, "levels" don't really apply, so you can have things like y ~ A + B + (1|id) + (1|id:A) + (1|id:B) and there are 3 "levels" of variation, but not in a multilevel modelling sense. See this answer for further details on this. Even in multilevel modelling this can be tricky because of nested and crossed factors. See here for further details on this. Nesting is a property of the study design, not the model.
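Since nesting is a property of the design, it can be checked from the data before any model is written. The following is a rough base R sketch; the data frame d and the ids s1 to s6 are hypothetical.

```r
# Hypothetical toy data with globally unique subject ids (s1 to s6)
d <- data.frame(
  therapist = factor(rep(c("A", "B"), each = 12)),
  subjects  = factor(rep(paste0("s", 1:6), each = 4)),
  time      = rep(0:3, times = 6)
)

# Cross-tabulate: rows are subjects, columns are therapists
tab <- table(d$subjects, d$therapist)

# Nested design: every subject has observations under exactly one therapist
is_nested <- all(rowSums(tab > 0) == 1)
print(is_nested)
```

If some subject had rows under more than one therapist, the two factors would be (partially) crossed rather than nested.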
The terms on the left side of the | or || specify which variable(s) are allowed to vary at the different levels of the grouping term. In the simplest case it is just "1", which means we want only the intercept to vary (so, just random intercepts). If we have other variables in place of the "1" then it means we want those variables (which are typically fixed effects) to vary at each level of the grouping term in addition to the intercept. That is, (time|group) is the same as (1 + time|group), where group stands for any grouping variable rather than the treatment indicator in your data. These are random slopes.
If "0" appears on the left-hand side of the | or || then it means that random intercepts for the grouping term are not to be fitted (usually because they are fitted by a different term in the model).
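The role of "1" and "0" on the left-hand side can be previewed with base R's model.matrix(), which treats a one-sided formula in much the same way. This is only an illustration with a made-up time vector, not a description of lme4 internals.

```r
# Hypothetical time values for one cluster
d <- data.frame(time = 0:3)

# "time" carries an implicit intercept, so it matches "1 + time"
X_implicit <- model.matrix(~ time, d)
X_explicit <- model.matrix(~ 1 + time, d)

# "0 + time" drops the intercept column and leaves only the slope
X_slope_only <- model.matrix(~ 0 + time, d)

print(colnames(X_implicit))    # intercept and slope columns
print(colnames(X_slope_only))  # slope column only
```

Each column that survives on the left-hand side corresponds to one random effect per level of the grouping term.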
Lastly, by default lmer will attempt to estimate correlations between the random effects within a term. However, if || is specified, then the correlations are not estimated (i.e., they are fixed at zero). This is actually just shorthand. For example, (time||group) is the same as (1|group) + (0+time|group): the first term fits random intercepts for group and the second fits random slopes for time without an intercept, so taken together we get random intercepts for group and random slopes for time, with no correlation between them. This also means that random slopes and random intercepts will be correlated only when they appear together on the left-hand side of a single |.
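One way to see the || shorthand being unpacked is to ask lme4 for the random-effects terms of a formula. The sketch below assumes lme4 is installed and uses its exported findbars() helper; g is a placeholder grouping variable, and no data is needed just to parse the formula.

```r
library(lme4)

# g is a placeholder grouping variable (hypothetical)
bars <- findbars(y ~ time + (time || g))

# Each element is one random-effects term after "||" has been expanded
print(bars)
n_terms <- length(bars)
```

The single || term comes back as two separate single-bar terms, one for the intercept and one for the slope.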
So, for your specific examples:
lmer(y ~ time * group + 
            (time | therapist:subjects) +
            (time * group || therapist), 
            data = data)

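First note that we have 2 different grouping terms, the therapist:subjects combination and therapist, and this is the case for all 3 models. For the former we fit random intercepts and random slopes for time (correlated with each other). For the latter we fit random intercepts and random slopes for time, group and time:group (the expansion of time * group), and because of the || none of these are correlated with each other or with the random intercepts for therapist. Note that || only separates the terms cleanly like this when group is coded as a numeric 0/1 variable rather than a factor.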
lmer(y ~ time * group + 
            (time | therapist:subjects) +
            (time | therapist) +
            (0 + group + time:group | therapist), 
            data = data)

As with the first model, we have 2 different grouping terms, the therapist:subjects combination and therapist, and again we have time as a random slope for the former. For the latter, this time we have random slopes for time (correlated with the random intercepts for therapist), plus random slopes for group and time:group, which are correlated with each other but uncorrelated with the therapist intercepts and time slopes because they sit in a separate term.
lmer(y ~ time * group + 
            (1 | therapist:subjects) +  
            (0 + time | therapist:subjects) +
            (0 + time:group | therapist) + 
            (0 + group | therapist),
            data = data)

Again we have 2 different grouping terms: the therapist:subjects combination and therapist. For the former, we have random intercepts for therapist:subjects, and random slopes for time (uncorrelated with the random intercepts). As noted above these could also be written in shorthand as (time || therapist:subjects). For the latter, there are no random intercepts at all (because 0 appears in both formulae on the left-hand side), but we fit random slopes for time:group and group, which are uncorrelated with each other because they are in separate terms.
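The three readings can be cross-checked by counting the variance and covariance parameters each formula asks for. The sketch below assumes lme4 is installed and uses lFormula(), which parses a model without fitting it; the simulated data frame d, its sizes, and the pure-noise response are all made up, and group is coded as numeric 0/1.

```r
library(lme4)

set.seed(1)
# Hypothetical design: 6 therapists, 8 subjects each (labels reused), 4 occasions
d <- expand.grid(time = 0:3, subjects = factor(1:8), therapist = factor(1:6))
# group as numeric 0/1, varying between subjects within each therapist
d$group <- as.numeric(as.integer(d$subjects) %% 2 == 0)
d$y <- rnorm(nrow(d))  # placeholder response; only the model structure matters

forms <- list(
  first  = y ~ time * group + (time | therapist:subjects) +
             (time * group || therapist),
  second = y ~ time * group + (time | therapist:subjects) +
             (time | therapist) + (0 + group + time:group | therapist),
  third  = y ~ time * group + (1 | therapist:subjects) +
             (0 + time | therapist:subjects) +
             (0 + time:group | therapist) + (0 + group | therapist)
)

# theta has one entry per variance or covariance parameter of the random effects
n_par <- sapply(forms, function(f) length(lFormula(f, data = d)$reTrms$theta))
print(n_par)
```

A term with two correlated effects contributes 3 parameters (2 variances and 1 covariance) and a term with a single effect contributes 1, so the descriptions above imply 3 + 4 = 7 for the first model, 3 + 3 + 3 = 9 for the second, and 4 for the third.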
A few final points to highlight things that are invalid:
lmer(y ~ time * group + (0 | therapist))

is invalid because you can't have a single zero on the left of |. That would mean therapist is a grouping variable (implying we want random intercepts) but then the "0" means don't fit random intercepts. It's a conflict and should generate an error.
lmer(y ~ time * group + (1 | therapist*subject))

is an error because therapist*subject is not a single term: it is shorthand for therapist + subject + therapist:subject, so it is equivalent to
lmer(y ~ time * group + (1 | therapist + subject + therapist:subject))

which is invalid. If you actually wanted each of those terms to be grouping terms then you would use:
lmer(y ~ time * group + (1 | therapist) + (1 | subject) + (1 | therapist:subject))
