How do I treat my Confounding variables in my multivariate Linear Mixed Model?

Question

I'm trying to build a linear mixed model for 5 outcome variables ...

Cholesterol 1,Cholesterol 2,Cholesterol 3,Cholesterol 4,Cholesterol 5

which will be melted into a single Cholesterol variable, since statsmodels does not support multivariate LMMs so far.
The independent variables are 38 specific pathogenetic features built from GenePy scores.
I have to correct for the following confounders:
Age, Sex, Group, Alcohol, Smoking and Levodopa treatment. All of them might contribute to the Cholesterol outcome.
Sex, Group and Levodopa treatment are binary categorical (0 or 1).
My question is: how do I properly set up the equation for my model and express it in the statsmodels syntax?
My guess so far is: I treat the 38 specific pathogenetic features as fixed effects and the confounders as random effects. All categorical confounders are put into the "groups" option of the statsmodels syntax.
Based on the statsmodels syntax:
model = sm.MixedLM.from_formula("Cholesterol ~ pathogenetic feature 1 + pathogenetic feature 2 + ... + pathogenetic feature 38", data, re_formula="~Age+Alcohol+Smoking", groups=data["Group,Sex,Levodopa"])
Is that correct or nonsense? I'm a rookie in this topic and apologize for my weak understanding of it. Thanks so much in advance!

Robert Long · Answer

Confounders can be controlled for by treating them as fixed or random effects. The usual considerations for treating variables as fixed or random apply (there are many questions and answers on our site on that topic).
The variables in your formula, Age, Alcohol and Smoking, would typically be modelled as fixed, not random.
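As a rough illustration of what "confounders as fixed effects" could look like in statsmodels, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the column names (subject, feature1, feature2, measure), the simulated values, the choice of a random intercept per subject as the grouping structure, and the fixed effect for which of the five measurements a row came from. Only two features and two confounders are shown, and the sketch does not address which variables belong in the model.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 60  # hypothetical number of subjects

# Hypothetical wide-format data: one row per subject.
wide = pd.DataFrame({
    "subject": np.arange(n),
    "feature1": rng.normal(size=n),
    "feature2": rng.normal(size=n),
    "Age": rng.normal(60, 10, size=n),
    "Sex": rng.integers(0, 2, size=n),
})

# Simulate five correlated cholesterol measures sharing a subject-level effect.
subject_effect = rng.normal(0, 1.0, size=n)
for k in range(1, 6):
    wide[f"Cholesterol{k}"] = (
        5
        + 0.5 * wide["feature1"]
        + 0.02 * wide["Age"]
        + 0.3 * wide["Sex"]
        + subject_effect
        + rng.normal(0, 0.5, size=n)
    )

# Melt the five outcome columns into a single long-format outcome.
long_df = wide.melt(
    id_vars=["subject", "feature1", "feature2", "Age", "Sex"],
    value_vars=[f"Cholesterol{k}" for k in range(1, 6)],
    var_name="measure",
    value_name="Cholesterol",
)

# Features and confounders all enter as fixed effects; the only random
# effect is an intercept per subject (the repeated-measures unit).
model = smf.mixedlm(
    "Cholesterol ~ feature1 + feature2 + Age + C(Sex) + C(measure)",
    data=long_df,
    groups=long_df["subject"],
)
result = model.fit()
print(result.summary())
```

The point of the sketch is the split of roles: groups identifies the unit that contributes several rows after melting, while Age and Sex sit in the formula next to the features. Whether a random intercept per subject is the right structure for your data is a separate modelling decision.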
To be a confounder, a variable is generally a cause, or a proxy for a cause, of both the exposure and the outcome. Where you have multiple exposures, as you seem to have, a confounder for one causal path may be a mediator for another. Mediators should be excluded, since adjusting for a mediator blocks part of the effect you are trying to estimate. This means that great care must be taken when choosing the set of variables to include in a model.
A causal diagram or directed acyclic graph (DAG) can be of great benefit in this type of situation. For example see here:
How do DAGs help to reduce bias in causal inference?
It is very important not to just put all your variables into one model.
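To make the confounder/mediator distinction concrete, here is a small sketch that encodes a causal diagram as a dictionary and labels each variable relative to one exposure. The diagram itself is purely hypothetical (I am not claiming these arrows hold for your data), and the rule used is a simplified "common cause" heuristic, not a full back-door analysis.

```python
# Hypothetical causal diagram: each key points to the variables it causes.
dag = {
    "Age": ["feature1", "Cholesterol"],
    "feature1": ["Levodopa", "Cholesterol"],
    "Levodopa": ["Cholesterol"],
    "Cholesterol": [],
}


def descendants(graph, node, blocked=None):
    """Return all nodes reachable from `node` without passing through `blocked`."""
    seen = set()
    stack = [child for child in graph.get(node, []) if child != blocked]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(c for c in graph.get(current, []) if c != blocked)
    return seen


def classify(graph, exposure, outcome):
    """Label every other variable as confounder, mediator, or other."""
    roles = {}
    exposure_desc = descendants(graph, exposure)
    for var in graph:
        if var in (exposure, outcome):
            continue
        causes_exposure = exposure in descendants(graph, var)
        # Path to the outcome that does not run through the exposure.
        causes_outcome_directly = outcome in descendants(graph, var, blocked=exposure)
        if causes_exposure and causes_outcome_directly:
            roles[var] = "confounder"
        elif var in exposure_desc and outcome in descendants(graph, var):
            roles[var] = "mediator"
        else:
            roles[var] = "other"
    return roles


roles = classify(dag, "feature1", "Cholesterol")
print(roles)  # {'Age': 'confounder', 'Levodopa': 'mediator'}
```

Under this assumed diagram you would adjust for Age but not for Levodopa when estimating the total effect of feature1. With a different exposure or different arrows the labels can change, which is why the diagram has to be drawn before the model is specified.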
