# How do I treat my Confounding variables in my multivariate Linear Mixed Model?

Cross Validated Asked by Thomas Lordick on October 29, 2020

I’m trying to build a linear mixed model for 5 outcome variables:

• Cholesterol 1, Cholesterol 2, Cholesterol 3, Cholesterol 4, Cholesterol 5

These will be melted into a single Cholesterol variable, since statsmodels does not support multivariate LMMs so far.
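For readers following along, the melting step could look roughly like the sketch below. The data frame, the `subject` identifier and the column names without spaces (`Cholesterol1` to `Cholesterol5`) are assumptions made for illustration; they are not taken from the actual data set.

```python
import pandas as pd

# Hypothetical wide-format data: one row per subject, five cholesterol columns.
wide = pd.DataFrame({
    "subject": [1, 2, 3],
    "Age": [61, 70, 58],
    "Cholesterol1": [5.1, 4.8, 6.0],
    "Cholesterol2": [5.3, 4.9, 6.2],
    "Cholesterol3": [5.0, 4.7, 5.9],
    "Cholesterol4": [5.4, 5.0, 6.1],
    "Cholesterol5": [5.2, 4.6, 6.3],
})

chol_cols = [f"Cholesterol{i}" for i in range(1, 6)]

# Stack the five outcome columns into one "Cholesterol" column and keep a
# "measure" column that records which of the five outcomes each row came from.
long = wide.melt(
    id_vars=["subject", "Age"],
    value_vars=chol_cols,
    var_name="measure",
    value_name="Cholesterol",
)

print(long.head())
```

Keeping the `measure` column matters: without it the model cannot tell the five outcomes apart once they share one column.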

The independent variables are 38 specific pathogenetic features built from GenePy scores.

I have to correct for the following confounders:
Age, Sex, Group, Alcohol, Smoking and Levodopa treatment. All of them might contribute to the Cholesterol outcome.
Sex, Group and Levodopa treatment are binary categorical variables (0 or 1).

My question is: how do I properly build up the equation for my model and put it into the statsmodels syntax?

My guess so far is: I treat the 38 specific pathogenetic features as fixed effects and the confounders as random effects. All categorical confounders are put into the "groups" option of the statsmodels syntax.

Based on the statsmodels syntax:

model = sm.MixedLM.from_formula("Cholesterol ~ pathogenetic feature1 + pathogenetic feature 2 + … pathogenetic feature 38", data, re_formula="~Age+Alcohol+Smoking", groups=data["Group,Sex,Levodopa"])

Is that correct or nonsense? I’m a rookie in this topic and apologize for my weak understanding of it. Thanks so much in advance!

Confounders can be controlled for by treating them as fixed or random. The usual considerations for treating variables as fixed or random apply (there are many questions and answers on our site on that topic).

The variables in your formula, Age, Alcohol and Smoking typically would be modelled as fixed, not random.
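As a rough illustration of that point, here is a minimal sketch that enters the confounders as fixed effects using the `statsmodels` formula interface. All data are simulated. The `subject` and `measure` columns, the two features standing in for the 38, the numeric coding of Alcohol and Smoking, and the choice of a random intercept per subject are assumptions made for the example, not something stated in the question.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_subjects, n_measures = 60, 5

# Simulated subject-level data (purely illustrative).
subjects = pd.DataFrame({
    "subject": np.arange(n_subjects),
    "feature1": rng.normal(size=n_subjects),   # stand-ins for the 38 features
    "feature2": rng.normal(size=n_subjects),
    "Age": rng.normal(65, 8, size=n_subjects),
    "Sex": rng.integers(0, 2, size=n_subjects),
    "Group": rng.integers(0, 2, size=n_subjects),
    "Alcohol": rng.normal(size=n_subjects),    # assumed numeric
    "Smoking": rng.normal(size=n_subjects),    # assumed numeric
    "Levodopa": rng.integers(0, 2, size=n_subjects),
    "u": rng.normal(0, 1.0, size=n_subjects),  # true subject-level intercept
})

# Long format: one row per subject and cholesterol measure.
long = subjects.loc[subjects.index.repeat(n_measures)].reset_index(drop=True)
names = [f"Cholesterol{i}" for i in range(1, n_measures + 1)]
long["measure"] = np.tile(names, n_subjects)
long["Cholesterol"] = (
    5.0
    + 0.4 * long["feature1"]
    + 0.02 * (long["Age"] - 65)
    + long["u"]
    + rng.normal(0, 0.5, size=len(long))
)

# Confounders enter the fixed part of the formula; the grouping variable is
# the subject, which gives each subject its own random intercept.
model = smf.mixedlm(
    "Cholesterol ~ feature1 + feature2 + Age + Sex + Group"
    " + Alcohol + Smoking + Levodopa + C(measure)",
    data=long,
    groups=long["subject"],
)
result = model.fit()
print(result.summary())
```

Note that `groups` takes a single grouping variable that identifies the clusters of correlated observations (here the repeated measures within a subject); it is not a place to list categorical covariates.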

To be a confounder, a variable is generally a cause, or a proxy for a cause, of both the exposure and the outcome. Where you have multiple exposures, as you seem to have, a confounder for one causal path may be a mediator for another. Mediators should be excluded when the aim is the total effect of an exposure, because adjusting for a variable on the causal path removes part of the effect you are trying to estimate. This means that great care must be taken when choosing the set of variables to include in a model.

A causal diagram or directed acyclic graph (DAG) can be of great benefit in this type of situation. For example see here:
How do DAGs help to reduce bias in causal inference?
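To make the confounder/mediator distinction concrete, here is a small pure-Python sketch that encodes a made-up DAG and checks what role a covariate plays for each exposure. The node names (`X1`, `X2`, `Z`, `Y`) and the edges are hypothetical, and the classification rule is deliberately simplified; it is not a substitute for a proper adjustment-set analysis.

```python
# Hypothetical DAG as an adjacency mapping: parent -> list of children.
dag = {
    "X1": ["Z", "Y"],   # exposure 1 affects Z and the outcome
    "Z": ["X2", "Y"],   # Z affects exposure 2 and the outcome
    "X2": ["Y"],        # exposure 2 affects the outcome
}


def descendants(graph, node, blocked=None):
    """Return all nodes reachable from `node`, never passing through `blocked`."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for child in graph.get(current, ()):
            if child != blocked and child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def role(graph, covariate, exposure, outcome):
    """Crude classification of `covariate` for the path exposure -> outcome."""
    causes_exposure = exposure in descendants(graph, covariate)
    # A confounder must reach the outcome by a route that avoids the exposure.
    causes_outcome_directly = outcome in descendants(graph, covariate, blocked=exposure)
    if causes_exposure and causes_outcome_directly:
        return "candidate confounder"
    on_causal_path = (
        covariate in descendants(graph, exposure)
        and outcome in descendants(graph, covariate)
    )
    if on_causal_path:
        return "mediator"
    return "neither"


for exposure in ("X1", "X2"):
    print(exposure, "->", role(dag, "Z", exposure, "Y"))
```

In this toy graph the same variable `Z` is a mediator for `X1` and a candidate confounder for `X2`, which is the situation where a single model that includes every variable gives misleading estimates for at least one exposure.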

It is very important not to just put all your variables into one model.

Answered by Robert Long on October 29, 2020
