What is the most sound way to perform variable selection on an lmer() model?

Asked on Cross Validated, November 29, 2021

Suppose I have 25 candidate predictors in an lmer model. I want to find out which ones are genuine predictors of the dependent variable.

What is the best way to perform variable selection on that lmer model?

I have read about the drawbacks of stepwise regression, and so I assume that is not the best approach. That said, I have also read that some stepwise approaches (e.g., AIC-based selection) are better than others.

I’ve used penalized regression in the past but I’m not sure if this can be done with an lmer model.

I’ve also recently read about a Bayesian approach that places Laplace priors on each predictor, essentially acting like a LASSO regression and shrinking most of the coefficients to 0.
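Something like the following, using the brms package, is what I have in mind; here dat, y, x1–x3, and subject are placeholders for my actual data and full predictor set:

```r
# Sketch of Laplace (double-exponential) shrinkage priors on the fixed effects.
# `dat`, `y`, `x1`..`x3`, and `subject` are placeholders.
library(brms)

fit_shrink <- brm(
  y ~ x1 + x2 + x3 + (1 | subject),   # in practice, all 25 candidate predictors
  data   = dat,
  family = gaussian(),
  prior  = set_prior("double_exponential(0, 1)", class = "b")  # Laplace prior on slopes
)

summary(fit_shrink)  # weakly supported coefficients are shrunk toward 0
```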

I’ve also heard about random forests but don’t believe this can be implemented for an lmer model.

I also know that the best way is to use theory to test specific predictors. But in this case, other than one predictor, I don’t have theoretical reasons to believe some predictors should matter more than others. I would rather use a data-driven approach to find the ones that do matter than test them all based on very loose assumptions.

What would you suggest?

One Answer

I would recommend the drop1 function in the R package lmerTest. lmerTest::drop1 produces an F-test: not only is this test more accurate than the likelihood ratio test from lme4::drop1, it also avoids refitting the model, which saves time if that is important. This corresponds to what you said about some stepwise approaches being better than others, since drop1 is essentially one step of backward selection.
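For example, something along these lines (dat, y, x1–x3, and subject below are placeholders for your data, full candidate set, and grouping factor):

```r
# Fit with lmerTest::lmer so that drop1() reports F-tests with Satterthwaite df.
# `dat`, `y`, `x1`..`x3`, and `subject` are placeholders.
library(lmerTest)

m <- lmer(y ~ x1 + x2 + x3 + (1 | subject), data = dat)

drop1(m)  # F-test for each fixed effect; no refitting of reduced models needed
```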

There are some other parameter selection tools in lmerTest. You may have already explored them.

To the point, though: different approaches suit different contexts. I would look to the lasso for models with many parameters, since the tests used in stepwise methods lose power as the number of parameters grows. For fewer parameters, say 10, I would run a stepwise or drop1 procedure and see whether the results seem reasonable. Data-driven variable selection is usually not preferred over prior knowledge of the likely parameterization of your model, such as you might have when working on a well-studied problem in physics. If you need to let the data speak, less sophisticated methods like drop1 on a small number of parameters can be a good starting point for insight into what is important. It's up to you to take it from there.
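If you do go the lasso route with random effects in the model, the glmmLasso package is one option. A rough sketch, with placeholder data and an arbitrary penalty that you would normally tune by cross-validation or an information criterion:

```r
# Lasso-penalized fixed effects in a mixed model via glmmLasso.
# `dat`, `y`, `x1`..`x3`, and `subject` are placeholders; `subject` must be a factor.
library(glmmLasso)

fit_lasso <- glmmLasso(
  fix    = y ~ x1 + x2 + x3,     # fixed effects to be penalized
  rnd    = list(subject = ~1),   # random intercept for subject
  data   = dat,
  lambda = 10                    # penalty strength; tune rather than hard-code
)

summary(fit_lasso)  # coefficients shrunk to exactly 0 are effectively dropped
```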

Answered by Deathkill14 on November 29, 2021
