
How do I calculate confidence intervals on an elastic net regression in R

Cross Validated Asked by Alberto Pascale on November 2, 2021

I am performing an elastic net regression on my data (n = 34, p = 46).

I first built the model using the "caret" package, with cross-validation to select the optimal alpha and lambda parameters:

library(caret)

# scale the data and tune alpha/lambda by 10-fold cross-validation
data.scale <- as.data.frame(scale(data))
set.seed(123)
model <- train(
  value ~ ., data = data.scale, method = "glmnet",
  trControl = trainControl("cv", number = 10),
  tuneLength = 50)

Then I extracted the beta coefficients using the best lambda parameter

model$bestTune
coef(model$finalModel, model$bestTune$lambda)

Then I tested the model's performance on the entire dataset and calculated the RMSE and R²:

library(dplyr)   # for the %>% pipe

# build the model matrix and predict on the full (training) data
x.test <- model.matrix(value ~ ., data.scale)[, -1]
predictions <- model %>% predict(x.test)
data.frame(
  RMSE = RMSE(predictions, data.scale$value),
  Rsquare = R2(predictions, data.scale$value)
)

Now I am trying to calculate p-values and confidence intervals by bootstrapping the model with the function boot.glmnet() from the "hdrm" package, using the following procedure:

library(hdrm)   # note: not on CRAN

# divide explanatory variables and response variable into two separate data frames
data.x <- data.scale %>%
  dplyr::select(-value)
data.y <- data.scale %>%
  dplyr::select(value)

# calculate bootstrap confidence intervals at the tuned lambda
CI <- boot.glmnet(data.x, data.y$value,
                  alpha = 0.05,
                  lambda = model$bestTune$lambda,
                  B = 1000,
                  bar = TRUE)

In the CI results, however, I am getting zeros for all of the lower and upper confidence limits.
I am now wondering where I am going wrong. Is it a conceptual or a scripting mistake? Or maybe both?
I would be very happy if anyone could clarify this for me.
I can provide the data if needed.
Thanks

One Answer

Coding issues are off-topic on this site, but the statistical issues about determining confidence intervals in the LASSO part of elastic net deserve some comment. In brief, this is not an easy problem. The hdrm package does not appear to be on CRAN or externally vetted; its boot.glmnet() function is an overly simple (mis?)application of the bootstrap, as the comment from Thomas Lumley notes.

This page and its links provide an introduction to the difficulties with estimating CIs and p-values with LASSO. For example, LASSO's choice among a set of correlated predictors might differ from one bootstrapped sample to another. How do you want to think about a CI for a predictor that is sometimes present in and sometimes omitted from the model? For some types of models it is possible to estimate CIs for predictors, but that requires substantial care and thought.
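As a rough illustration of that instability, one could refit the elastic net on bootstrap resamples and count how often each predictor gets a nonzero coefficient. This is only a sketch: x, y, best_alpha and best_lambda below are assumed placeholders for your design matrix, response, and tuned parameters, not objects defined in the question.

# Sketch only: x = predictor matrix, y = response,
# best_alpha / best_lambda = tuned values (assumed names)
library(glmnet)

set.seed(123)
B <- 200
selected <- matrix(0, nrow = B, ncol = ncol(x),
                   dimnames = list(NULL, colnames(x)))

for (b in seq_len(B)) {
  idx <- sample(nrow(x), replace = TRUE)                 # bootstrap resample
  fit <- glmnet(x[idx, ], y[idx],
                alpha = best_alpha, lambda = best_lambda)
  selected[b, ] <- as.numeric(coef(fit)[-1] != 0)        # 1 if predictor kept
}

colMeans(selected)   # selection frequency of each predictor across resamples

Predictors that enter the model in only a fraction of the resamples illustrate why a naive bootstrap interval for their coefficients is hard to interpret.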

With LASSO and elastic net the primary consideration is usually predictive performance. You might consider developing models on multiple bootstrapped samples of your data and evaluating predictive performance against the full original data set as a way to estimate the reliability of your modeling process.
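A minimal sketch of that idea, again using the placeholder names x, y, best_alpha and best_lambda rather than the caret objects above: refit on each bootstrap resample and score the refitted model against the full original data.

# Sketch only: object names are assumptions (x = predictors, y = response)
library(glmnet)

set.seed(123)
B <- 200
perf <- data.frame(rmse = numeric(B), rsq = numeric(B))

for (b in seq_len(B)) {
  idx  <- sample(nrow(x), replace = TRUE)
  fit  <- glmnet(x[idx, ], y[idx],
                 alpha = best_alpha, lambda = best_lambda)
  pred <- as.numeric(predict(fit, newx = x))             # score on full data
  perf$rmse[b] <- sqrt(mean((y - pred)^2))
  perf$rsq[b]  <- cor(y, pred)^2
}

summary(perf)   # spread across resamples reflects stability of the modeling process

A wide spread in RMSE or R² across resamples would suggest that the modeling process itself is not very reliable, which is plausible with n = 34 and p = 46.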

Answered by EdM on November 2, 2021
