Cross Validated Asked by andy_dorsey on January 1, 2022

I am learning about ridge regression. I was under the impression that ridge regression is valuable because it provides better out of sample predictive accuracy than standard linear models. For example, see the bottom of page 217 in this well known statistical learning text: http://faculty.marshall.usc.edu/gareth-james/ISL/ISLR%20Seventh%20Printing.pdf. I tried setting up a short simulation to demonstrate it, but my results aren’t showing that ridge models are superior.

First, I simulated the exact multiarm design using DeclareDesign in R (the only difference is I boosted the N = 300). I then set up a simulation where I simulated a data set 1,000 times, split it into a test and training data set, and then fit a linear model and ridge regression model to the training data set. I then examined how well each model predicted responses in the test data set. Surprisingly, I don’t show that the linear model does any worse. I must be going wrong somewhere, right? Below is my code – it doesn’t take long to run and I’d appreciate any tips on where I might have went wrong.

```
# Add libraries
library(DeclareDesign)
library(ridge)
library(tidyverse)
library(fastDummies)
# Use DeclareDesign to get function that can simulate data
N <- 300
outcome_means <- c(0.5, 1, 2, 0.5)
sd_i <- 1
outcome_sds <- c(0, 0, 0, 0)
population <- declare_population(N = N, u_1 = rnorm(N, 0, outcome_sds[1L]),
u_2 = rnorm(N, 0, outcome_sds[2L]), u_3 = rnorm(N, 0, outcome_sds[3L]),
u_4 = rnorm(N, 0, outcome_sds[4L]), u = rnorm(N) * sd_i)
potential_outcomes <- declare_potential_outcomes(formula = Y ~ (outcome_means[1] +
u_1) * (Z == "1") + (outcome_means[2] + u_2) * (Z == "2") +
(outcome_means[3] + u_3) * (Z == "3") + (outcome_means[4] +
u_4) * (Z == "4") + u, conditions = c("1", "2", "3", "4"),
assignment_variables = Z)
estimand <- declare_estimands(ate_Y_2_1 = mean(Y_Z_2 - Y_Z_1), ate_Y_3_1 = mean(Y_Z_3 -
Y_Z_1), ate_Y_4_1 = mean(Y_Z_4 - Y_Z_1), ate_Y_3_2 = mean(Y_Z_3 -
Y_Z_2), ate_Y_4_2 = mean(Y_Z_4 - Y_Z_2), ate_Y_4_3 = mean(Y_Z_4 -
Y_Z_3))
assignment <- declare_assignment(num_arms = 4, conditions = c("1", "2", "3",
"4"), assignment_variable = Z)
reveal_Y <- declare_reveal(assignment_variables = Z)
estimator <- declare_estimator(handler = function(data) {
estimates <- rbind.data.frame(ate_Y_2_1 = difference_in_means(formula = Y ~
Z, data = data, condition1 = "1", condition2 = "2"),
ate_Y_3_1 = difference_in_means(formula = Y ~ Z, data = data,
condition1 = "1", condition2 = "3"), ate_Y_4_1 = difference_in_means(formula = Y ~
Z, data = data, condition1 = "1", condition2 = "4"),
ate_Y_3_2 = difference_in_means(formula = Y ~ Z, data = data,
condition1 = "2", condition2 = "3"), ate_Y_4_2 = difference_in_means(formula = Y ~
Z, data = data, condition1 = "2", condition2 = "4"),
ate_Y_4_3 = difference_in_means(formula = Y ~ Z, data = data,
condition1 = "3", condition2 = "4"))
names(estimates)[names(estimates) == "N"] <- "N_DIM"
estimates$estimator_label <- c("DIM (Z_2 - Z_1)", "DIM (Z_3 - Z_1)",
"DIM (Z_4 - Z_1)", "DIM (Z_3 - Z_2)", "DIM (Z_4 - Z_2)",
"DIM (Z_4 - Z_3)")
estimates$estimand_label <- rownames(estimates)
estimates$estimate <- estimates$coefficients
estimates$term <- NULL
return(estimates)
})
multi_arm_design <- population + potential_outcomes + assignment +
reveal_Y + estimand + estimator
# Get holding matrix for R2 values
rsq_values <- matrix(nrow = 1000, ncol = 2)
# Simulate
for (i in 1:100){
# Get simulated data set
input_data <- draw_data(multi_arm_design)
# Format data for analysis
input_data <- input_data %>%
fastDummies::dummy_cols(select_columns = "Z", remove_first_dummy = TRUE) %>%
select(Y:Z_4)
# Prep training and test data
#set.seed(206) # set seed to replicate results
training_index <- sample(1:nrow(input_data), 0.7*nrow(input_data)) # indices for 70% training data - arbitrary
training_data <- input_data[training_index, ] # training data
test_data <- input_data[-training_index, ] # test data
# Fit linear model
lm_mod <- lm(Y ~ ., data = training_data)
# Fit ridge regression
ridge_mod <- linearRidge(Y ~ ., data = training_data)
# Get actual (from test data) and fitted values for each model
actual <- test_data$Y
lm_predicted <- predict(lm_mod, test_data) # predict linear model on test data
ridge_predicted <- predict(ridge_mod, test_data) # predict ridge model on test data
# See how well linear model from training data fits test data (expressed as R2)
lm_rss <- sum((lm_predicted - actual) ^ 2)
lm_tss <- sum((actual - mean(actual)) ^ 2)
lm_rsq <- 1 - lm_rss/lm_tss
rsq_values[i, 1] <- lm_rsq
# See how well ridge model from training data fits test data (expressed as R2)
ridge_rss <- sum((ridge_predicted - actual) ^ 2)
ridge_tss <- sum((actual - mean(actual)) ^ 2)
ridge_rsq <- 1 - ridge_rss/ridge_tss
rsq_values[i, 2] <- ridge_rsq
}
# Make matrix into data frame
rsq_values <- data.frame(rsq_values)
# Summarize R2 values for linear model
summary(rsq_values$X1)
# Summarize R2 values for ridge model
summary(rsq_values$X2)
```

You are doing nothing wrong. Ridge regression, the LASSO, and other penalized-coefficient regressions yield biased estimations. The idea is that maybe accepting a little bias will greatly reduce the variance.

However, there is nothing in how ridge regression, the LASSO, etc. are formulated that guarantees they will perform better at predictions of out of sample. Sometimes a simple linear model informed by theory and created by an analyst who knows the problem domain can trounce a model selected by ridge regression. This is true across problem domains and in all sorts of circumstances.

This is, essentially, a question about model selection. There is no need for code; the issue is not specific to your data or method of inference. Your findings illustrate that model selection (or what ML/AI people call feature selection) is not a solved problem.

Answered by kurtosis on January 1, 2022

0 Asked on November 9, 2021

classification error machine learning model selection modeling

0 Asked on November 9, 2021 by atrag

0 Asked on November 9, 2021 by ryu576

descriptive statistics hypothesis testing large data order statistics

3 Asked on November 9, 2021 by sabiste

1 Asked on November 9, 2021 by moneyball

1 Asked on November 9, 2021 by thomas-richardson

0 Asked on November 9, 2021

1 Asked on November 9, 2021

1 Asked on November 9, 2021 by tshast2

1 Asked on November 9, 2021

0 Asked on November 9, 2021

0 Asked on November 9, 2021 by rob-goldstone

linear algebra markov process optimization transition matrix

1 Asked on November 9, 2021

0 Asked on November 6, 2021 by ana-pau-de-la-fuente

2 Asked on November 6, 2021

1 Asked on November 6, 2021 by rishu

1 Asked on November 6, 2021 by user235039

forecasting measurement error model evaluation out of sample

1 Asked on November 6, 2021 by yilei-huang

2 Asked on November 6, 2021 by marvdste

1 Asked on November 6, 2021

Get help from others!

Recent Answers

- Lex on Does Google Analytics track 404 page responses as valid page views?
- Peter Machado on Why fry rice before boiling?
- Jon Church on Why fry rice before boiling?
- Joshua Engel on Why fry rice before boiling?
- haakon.io on Why fry rice before boiling?

Recent Questions

- How Do I Get The Ifruit App Off Of Gta 5 / Grand Theft Auto 5
- Iv’e designed a space elevator using a series of lasers. do you know anybody i could submit the designs too that could manufacture the concept and put it to use
- Need help finding a book. Female OP protagonist, magic
- Why is the WWF pending games (“Your turn”) area replaced w/ a column of “Bonus & Reward”gift boxes?
- Does Google Analytics track 404 page responses as valid page views?

© 2023 AnswerBun.com. All rights reserved. Sites we Love: PCI Database, MenuIva, UKBizDB, Menu Kuliner, Sharing RPP, SolveDir