# Why isn't my simulation showing that ridge regression is better than a linear model?

Cross Validated Asked by andy_dorsey on January 1, 2022

I am learning about ridge regression. I was under the impression that ridge regression is valuable because it provides better out-of-sample predictive accuracy than standard linear models; see, for example, the bottom of page 217 of this well-known statistical learning text: http://faculty.marshall.usc.edu/gareth-james/ISL/ISLR%20Seventh%20Printing.pdf. I tried setting up a short simulation to demonstrate this, but my results aren't showing that ridge models are superior.
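(For reference, ridge regression in that text estimates the coefficients by minimizing the penalized residual sum of squares

$$\sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Big)^2 + \lambda\sum_{j=1}^{p}\beta_j^2,$$

so $\lambda = 0$ recovers ordinary least squares and larger $\lambda$ shrinks the coefficients toward zero.)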

First, I simulated the exact multi-arm design using DeclareDesign in R (the only difference is that I boosted N to 300). I then set up a simulation in which I generated a data set 1,000 times, split each into a training and a test set, and fit a linear model and a ridge regression model to the training data. I then examined how well each model predicted responses in the test data. Surprisingly, the linear model does not do any worse. I must be going wrong somewhere, right? Below is my code; it doesn't take long to run, and I'd appreciate any tips on where I might have gone wrong.

```r
# Add libraries
library(DeclareDesign)
library(ridge)
library(tidyverse)
library(fastDummies)

# Use DeclareDesign to get function that can simulate data
N <- 300
outcome_means <- c(0.5, 1, 2, 0.5)
sd_i <- 1
outcome_sds <- c(0, 0, 0, 0)

population <- declare_population(N = N, u_1 = rnorm(N, 0, outcome_sds[1L]),
    u_2 = rnorm(N, 0, outcome_sds[2L]), u_3 = rnorm(N, 0, outcome_sds[3L]),
    u_4 = rnorm(N, 0, outcome_sds[4L]), u = rnorm(N) * sd_i)
potential_outcomes <- declare_potential_outcomes(
    formula = Y ~ (outcome_means[1] + u_1) * (Z == "1") +
        (outcome_means[2] + u_2) * (Z == "2") +
        (outcome_means[3] + u_3) * (Z == "3") +
        (outcome_means[4] + u_4) * (Z == "4") + u,
    conditions = c("1", "2", "3", "4"), assignment_variables = Z)
estimand <- declare_estimands(ate_Y_2_1 = mean(Y_Z_2 - Y_Z_1),
    ate_Y_3_1 = mean(Y_Z_3 - Y_Z_1), ate_Y_4_1 = mean(Y_Z_4 - Y_Z_1),
    ate_Y_3_2 = mean(Y_Z_3 - Y_Z_2), ate_Y_4_2 = mean(Y_Z_4 - Y_Z_2),
    ate_Y_4_3 = mean(Y_Z_4 - Y_Z_3))
assignment <- declare_assignment(num_arms = 4,
    conditions = c("1", "2", "3", "4"), assignment_variable = Z)
reveal_Y <- declare_reveal(assignment_variables = Z)
estimator <- declare_estimator(handler = function(data) {
  estimates <- rbind.data.frame(
    ate_Y_2_1 = difference_in_means(formula = Y ~ Z, data = data, condition1 = "1", condition2 = "2"),
    ate_Y_3_1 = difference_in_means(formula = Y ~ Z, data = data, condition1 = "1", condition2 = "3"),
    ate_Y_4_1 = difference_in_means(formula = Y ~ Z, data = data, condition1 = "1", condition2 = "4"),
    ate_Y_3_2 = difference_in_means(formula = Y ~ Z, data = data, condition1 = "2", condition2 = "3"),
    ate_Y_4_2 = difference_in_means(formula = Y ~ Z, data = data, condition1 = "2", condition2 = "4"),
    ate_Y_4_3 = difference_in_means(formula = Y ~ Z, data = data, condition1 = "3", condition2 = "4"))
  names(estimates)[names(estimates) == "N"] <- "N_DIM"
  estimates$estimator_label <- c("DIM (Z_2 - Z_1)", "DIM (Z_3 - Z_1)",
    "DIM (Z_4 - Z_1)", "DIM (Z_3 - Z_2)", "DIM (Z_4 - Z_2)", "DIM (Z_4 - Z_3)")
  estimates$estimand_label <- rownames(estimates)
  estimates$estimate <- estimates$coefficients
  estimates$term <- NULL
  return(estimates)
})

multi_arm_design <- population + potential_outcomes + assignment +
  reveal_Y + estimand + estimator

# Get holding matrix for R2 values
rsq_values <- matrix(nrow = 1000, ncol = 2)

# Simulate
for (i in 1:1000) {
  # Get simulated data set
  input_data <- draw_data(multi_arm_design)

  # Format data for analysis
  input_data <- input_data %>%
    fastDummies::dummy_cols(select_columns = "Z", remove_first_dummy = TRUE) %>%
    select(Y:Z_4)

  # Prep training and test data
  # set.seed(206) # set seed to replicate results
  training_index <- sample(1:nrow(input_data), 0.7 * nrow(input_data)) # indices for 70% training data - arbitrary
  training_data <- input_data[training_index, ] # training data
  test_data <- input_data[-training_index, ]    # test data

  # Fit linear model
  lm_mod <- lm(Y ~ ., data = training_data)

  # Fit ridge regression (linearRidge picks the penalty automatically)
  ridge_mod <- linearRidge(Y ~ ., data = training_data)

  # Get actual (from test data) and fitted values for each model
  actual <- test_data$Y
  lm_predicted <- predict(lm_mod, test_data)       # predict linear model on test data
  ridge_predicted <- predict(ridge_mod, test_data) # predict ridge model on test data

  # See how well linear model from training data fits test data (expressed as R2)
  lm_rss <- sum((lm_predicted - actual) ^ 2)
  lm_tss <- sum((actual - mean(actual)) ^ 2)
  lm_rsq <- 1 - lm_rss / lm_tss   # out-of-sample R2 = 1 - RSS/TSS
  rsq_values[i, 1] <- lm_rsq

  # See how well ridge model from training data fits test data (expressed as R2)
  ridge_rss <- sum((ridge_predicted - actual) ^ 2)
  ridge_tss <- sum((actual - mean(actual)) ^ 2)
  ridge_rsq <- 1 - ridge_rss / ridge_tss
  rsq_values[i, 2] <- ridge_rsq
}

# Make matrix into data frame
rsq_values <- data.frame(rsq_values)

# Summarize R2 values for linear model
summary(rsq_values$X1)

# Summarize R2 values for ridge model
summary(rsq_values$X2)
```


You are doing nothing wrong. Ridge regression, the LASSO, and other penalized-coefficient regressions deliberately yield biased estimates. The idea is that accepting a little bias may greatly reduce the variance.
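To make that concrete, the expected test error at a point $x_0$ decomposes as

$$E\left[(y_0 - \hat{f}(x_0))^2\right] = \mathrm{Bias}\left[\hat{f}(x_0)\right]^2 + \mathrm{Var}\left[\hat{f}(x_0)\right] + \mathrm{Var}(\varepsilon),$$

so shrinkage can only lower prediction error when the variance it removes exceeds the squared bias it adds.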

However, nothing in how ridge regression, the LASSO, etc. are formulated guarantees that they will predict better out of sample. Sometimes a simple linear model, informed by theory and built by an analyst who knows the problem domain, can trounce a ridge-regression fit. This holds across problem domains and in all sorts of circumstances.
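In your simulation, in fact, ridge has very little room to help: after dummy coding you have only three predictors, randomization keeps them nearly uncorrelated, and the training set has roughly 210 observations, so the OLS coefficients already have low variance and the penalty's bias buys nothing. Below is a minimal sketch of a regime where the trade-off does pay off (many correlated predictors, few observations). It uses glmnet with alpha = 0 for ridge instead of the ridge package, and the data-generating settings are purely illustrative:

```r
# Illustrative only: a data-generating process where ridge's variance
# reduction outweighs its bias (p close to n, correlated predictors)
library(MASS)    # for mvrnorm
library(glmnet)  # alpha = 0 gives ridge; lambda tuned by cross-validation

set.seed(1)
n <- 60
p <- 50
Sigma <- 0.8 ^ abs(outer(1:p, 1:p, "-"))  # AR(1)-style correlation among predictors
beta <- rnorm(p, 0, 0.3)                  # many small true effects
x <- mvrnorm(n, rep(0, p), Sigma)
y <- drop(x %*% beta) + rnorm(n)
x_test <- mvrnorm(1000, rep(0, p), Sigma)
y_test <- drop(x_test %*% beta) + rnorm(1000)

ols_mod <- lm(y ~ x)
ridge_mod <- cv.glmnet(x, y, alpha = 0)

ols_mse <- mean((drop(cbind(1, x_test) %*% coef(ols_mod)) - y_test) ^ 2)
ridge_mse <- mean((drop(predict(ridge_mod, newx = x_test, s = "lambda.min")) - y_test) ^ 2)
c(ols = ols_mse, ridge = ridge_mse)  # ridge usually (not always) has lower test MSE here
```

Set p <- 3 and n <- 300 instead and the gap essentially vanishes, which is just what your DeclareDesign simulation shows.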

This is, essentially, a question about model selection. There is no need to pore over the code; the issue is not specific to your data or method of inference. Your findings illustrate that model selection (or what the ML/AI community calls feature selection) is not a solved problem.

Answered by kurtosis on January 1, 2022
