I’m trying a relaxed lasso logistic regression by first using sklearn’s cross validation to find an optimal penalty parameter (C = 1/lambda). Then, I use that parameter to fit statsmodel’s logit model to the data (lambda = 1/C). At this step, I removed coefficients that are really small (< 1e-5). When I performed cross validation again on the reduced feature set to estimate my second penalty parameter, it ends up being stronger than the first penalty. This doesn’t seem right to me, since the second step should be to find better estimates for features that you’ve already identified as non-zero from the first feature reduction step (so the lambda should be weaker).

I’ve also noticed that a lot of p values for the coefficients in the statsmodel results are NaN, even though the model has converged. Does this happen because there is high multicollinearity among my features? I had assumed that the Lasso would be able to handle collinearity. Perhaps my cross validation isn’t providing the best penalty parameter?

From Elements of Statistical Learning:

…the lasso shrinkage causes the estimates of the non-zero coefficients to be biased towards zero, and in general they are not consistent. One approach for reducing this bias is to run the lasso to identify the set of non-zero coefficients, and then fit an unrestricted linear model to the selected set of features. This is not always feasible, if the selected set is large. Alternatively, one can use the lasso to select the set of non-zero predictors, and then apply the lasso again, but using only the selected predictors from the first step. This is known as the relaxed lasso (Meinshausen, 2007). The idea is to use cross-validation to estimate the initial penalty parameter for the lasso, and then again for a second penalty parameter applied to the selected set of predictors. Since Statistical consistency means as the sample size grows, the estimates converge to the true values. The variables in the second step have less “competition” from noise variables, cross-validation will tend to pick a smaller value for λ, and hence their coefficients will be shrunken less than those in the initial estimate.

Cross Validated Asked by Joanne Cheung on December 30, 2020

2 AnswersIt looks like the summary table for logit always gives a NaN standard error for parameters whose coefficient estimate is shrunk to zero. Is this where you are seeing NaN standard errors? If so, these are not really NaN's, it's just that there isn't a good way to come up with a standard error for parameters that are shrunk to zero (or at least one has not been implemented here). I'm also not sure how the SE's for the nonzero coefficients are computed here.

```
Logit Regression Results
==============================================================================
Dep. Variable: y No. Observations: 1000
Model: Logit Df Residuals: 998
Method: MLE Df Model: 1
Date: Thu, 28 Feb 2019 Pseudo R-squ.: 0.1600
Time: 20:48:32 Log-Likelihood: -582.11
converged: True LL-Null: -692.99
LLR p-value: 3.763e-50
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 0 nan nan nan nan nan
x1 0.8917 0.079 11.355 0.000 0.738 1.046
x2 -0.4602 0.073 -6.284 0.000 -0.604 -0.317
x3 0 nan nan nan nan nan
==============================================================================
```

Answered by Kerby Shedden on December 30, 2020

Are you using statsmodels GLM or statsmodels Logit? They are equivalent, but the scaling of the penalty parameter for regularized fitting is different. Below is an example of how to get equivalent results from these two procedures and from sklearn.

```
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression
n = 1000
x = np.random.normal(size=(n, 3))
lp = x[:, 0] - x[:, 1] / 2
pr = 1 / (1 + np.exp(-lp))
y = (np.random.uniform(size=n) < pr).astype(np.int)
df = pd.DataFrame({"y": y, "x1": x[:, 0], "x2": x[:, 1], "x3": x[:, 2]})
alpha = 10
model1 = sm.GLM.from_formula("y ~ x1 + x2 + x3", family=sm.families.Binomial(),
data=df)
result1 = model1.fit_regularized(alpha=alpha/n, L1_wt=1)
model2 = sm.Logit.from_formula("y ~ x1 + x2 + x3", data=df)
result2 = model2.fit_regularized(alpha=alpha)
x0 = np.concatenate((np.ones((n, 1)), x), axis=1)
model3 = LogisticRegression(C=1/alpha, penalty='l1')
result3 = model3.fit(x0, y)
print(result1.params, result2.params, result3.coef_)
```

Answered by Kerby Shedden on December 30, 2020

0 Asked on February 23, 2021 by afton-nelson

0 Asked on February 23, 2021 by jamzy

1 Asked on February 23, 2021 by fluctuation

1 Asked on February 22, 2021 by bashir

bounds heavy tailed sample size sampling statistical significance

0 Asked on February 22, 2021 by maxbit

1 Asked on February 21, 2021 by tauling

2 Asked on February 21, 2021 by shenflow

1 Asked on February 21, 2021

classification machine learning neural networks regression survival

1 Asked on February 21, 2021 by joel

5 Asked on February 20, 2021 by gnal

2 Asked on February 20, 2021 by roccer

mixed model random effects model regression sample size statistical power

3 Asked on February 20, 2021 by abdu

0 Asked on February 20, 2021 by user307627

1 Asked on February 19, 2021 by daniel-l

boosting classification machine learning multinomial distribution r

0 Asked on February 19, 2021 by amnon

1 Asked on February 19, 2021 by gerg-horvth

7 Asked on February 19, 2021

0 Asked on February 19, 2021 by kyle-pena

false discovery rate fishers exact test hypothesis testing multiple comparisons p value

5 Asked on February 19, 2021 by thomas-fauskanger

1 Asked on February 18, 2021 by statcurious

Get help from others!

Recent Answers

- Peter Machado on Why fry rice before boiling?
- DMGregory on MouseLook Script “Pops” back to the last value when the script is enabled after being disabled or destroyed
- haakon.io on Why fry rice before boiling?
- Joshua Engel on Why fry rice before boiling?
- Jon Church on Why fry rice before boiling?

Recent Questions

- Why fry rice before boiling?
- MouseLook Script “Pops” back to the last value when the script is enabled after being disabled or destroyed
- Unity app crashes when using unmodified custom Android manifest (didn’t find class “UnityPlayerActivity”)
- How do i draw a ray in unity
- How to test consistency of responses?

© 2022 AnswerBun.com. All rights reserved.