Adding high-p-value, low-R-squared features to a linear regression model to improve results

Data Science Asked by Shahnawaz Khan on January 31, 2021

I am working on a linear regression problem. The features for my analysis were selected using p-values and domain knowledge. After adding these features, $R^2$ improved from 0.25 to 0.85, and the $RMSE$ improved as well. But here is the issue: the features selected using domain knowledge have very high p-values (0.7, 0.9) and very low individual $R^2$ values (0.002, 0.0004). Does it make sense to add such features even if the model's performance improves? As far as I know, in linear regression it is preferable to keep only features with low p-values.

Can anyone share their experience? If so, how can I justify proposing new features that have high p-values?

One Answer

In general, adding more features increases the quality of the model fit: in-sample $R^2$ can never decrease when a feature is added.
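A minimal sketch of this point, using scikit-learn on synthetic data made up for illustration: even a pure-noise column cannot lower the in-sample $R^2$.

```python
# Sketch: in-sample R^2 never decreases when a feature is added,
# even if that feature is pure noise (synthetic data for illustration).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))                       # three informative features
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=n)

r2_base = LinearRegression().fit(X, y).score(X, y)

noise = rng.normal(size=(n, 1))                   # a feature unrelated to y
X_aug = np.hstack([X, noise])
r2_aug = LinearRegression().fit(X_aug, y).score(X_aug, y)

print(r2_base, r2_aug)   # r2_aug >= r2_base, always (in-sample)
```

Out-of-sample performance is a different matter: a noise feature can easily hurt test-set $R^2$, which is one reason people prune features at all.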

If your goal is the best-fitting model, add as many features as possible (regardless of p-value).

Sometimes people care about parsimonious models: they are willing to accept a lower overall model fit because they also value a simpler model. They then apply a p-value threshold to the features.

Answered by Brian Spiering on January 31, 2021
