TransWikia.com

Should I include all dummy variables or N-1 dummy variables (keeping one as reference) in neural networks?

Data Science Asked on July 31, 2021

I have a categorical variable with N factor levels (e.g. gender has two levels) in a classification problem. I have converted it into dummy variables (male and female).

I have to use a neural network (nnet) for the classification. I have two options:

  1. Include any N-1 dummy variables in the input data (e.g. include either male or female). In statistical models, we use N-1 dummy variables.
  2. Include all N dummy variables (e.g. include both male and female)
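
To make the two options concrete, here is a minimal sketch in plain Python. The helper `dummy_encode` and the sample `gender` values are hypothetical and only illustrate the encodings; they are not part of nnet or any library.

```python
# Hypothetical helper: one-hot encode a categorical column, optionally
# dropping the first (alphabetical) level so it becomes the reference.
def dummy_encode(values, drop_first=False):
    levels = sorted(set(values))                  # e.g. ['female', 'male']
    kept = levels[1:] if drop_first else levels   # drop the reference level if asked
    rows = [[1 if v == level else 0 for level in kept] for v in values]
    return kept, rows

gender = ["male", "female", "female", "male"]     # made-up sample data

# Option 1: N-1 dummies ('female' is the implicit reference level).
cols_ref, x_ref = dummy_encode(gender, drop_first=True)

# Option 2: all N dummies (every row sums to 1).
cols_all, x_all = dummy_encode(gender)

print(cols_ref, x_ref)
print(cols_all, x_all)
```

With option 2 the columns of each row always sum to 1, which is the redundancy the question is about.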

Can someone please highlight the pros and cons of both options in terms of predictive power and interpretability?

One Answer

I will answer the question in the neural network context (i.e. I won't talk about how regularization in regression algorithms handles this problem).

Keeping all of the encoded variables causes multicollinearity: one of the variables can be computed exactly from the others. For example, if you have a variable that says whether a person is female, why would you need another one that says whether the person is male? However, the main thing people forget to mention when answering this question is that multicollinearity is not a big problem unless you need to interpret your variables. It makes the individual coefficients unreliable, because they are no longer uniquely determined, but it does not affect your predictions. So it should not matter unless you need to explain your model in terms of its variables.

One caveat: if the covariance between your variables differed between the training set and the test set, your predictions would be affected and you would get incorrect results. However, if you shuffle your dataset well enough (which we assume here) and split it into training, validation, and test sets correctly, the covariance should be the same in each. In other words, the relationship between the correlated variables can be assumed to hold in both the training and test sets, so the multicollinearity will not distort your predictions.
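
The claim that the weights become non-unique while the predictions stay the same can be checked with a small sketch. It assumes a single sigmoid unit with made-up weights, not an actual nnet model; the function `predict` and all numbers are illustrative only.

```python
import math

def predict(bias, w_female, w_male, female, male):
    """One sigmoid unit fed both dummy variables (illustrative, not nnet)."""
    z = bias + w_female * female + w_male * male
    return 1.0 / (1.0 + math.exp(-z))

rows = [(1, 0), (0, 1)]   # (female, male); exactly one dummy is 1 per row

# Two different weight sets: the second shifts the bias up by c and both
# dummy weights down by c. Because female + male == 1, z is unchanged.
c = 3.0
weights_a = (0.5, 1.0, -2.0)
weights_b = (0.5 + c, 1.0 - c, -2.0 - c)

preds_a = [predict(*weights_a, f, m) for f, m in rows]
preds_b = [predict(*weights_b, f, m) for f, m in rows]

print(preds_a)
print(preds_b)
```

Both weight sets give the same predictions for every row, so the individual weights cannot be read as the effect of "female" or "male", yet the output of the model is unaffected.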

Correct answer by Shahriyar Mammadli on July 31, 2021
