When to preprocess data for neural network

Question

In https://cs231n.github.io/neural-networks-2/, the authors say that preprocessing should only be done on the training set, and then the mean, variance, etc. of the training set should be used on the validation and test sets. I can't find any other source on why this should be done, and I'm wondering how important this is compared to taking the data mean and variance and normalizing based on those at the beginning. Also, if it is very important to only normalize the training set and apply that to the rest of the data set, how would that be implemented with cross-validation in keras?

Match Maker EE · Answer

Experience with many different datasets containing discrete or continuous input variables has shown me that normalization makes training much easier. The initial weights are most often drawn at random from a uniform distribution that is symmetric around zero. Hence, normalized variables favor a smooth and unbiased start of the learning process.
A practical approach is to normalize based on the data distribution in the training set. The training and test sets should be obtained from the same distribution. I use a random generator to choose whether each case goes to the training set or to the test set, all before any training takes place. You can then apply the normalization constants to the test set. As these two sets are interchangeable, there is no issue with using the normalization constants calculated from the training set to normalize the test set as well.
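The procedure described above can be sketched as follows. This is only a minimal illustration using NumPy: the function names (`split_train_test`, `fit_constants`, `apply_constants`), the 80/20 split, the synthetic data, and the choice of mean and standard deviation as normalization constants are all assumptions made for the example, not something prescribed by the answer.

```python
import numpy as np

def split_train_test(X, test_fraction=0.2, seed=0):
    """Randomly assign each case (row) to the training or the test set."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_test = int(round(test_fraction * len(X)))
    return X[order[n_test:]], X[order[:n_test]]

def fit_constants(X_train):
    """Compute per-feature mean and standard deviation from the training set only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return mean, std

def apply_constants(X, mean, std):
    """Normalize any set with constants that were fitted elsewhere."""
    return (X - mean) / std

# Synthetic data standing in for a real dataset (assumption for the example).
data_rng = np.random.default_rng(42)
X = data_rng.normal(loc=5.0, scale=3.0, size=(1000, 4))

# Split first, before any training or normalization takes place.
X_train, X_test = split_train_test(X)

# Constants come from the training set and are reused for the test set.
mean, std = fit_constants(X_train)
X_train_norm = apply_constants(X_train, mean, std)
X_test_norm = apply_constants(X_test, mean, std)
```

For cross-validation, one option would be to call `fit_constants` inside each fold on that fold's training portion and apply the result to the held-out portion, so that the held-out data never contributes to the constants.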
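My simple scheme for continuous input variables was to divide each data point by the maximal value of that variable in the training set:
$
x_{norm}(i) = \frac{x_{data}(i)}{\max_{j \in train} x_{data}(j)}
$
Here $i$ indexes a case and the maximum is taken over the cases $j$ in the training set.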
Discrete input variables that only take the values $0$ or $1$ need not be normalized.
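A small sketch of this max-scaling scheme, again with NumPy. The toy arrays, the `continuous` column list, and the helper name `scale_by_train_max` are hypothetical and chosen only for illustration; the sketch also assumes that every continuous variable has a positive maximum in the training set.

```python
import numpy as np

# Toy data: columns 0 and 1 are continuous, column 2 is a 0/1 indicator.
X_train = np.array([[2.0, 10.0, 0.0],
                    [4.0, 50.0, 1.0],
                    [8.0, 20.0, 1.0]])
X_test = np.array([[6.0, 25.0, 0.0],
                   [10.0, 5.0, 1.0]])

continuous = [0, 1]  # hypothetical indices of the continuous columns

# Per-variable maximum, computed from the training set only.
train_max = X_train[:, continuous].max(axis=0)

def scale_by_train_max(X, columns, max_values):
    """Divide the given columns by the training-set maxima; leave the rest untouched."""
    X_scaled = X.astype(float)  # astype returns a copy, so X is not modified
    X_scaled[:, columns] = X_scaled[:, columns] / max_values
    return X_scaled

X_train_norm = scale_by_train_max(X_train, continuous, train_max)
X_test_norm = scale_by_train_max(X_test, continuous, train_max)
```

Because the maxima come from the training set, a test case can end up slightly above 1 (here the test value 10.0 in column 0 becomes 1.25). That is expected when the constants are not recomputed on the test data.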
Published papers compare normalization schemes and the properties of the resulting neural networks.
Preprocessing - in general - is an open field.
Within image processing, some colleagues have initialized the weights of the first hidden layer of the network with the coefficients of different linear filters. For an overview, see for example Egmont-Petersen et al. (2002), Image processing with neural networks - a review, Pattern Recognition.
