Validation loss fluctuating while training the neural network in tensorflow

Question

While training my convolutional neural network to predict emotions, I plotted the training loss and the validation loss side by side. The training loss appears to decrease over time, whereas the validation loss behaves oddly. Below is the figure I obtained while training the model in three different ways.

Please note that the model is composed of 4 convolutional layers followed by a recurrent neural network (GRU) which is responsible for detecting sequences in the input data.

The light blue curve corresponds to the model where the images are fed in order, from frame 0 to frame 7500, without shuffling at each epoch or training step. In this case, the initial state of the RNN at each training step was set to the output (the last state) of the RNN from the previous step.

The red curve corresponds to the model where the images are also fed in order (same as the previous case), without shuffling, but the initial state of the RNN is set to 0 at each training step.
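To make the difference between the light blue and red setups concrete, here is a minimal sketch in plain Python. It uses a toy scalar recurrent cell as a stand-in for the GRU; the cell, its weights, the chunk size, and the frame values are all assumptions for illustration, not my actual model.

```python
import math

def rnn_step(h, x, w_h=0.5, w_x=1.0):
    """One step of a toy scalar recurrent cell (hypothetical stand-in for a GRU)."""
    return math.tanh(w_h * h + w_x * x)

def run_chunk(frames, h0):
    """Run the cell over one chunk of frames and return the final state."""
    h = h0
    for x in frames:
        h = rnn_step(h, x)
    return h

# Hypothetical ordered frame features, split into training steps of 4 frames.
frames = [0.1 * i for i in range(12)]
chunks = [frames[i:i + 4] for i in range(0, len(frames), 4)]

# Light blue setup: the final state of one step seeds the next step.
h_carried = 0.0
for chunk in chunks:
    h_carried = run_chunk(chunk, h_carried)

# Red setup: every training step starts from a zero state.
h_reset = 0.0
for chunk in chunks:
    h_reset = run_chunk(chunk, 0.0)

# Reference: the whole ordered sequence processed in a single pass.
h_full = run_chunk(frames, 0.0)

print("carried:", h_carried, "reset:", h_reset, "full:", h_full)
```

With the carried state, the chunked pass sees the same history as a single pass over the whole sequence; with the zero reset, each step only sees its own chunk.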

The dark blue curve corresponds to a model where the images are fed into the model randomly (the starting frame is chosen randomly) and the number of frames is chosen randomly as well. In this case, the initial state of the RNN was also initialized to zero.
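The random sampling used for the dark blue curve can be sketched roughly as follows. The function name `sample_window` and the length bounds (16 and 128) are hypothetical; only the total of 7500 frames comes from my setup.

```python
import random

def sample_window(num_frames, min_len, max_len, rng):
    """Pick a random contiguous window [start, start + length) of frames."""
    length = rng.randint(min_len, min(max_len, num_frames))
    start = rng.randint(0, num_frames - length)
    return start, length

rng = random.Random(0)   # fixed seed so the sketch is repeatable
NUM_FRAMES = 7500        # frame count from the description above
MIN_LEN, MAX_LEN = 16, 128  # assumed bounds on the sequence length

# Each training step draws a fresh window and starts from a zero RNN state.
windows = [sample_window(NUM_FRAMES, MIN_LEN, MAX_LEN, rng) for _ in range(5)]
for start, length in windows:
    print(f"frames {start}..{start + length - 1} ({length} frames)")
```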

Therefore, I would like to know whether the shape of the validation loss curves is reasonable. The fluctuation doesn't make sense to me, and perhaps the only reasonable curve is the dark blue one (assuming that the validation loss starts increasing once the model overfits).

Do the light blue and red curves indicate an error or mistake in the model? Or is the data so noisy that I get this fluctuating curve?

I am using MSE as a loss function.

Below is the loss on the training dataset.

Any help is much appreciated!!

Alluri L S V Siddhartha Varma · Answer

If you are performing a classification task, you should not use the MSE loss function. MSE works well for regression tasks, but when it is paired with a sigmoid or softmax output for classification, the objective becomes non-convex in the output weights and its gradient shrinks as the predictions saturate.
Try using the binary cross-entropy or categorical cross-entropy loss function instead.
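As a rough illustration of why this matters, the sketch below compares the gradient with respect to the logit for MSE and for binary cross-entropy, assuming a single sigmoid output unit. The helper names `mse_grad` and `bce_grad` are made up for this example, and nothing here is taken from the asker's model.

```python
import math

def sigmoid(z):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def mse_grad(z, y):
    """Derivative of (sigmoid(z) - y)**2 with respect to the logit z."""
    p = sigmoid(z)
    return 2.0 * (p - y) * p * (1.0 - p)

def bce_grad(z, y):
    """Derivative of binary cross-entropy (sigmoid output) with respect to z."""
    return sigmoid(z) - y

y = 1.0  # true label
for z in (-8.0, -2.0, 0.0, 2.0):
    print(f"z={z:+.1f}  mse_grad={mse_grad(z, y):+.5f}  bce_grad={bce_grad(z, y):+.5f}")
```

For a confidently wrong prediction (a very negative logit with label 1), the MSE gradient is close to zero because of the extra `p * (1 - p)` factor, while the cross-entropy gradient stays close to -1, so the model keeps receiving a useful learning signal.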
This is what I know on the subject; I hope it's helpful. Happy coding!!
