Why am I getting a difference between training accuracy and accuracy calculated with Keras' predict_classes on a subset of the training data?

Artificial Intelligence Asked by Saha on February 23, 2021

I’m trying to solve a binary classification problem with AlexNet. I split the original dataset into training and validation sets using a 70/30 ratio. I trained the network on 11200 images and obtained a training accuracy of 99% and a validation accuracy of 96%. At the end of training, I saved my model’s weights to a file.

After training, I loaded the saved weights into the same network. I chose 738 images out of the 11200 training images, predicted the class of each one with my model, compared the predictions with the true labels, and calculated the accuracy again: this time it was 74%.

What is the problem here? I expected the accuracy to be about 96% again.

Here’s the code that I’m using.

import numpy as np

prelist = []
for i in range(len(x)):
    # predict_classes expects a batch of images, so add a batch
    # dimension to each single image before predicting.
    prediction = model.predict_classes(np.expand_dims(x[i], axis=0))
    prelist.append(int(prediction.flatten()[0]))

# Count how many predictions match the true labels.
count = 0
for i in range(len(x)):
    if y[i] == prelist[i]:
        count += 1

# This is accuracy (fraction of correct predictions), not precision.
test_accuracy = (count / len(x)) * 100
print(test_accuracy)

When I use predict_classes on all 11200 images that I used to train the network and compare the results with the true labels, the calculated accuracy is 91%.
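As a sanity check (a minimal sketch, assuming x and y are the same preprocessed arrays used during training and that the model was compiled with metrics=['accuracy']), Keras’ own evaluate can be compared against the manual count:

# Sanity check: model.evaluate computes accuracy with the metric
# configuration set at compile time, so it should roughly match the
# training logs if x and y are preprocessed exactly as during
# training (an assumption here).
loss, acc = model.evaluate(x, y, verbose=0)
print(f"Keras evaluate accuracy: {acc * 100:.2f}%")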

One Answer

One problem could be the selection of the validation set. For your model to work well on data it has not seen during training, a high validation accuracy is necessary, but it is not sufficient on its own. The validation set must be large enough and varied enough that its probability distribution is an accurate representation of the probability distribution of all the images. You could have 1000 validation images, but, if they are all similar to each other, they would be an inadequate representation of that distribution, and your trained model’s predictions on the test set would be poor.

So the question is: how many validation images did you use and how were they selected (randomly or handpicked)?

Try increasing the number of validation images and use one of the available methods to randomly select images from the training set, remove them from the training set, and use them as validation images.

Keras's flow_from_directory can achieve that, as can sklearn's train_test_split. I usually have the validation set selected randomly, with at least 10% as many images as the test set.
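For instance, a minimal sketch of the random-split approach with train_test_split (assuming images and labels are NumPy arrays; the names are illustrative):

from sklearn.model_selection import train_test_split

# Randomly split off 30% of the data as a validation set.
# stratify=labels keeps the class ratio the same in both splits,
# which matters for a binary problem like this one.
x_train, x_val, y_train, y_val = train_test_split(
    images, labels, test_size=0.30, stratify=labels, random_state=42
)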

Overtraining is a possibility, but I think it is unlikely, given that your validation accuracy is high.

Another question is how the test set images were selected. Maybe their distribution is skewed. Again, the best thing to do is to select them randomly.

What was your training accuracy? Without a high training accuracy, the validation accuracy may be meaningless. Training accuracy should be in the upper 90s for the validation accuracy to be really meaningful.

Finally, is there any possibility your test images were mislabeled? Try using your test set as the validation set and see what validation accuracy you get.
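A minimal sketch of that swap (assuming x_train/y_train and x_test/y_test are your arrays; the names are illustrative):

# Pass the test set as validation data; if the reported val_accuracy
# is much lower than before, the test labels or their preprocessing
# are suspect.
history = model.fit(
    x_train, y_train,
    epochs=5,
    batch_size=32,
    validation_data=(x_test, y_test),
)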

Here is an example of what I mean. Someone built a CNN to classify a set of images into two classes: one class was "dogs", the other was "wolves". He trained the network with great results: almost 99.99% training accuracy and 99.6% validation accuracy. When he ran it on the test set, his accuracy was about 50%. Why? It turned out that all the images of wolves were taken with snow in the background, while none of the images of dogs had snow in the background. So the neural network figured out that if there is snow it must be a wolf, and if there is no snow it must be a dog. However, his test set had a mixture of wolves in and out of snow and dogs in and out of snow. Great training and validation results, but totally useless performance.

Answered by Gerry P on February 23, 2021
