Why does my model fail to predict on the whole dataset?

Data Science: asked by Marco Ramos on June 10, 2021

I have about 3000 images in 6 classes, and this is what I did:

1 – Split the data into a training set and a test set (20% test size) before doing anything else.

2 – Performed data augmentation on the underrepresented classes in the training set, ending up with 2700 training and 640 test images.

3 – Extracted features (Haralick texture, dominant color, average color, histogram, etc.) from both sets.

4 – Normalized the features with StandardScaler (fit_transform on the training set, then only transform on the test set).

5 – Ran a grid search with 5-fold cross-validation on the training set only to find the best parameters, and got 91% mean accuracy.

6 – Used the best estimator to predict on the test set and got 94% accuracy.

7 – Pickled the model and the scaler, then loaded them in a new file.

8 – Wrote a predict function that applies all the transformations, then fed it a random image from the dataset. In theory this is not new data, so it should give the same results, yet it fails miserably every time.
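Step 2 only augments the training split. A minimal sketch of that idea, assuming images are stored as a NumPy array of shape (N, H, W, 3): the helper `augment_minority` is hypothetical, and flips plus a 180° rotation are used because they keep the image shape unchanged (the question does not say which augmentations were actually used). Every class is topped up to the size of the largest class.

```python
import numpy as np


def augment_minority(images, labels, rng=None):
    """Hypothetical helper: oversample each class up to the size of the
    largest class by adding transformed copies of its own images.
    Meant to be applied to the training split only."""
    rng = np.random.default_rng(0) if rng is None else rng
    # Shape-preserving transforms: horizontal flip, vertical flip, 180° rotation.
    transforms = [np.fliplr, np.flipud, lambda im: np.rot90(im, k=2)]
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    out_imgs, out_labels = [images], [labels]
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(labels == cls)
        for _ in range(target - count):
            src = images[rng.choice(idx)]
            t = transforms[rng.integers(len(transforms))]
            out_imgs.append(t(src)[np.newaxis])
            out_labels.append(np.array([cls]))
    return np.concatenate(out_imgs), np.concatenate(out_labels)


# Placeholder training data: 30 random 16x16 RGB images, imbalanced labels.
rng = np.random.default_rng(1)
imgs = rng.integers(0, 256, size=(30, 16, 16, 3), dtype=np.uint8)
labs = np.array([0] * 20 + [1] * 6 + [2] * 4)
aug_imgs, aug_labs = augment_minority(imgs, labs)
print(np.bincount(aug_labs))  # [20 20 20]
```

The original images are kept at the front of the returned array, and the test split never passes through this function, so the test set still reflects the real class distribution.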
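Steps 1 and 3 to 6 can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the asker's code: the data are random placeholder images, `extract_features` is a hypothetical stand-in that computes only average color and per-channel histograms (Haralick features are omitted), and `SVC` with a small `C` grid is an assumed classifier, since the question does not name one.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def extract_features(img: np.ndarray, bins: int = 8) -> np.ndarray:
    """Hypothetical feature step: per-channel mean color plus a
    per-channel intensity histogram (3 + 3 * bins values)."""
    img = img.astype(np.float64)
    avg_color = img.reshape(-1, 3).mean(axis=0)
    hists = [
        np.histogram(img[..., c], bins=bins, range=(0, 256), density=True)[0]
        for c in range(3)
    ]
    return np.concatenate([avg_color, *hists])


# Placeholder data: 120 random 32x32 RGB "images", 20 per class, 6 classes.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(120, 32, 32, 3), dtype=np.uint8)
labels = np.repeat(np.arange(6), 20)

# Step 1: split before anything else (20% test), stratified by class.
img_train, img_test, y_train, y_test = train_test_split(
    images, labels, test_size=0.2, stratify=labels, random_state=0
)

# Step 3: the same feature function is applied to both sets.
X_train = np.array([extract_features(im) for im in img_train])
X_test = np.array([extract_features(im) for im in img_test])

# Step 4: fit the scaler on training features only, then reuse it on test.
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

# Step 5: 5-fold grid search on the training set only.
search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=5)
search.fit(X_train_s, y_train)

# Step 6: score the refit best estimator on the held-out test set.
test_acc = search.best_estimator_.score(X_test_s, y_test)
print(f"CV mean: {search.best_score_:.2f}, test: {test_acc:.2f}")
```

With random placeholder images the accuracies hover around chance; the point is the order of operations, in particular that `scaler` is fitted once on `X_train` and only ever applied with `transform` afterwards.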
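For steps 7 and 8, a minimal sketch of pickling the fitted scaler and model together and reloading them for a predict function. Assumptions: `extract_features`, `predict`, the `SVC` classifier, the placeholder data and the file location are all hypothetical. The two properties it illustrates are that the prediction path reuses exactly the same feature code as training, and that the loaded scaler is only used with `transform` on a 2D array of shape (1, n_features).

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def extract_features(img, bins=8):
    """Hypothetical feature function; training and prediction must share
    this exact code (same features, same order, same bins)."""
    img = img.astype(np.float64)
    avg = img.reshape(-1, 3).mean(axis=0)
    hists = [np.histogram(img[..., c], bins=bins, range=(0, 256), density=True)[0]
             for c in range(3)]
    return np.concatenate([avg, *hists])


# --- training script (placeholder data) ---
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(60, 32, 32, 3), dtype=np.uint8)
labels = np.repeat(np.arange(6), 10)
X = np.array([extract_features(im) for im in images])
scaler = StandardScaler().fit(X)
model = SVC().fit(scaler.transform(X), labels)

path = os.path.join(tempfile.gettempdir(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump({"scaler": scaler, "model": model}, f)

# --- prediction script: load the fitted objects and never refit them ---
with open(path, "rb") as f:
    saved = pickle.load(f)


def predict(img):
    feats = extract_features(img).reshape(1, -1)     # one sample -> (1, n_features)
    feats_scaled = saved["scaler"].transform(feats)  # transform, not fit_transform
    return saved["model"].predict(feats_scaled)[0]


print(predict(images[0]))
```

If the predict function also loads and resizes the image itself, that code presumably needs to match the training-time loading as well, since any difference there changes the features before the scaler ever sees them. Pickling a single `sklearn.pipeline.Pipeline` that holds both the scaler and the classifier is a common alternative that removes the chance of applying the two steps inconsistently.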

What am I doing wrong? I don't think it's overfitting, otherwise my test accuracy would be low. Could it be something to do with the scaler?
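One scaler-related possibility worth ruling out is easy to check in isolation. The sketch below uses placeholder training features (an assumption, not the asker's data) to compare three ways of scaling a single sample: passing a 1D vector (scikit-learn rejects it), reusing the fitted scaler's `transform` on a (1, n_features) array, and calling `fit_transform` on that single sample. In the last case the sample's own mean is subtracted, so every feature becomes 0 no matter which image was passed in, which would make the model's output unrelated to the input.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Placeholder training features: 100 samples, 4 features.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(100, 4))
scaler = StandardScaler().fit(X_train)

x_new = X_train[0]  # features of one image, shape (4,)

# scikit-learn estimators expect 2D input; a bare 1D vector is rejected.
try:
    scaler.transform(x_new)
    rejected_1d = False
except ValueError:
    rejected_1d = True

# Correct: reuse the mean/std learned from the training set.
good = scaler.transform(x_new.reshape(1, -1))

# Wrong: refit on the single sample; its mean is the sample itself,
# so after centering every feature is 0, whatever the image was.
bad = StandardScaler().fit_transform(x_new.reshape(1, -1))

print(rejected_1d)          # True
print(np.allclose(bad, 0))  # True
```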

Tags: accuracy, image-classification, prediction, predictive-modeling
