Confusion with using different classes in neural networks (training vs testing)

Question

I am new to deep learning and I am confused about having a neural network trained on certain classes and tested on different ones. Suppose I want a convolutional neural network that learns authorship attribution (identifying the author of a given text). For example, the model is trained on 10,000 authors/classes over hundreds of thousands of texts. How does the model generalize what it learned to predict the author of a text by a new author/class in the test set? Won't it only be able to identify the authors it was trained on, and not the new one? By comparison, a CNN that learns whether an image is a cat or a dog is trained on those classes and predicts the same classes at test time.

Donald S · Answer

You are correct: most models don't perform well when asked to predict something they haven't seen before. Normally, when you split the data into train and test sets, you want examples of every class in both sets. In scikit-learn's train_test_split function, the stratify parameter does this for you.
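As a minimal sketch of the stratified split mentioned above (the toy texts and author labels are made up for illustration, and scikit-learn is assumed to be installed), passing the labels to `stratify` keeps each author's share the same in both sets:

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Hypothetical toy data: 20 texts from 3 authors, deliberately imbalanced.
texts = [f"text_{i}" for i in range(20)]
authors = ["A"] * 10 + ["B"] * 6 + ["C"] * 4

# stratify=authors preserves the class proportions in both splits,
# so every author seen at test time was also seen during training.
X_train, X_test, y_train, y_test = train_test_split(
    texts, authors, test_size=0.5, stratify=authors, random_state=0
)

train_counts = Counter(y_train)
test_counts = Counter(y_test)
print("train:", dict(train_counts))
print("test: ", dict(test_counts))
```

Without `stratify`, a rare author could end up only in the test set purely by chance, which would reproduce the unseen-class problem from the question.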
In the real world, a new class is sometimes not available during training, for instance when you are predicting future data that may carry a new label. When this occurs, your model typically won't categorize it correctly until you have collected several examples of the new class and retrained the model with them included in your dataset.
In the meantime, you can try creating a category named "other" and training your model to use it when an input doesn't fit any known class.

10xAI · Answer

It will not.
In very simple terms, the model learns the characteristics of the data in terms of its features and maps them to the result that you provide as a class. That mapping can be very simple or very complex (a big neural network).
Whatever data it gets, it divides the full input space, and every region is mapped to a class. For example, if the training data maps values above 10,000 to Class-A, the model will predict Class-A even for 1 million, unless you retrain it with new data that splits the range further (10k-20k as Class-A, 20k-30k as Class-B, etc.).
In your case, it will predict the known author whose work is most similar to the new author's.
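The numeric example can be sketched with a deliberately tiny classifier. This is only an illustration under assumed toy data: a one-feature threshold rule stands in for whatever mapping a real network would learn, and the helper names `fit_threshold` and `predict` are hypothetical.

```python
def fit_threshold(low_values, high_values):
    """Learn a single cut point halfway between the two training groups."""
    return (max(low_values) + min(high_values)) / 2


def predict(value, threshold):
    """Every value in the input space gets one of the two known classes."""
    return "Class-A" if value > threshold else "Class-B"


# Hypothetical training data: small values are Class-B, large ones Class-A.
class_b_values = [2_000, 6_000, 9_000]
class_a_values = [11_000, 15_000, 18_000]

threshold = fit_threshold(class_b_values, class_a_values)

# 1,000,000 is far outside anything seen in training, yet the model has
# no way to say "unknown": it falls in the Class-A region.
far_away_prediction = predict(1_000_000, threshold)
print(threshold, far_away_prediction)
```

The same thing happens with authors: a text by an unseen author still lands in some known author's region of the feature space.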
Using "Other" as a class will only work when both the data and the class are new.
E.g., if the model sees that the data matches Author-A at 80%, it will predict Author-A with 0.8 confidence and you will have to accept that; in real life the author is different, but the model had no distinct data for that author during training.
"Other" can help if the data comes from a new type of author and the model is not very confident about any of the known authors.
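One common way to approximate an "Other" fallback without retraining is to reject low-confidence predictions. The sketch below is an assumption-laden illustration, not the method described in either answer: the logits, the 0.6 threshold, and the helper `predict_with_other` are all made up.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_other(logits, class_names, threshold=0.6):
    """Return the top class, or "Other" if its probability is below threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "Other", probs[best]
    return class_names[best], probs[best]


authors = ["Author-A", "Author-B", "Author-C"]

# Hypothetical model outputs for two texts.
confident = predict_with_other([4.0, 1.0, 0.5], authors)   # one clear winner
uncertain = predict_with_other([1.1, 1.0, 0.9], authors)   # nearly a tie
print(confident, uncertain)
```

As noted above, this only helps when the model is actually unsure. A new author who writes much like Author-A can still score 0.8 for Author-A and pass the threshold.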
On a lighter note: my 3-year-old kid was very confident in identifying a Kingfisher's image as a Parrot. It was his first encounter with a Kingfisher.
