TransWikia.com

How to use a dataset with only one category of data

Data Science Asked by Finn Williams on February 8, 2021

I am performing a classification task, to try to detect an object. A picture of the environment is taken, candidates are generated of this possible object using vision algorithms, and once isolated, these candidates will be passed through a CNN for the final decision on whether the object has been detected or not. I am attempting to use transfer learning on InceptionV3 but am having difficulty training it, as I only have one set/class of images.

The dilemma is that I only have one class of data and when I pass it through the network, I get a 100% accuracy (because there is nothing to compare it to). How should I overcome this? Should I find more categories online to add to my dataset? What should these categories be?

Just to clarify, as an example, I have class "cat".

Not "cat" and "dog".

Not "cat" and "no cat".

Just "cat". That is what my dataset consists of at the moment.

3 Answers

The model can't be trained if all it has ever known is just one thing. There must be examples of "not cat" for it to learn.

To fix this, add images of anything that is "not cat" to your training dataset, so that the training set is a mix of "cat" and "not cat". Once you have both classes, you can start refining your model's accuracy.

Answered by Noel on February 8, 2021

Short answer

It seems that you're looking for one-class classification approaches. There are several approaches, such as isolation forest, one-class SVM, the reconstruction error of an autoencoder (trained only on your positive class), and so on. All of these classifiers learn from a single class.

EDIT below

About creating a "no cat" class

Something you must know: when you train a classifier in the usual way to distinguish cat from no-cat, you should interpret its predictions as follows:

If it says it is a cat, that means that it looks more similar to a cat than to a no cat. Nothing more.

If one day your classifier sees an input unlike anything in your "no cat" training dataset, it may well decide that the input looks more similar to a cat.

Conclusion: Be careful/aware when creating "no cat" class.
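
To make this concrete, here is a minimal sketch using a nearest-centroid classifier as a stand-in for any two-class model. The 2-D feature values are made up for illustration, and the helper names `centroid` and `predict` are hypothetical. An input that resembles neither training set still gets one of the two labels:

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def predict(x, cat_centroid, no_cat_centroid):
    """Return the label whose centroid is closer to x."""
    d_cat = math.dist(x, cat_centroid)
    d_no_cat = math.dist(x, no_cat_centroid)
    return "cat" if d_cat < d_no_cat else "no cat"

# Hypothetical 2-D features (made-up numbers, for illustration only).
cat_features = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]]
no_cat_features = [[5.0, 5.0], [5.2, 4.8], [4.9, 5.1]]

cat_c = centroid(cat_features)
no_cat_c = centroid(no_cat_features)

# An input unlike anything in either training set.
novel = [-20.0, -20.0]
label = predict(novel, cat_c, no_cat_c)
print(label)
```

The novel point is far from both classes, yet it is labeled "cat" simply because it is slightly less far from the cat centroid.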

A first understanding of one-class classification

The objective of one-class classification is no longer to differentiate multiple classes but to find the boundaries that best describe your single class.

One easy-to-understand example with a distance approach:

  1. Take some features that represent your one-class input data.
  2. In this feature space, compute each training point's distance to its nearest neighbor, and let $d_{max}$ be the largest of these distances.
  3. Project any new input into this feature space and compute its distance to its nearest neighbor among the training points.
  4. If this distance is more than $d_{max}$, it is not your class. Otherwise, it is.

Of course, this is a simplistic example, but it might give you an idea of what one-class classification does.
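
The four steps of the distance approach can be sketched in a few lines of plain Python. This assumes $d_{max}$ means the largest nearest-neighbor distance inside the training set; the 2-D points are made up, and the function names are hypothetical:

```python
import math

def nearest_neighbor_distance(x, points):
    """Distance from x to the closest point in `points`."""
    return min(math.dist(x, p) for p in points)

def fit_d_max(train):
    """Largest nearest-neighbor distance found inside the training set."""
    d_max = 0.0
    for i, p in enumerate(train):
        others = train[:i] + train[i + 1:]  # exclude the point itself
        d_max = max(d_max, nearest_neighbor_distance(p, others))
    return d_max

def is_in_class(x, train, d_max):
    """Step 4: accept x only if its nearest training point is within d_max."""
    return nearest_neighbor_distance(x, train) <= d_max

# Step 1: made-up 2-D features of the single known class.
train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.5, 1.0]]

# Step 2: fit the threshold.
d_max = fit_d_max(train)

# Steps 3 and 4: test new inputs.
print(is_in_class([0.5, 0.5], train, d_max))
print(is_in_class([10.0, 10.0], train, d_max))
```

A single outlier in the training set inflates `d_max`, which is one reason this remains a teaching example rather than a practical classifier.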

One difficulty of one-class classification is finding the right set of features. Going beyond this example, anything that bounds a cluster (such as some clustering algorithms) could be used to create a one-class classifier.

Going further

One-class classification problems have drawn more and more attention in recent years, and there is a growing body of articles on the topic.

Answered by etiennedm on February 8, 2021

The model learns to adjust its weights based on the images and the feedback from the labels.

If you feed it a few image classes as "Not Cat", it will learn to classify similar features as "Not Cat", but it might fail on a new class.
E.g., if it is trained on "Car/Furniture/Dog" as "Not Cat", then chances are high that a wild cat will be classified as "Cat".
Dumping the whole ImageNet dataset into the "Not Cat" class will provide quite a good variance and may work most of the time, but it is not the appropriate solution for this problem.

This type of problem falls under one-class classification.
The core idea is to use a CNN to extract features and then use a specialized model, e.g. one-class SVM, Gaussian mixtures, etc., to define a boundary for "Cat".

This problem, as defined by the one-class SVM approach, consists of identifying a sphere enclosing all (or most) of the data. The classical strategy for solving it estimates the center and the radius of the sphere simultaneously.
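
As a rough illustration of the "enclosing sphere" idea, here is a naive sketch that is not the one-class SVM optimization: it fixes the center at the mean of the features and then picks the radius that covers a chosen fraction of the training points, instead of estimating both simultaneously. The feature vectors are made up (in practice they would come from the CNN), and `fit_sphere`, `inside` and `keep_fraction` are hypothetical names:

```python
import math

def fit_sphere(features, keep_fraction=0.95):
    """Return (center, radius) of a sphere covering ~keep_fraction of the points.

    Naive heuristic: center = mean, radius = a quantile of the distances.
    """
    n = len(features)
    dim = len(features[0])
    center = [sum(f[i] for f in features) / n for i in range(dim)]
    distances = sorted(math.dist(f, center) for f in features)
    # Index of the distance that covers roughly keep_fraction of the points.
    k = min(n - 1, max(0, math.ceil(keep_fraction * n) - 1))
    return center, distances[k]

def inside(x, center, radius):
    """True if x falls within the sphere, i.e. is accepted as "Cat"."""
    return math.dist(x, center) <= radius

# Made-up "Cat" features; real ones would be CNN embeddings.
cat_features = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
center, radius = fit_sphere(cat_features, keep_fraction=1.0)

print(inside([1.0, 1.0], center, radius))
print(inside([5.0, 5.0], center, radius))
```

Lowering `keep_fraction` shrinks the sphere and lets a few training points fall outside, which corresponds to the "or most" part of the description above.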

You may start with these links (in the specified order):
Hackernoon blog
Arxiv
Researchgate

There are other approaches too, e.g. based on an autoencoder. Here, we put a threshold on the reconstruction error.
References:
Quora
SO
Keras blog
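
The thresholding step of the autoencoder approach might look like the sketch below. To keep it self-contained, `reconstruct` is a toy stand-in for a trained autoencoder (it just projects a 2-D point onto the line y = x), the data is made up, and setting the threshold to the largest training error times a `margin` of 1.5 is an assumption, not a recommended value:

```python
import math

def reconstruct(x):
    """Toy stand-in for autoencoder(x): project x onto the line y = x."""
    m = (x[0] + x[1]) / 2.0
    return [m, m]

def reconstruction_error(x):
    """Euclidean distance between an input and its reconstruction."""
    return math.dist(x, reconstruct(x))

def fit_threshold(train, margin=1.5):
    """Largest training error times a safety margin (an assumed rule)."""
    return margin * max(reconstruction_error(x) for x in train)

def is_positive(x, threshold):
    """Accept x as the known class if it is reconstructed well enough."""
    return reconstruction_error(x) <= threshold

# Made-up positive-class samples, all lying close to the line y = x.
train = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9]]
threshold = fit_threshold(train)

print(is_positive([5.0, 5.1], threshold))
print(is_positive([1.0, 4.0], threshold))
```

With a real autoencoder trained only on the positive class, inputs from that class tend to be reconstructed with a small error, while unfamiliar inputs tend to produce a larger one.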

You may also look at an idea that generates random images for the "No Cat" class.

Answered by 10xAI on February 8, 2021
