
How to approach semi-supervised binary classification problem with few labels only from one class?

Data Science Asked on October 1, 2021

I am faced with a binary classification problem where I have a few labelled instances (so, as far as I know, this is "semi-supervised" learning), but only from the positive class. So I cannot take any negative examples as a basis for learning the other class. What is the best practice here? I assume that I should find some examples farthest from the explicit positives and treat these as negatives; but if so, what is a handy way to do this in Python (preferably in sklearn)?

Furthermore, following the approach above, I'm a bit confused about when to switch to supervised mode (if at all), given that the instances can only be separated by clustering.

One Answer

I see two approaches:

  • either you do your whole process using only the positive class, relying on one-class classification approaches such as Isolation Forest, one-class SVM, or the reconstruction error of an autoencoder trained only on your positive class. All of these classifiers learn from a single class (see the first sketch after this list).

  • or you can do it in two steps. First, try unsupervised approaches (such as clustering) to extract negative (and additional positive) samples. Then train a binary classifier on the positive vs. negative class. With this approach, be careful with the unsupervised split that produces the input dataset for the second step: you want to avoid mislabeling (see the second sketch after this list).
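A minimal sketch of the one-class route with scikit-learn, assuming the labelled positives sit in a hypothetical array `X_pos` and the unlabelled instances in `X_unlabeled` (the synthetic data below is only there to make the snippet runnable):

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM

# Hypothetical data: X_pos holds the labelled positive instances,
# X_unlabeled holds everything without a label.
rng = np.random.RandomState(0)
X_pos = rng.normal(loc=0.0, scale=1.0, size=(50, 5))
X_unlabeled = rng.normal(loc=0.0, scale=2.0, size=(500, 5))

# Train one-class models on the positive class only.
iso = IsolationForest(random_state=0).fit(X_pos)
ocsvm = OneClassSVM(nu=0.1, gamma="scale").fit(X_pos)

# predict() returns +1 for instances that look like the training
# (positive) class and -1 for instances that look different,
# i.e. candidate negatives.
iso_pred = iso.predict(X_unlabeled)
svm_pred = ocsvm.predict(X_unlabeled)
print("Isolation Forest flags", np.sum(iso_pred == -1), "candidate negatives")
print("One-class SVM flags", np.sum(svm_pred == -1), "candidate negatives")
```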
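And a sketch of the two-step route, under the same assumption of hypothetical arrays `X_pos` and `X_unlabeled`: cluster the unlabelled data, treat the cluster whose centroid lies farthest from the positives as provisional negatives (one possible heuristic, not the only one), then fit an ordinary binary classifier.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import euclidean_distances

rng = np.random.RandomState(0)
X_pos = rng.normal(loc=0.0, scale=1.0, size=(50, 5))        # labelled positives
X_unlabeled = rng.normal(loc=1.0, scale=2.0, size=(500, 5))  # no labels

# Step 1: cluster the unlabelled data and pick the cluster whose centroid
# is farthest from the mean of the known positives as provisional
# negatives (a crude heuristic, prone to mislabeling).
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X_unlabeled)
dist_to_pos = euclidean_distances(
    kmeans.cluster_centers_, X_pos.mean(axis=0, keepdims=True)
).ravel()
neg_cluster = int(np.argmax(dist_to_pos))
X_neg = X_unlabeled[kmeans.labels_ == neg_cluster]

# Step 2: train a standard binary classifier on positives vs provisional negatives.
X_train = np.vstack([X_pos, X_neg])
y_train = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_neg))])
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Score the remaining unlabelled instances with the supervised model.
remaining = X_unlabeled[kmeans.labels_ != neg_cluster]
print(clf.predict_proba(remaining)[:5])
```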

Answered by etiennedm on October 1, 2021
