
Multi-class classification with prior knowledge of class similarity?

Cross Validated Asked on November 2, 2021

Background

I would like to build a model that predicts a month label $\mathbf{y}$ from a given set of features $\mathbf{X}$. The data are structured as follows.

  • $\mathbf{X} : N_{samples} \times N_{features}$.
  • $\mathbf{y}: N_{samples} \times 1$, with values in $1,2,\cdots,12$.

It would be more helpful to have the output as a predicted probability for each label, since I would like to make use of the prediction uncertainty. Any multi-class algorithm could be used to build such a model, and I have already tried some of scikit-learn's multiclass algorithms.

However, I found that none of them were very useful, because of the following problem.


Problem: I cannot make use of class similarity

By class similarity, I mean the similar characteristics that temporally adjacent months generally share. Most algorithms do not provide any way to make use of such prior knowledge. In other words, they miss the following requirement:

It is quite acceptable to predict January (1) for February (2), but very undesirable to predict August (8) for February (2).

For instance, I could use a multi-layer perceptron (MLP) classifier to build a model. However, algorithms such as MLP are optimized for problems such as the classification of handwritten digits. In those problems, predicting 1 for 2 is just as undesirable as predicting 8 for 2.

In other words, most algorithms are agnostic to the relative similarity between labels. If a classifier could exploit such class similarity as prior knowledge, it might perform much better. If I were to impose such a prior in the form of a distribution, I might choose a cosine-shaped distribution across months.
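To make the cosine-shaped prior concrete, one option is to turn each hard month label into a soft target distribution over the 12 months. The following is only a minimal sketch under stated assumptions: the function name `cosine_soft_labels`, the raised-cosine weighting, and the `sharpness` exponent are all hypothetical choices for illustration, not part of any library.

```python
import numpy as np

def cosine_soft_labels(months, sharpness=4.0):
    """Return an (n, 12) array of soft target distributions.

    months:    integer labels in 1..12.
    sharpness: hypothetical exponent; larger values concentrate
               more probability mass near the true month.
    """
    months = np.asarray(months)
    # Angular offset of every candidate month (columns) from the true month (rows).
    angles = 2 * np.pi * (np.arange(1, 13) - months[:, None]) / 12
    # Raised cosine: 1 at the true month, 0 at the opposite month.
    weights = ((1 + np.cos(angles)) / 2) ** sharpness
    # Normalize each row so it sums to 1.
    return weights / weights.sum(axis=1, keepdims=True)

# Soft target for February: January and March get equal weight, August gets none.
soft = cosine_soft_labels([2])
```

Targets like these could be used with a model trained by cross-entropy against probability targets; whether a given library accepts soft labels has to be checked case by case.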

Some may suggest algorithms that are based on linear regression, such as one-vs-rest logistic regression. However, since months wrap around, such regression models may not work well. For instance, treating $\mathbf{y}$ as a continuous variable would miss that January (1) and December (12) are actually very similar.
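The wrap-around issue can be illustrated with a small distance function. This is a sketch only; the helper name `circular_month_distance` is hypothetical. It contrasts the plain difference between labels with a distance measured around the 12-month cycle.

```python
def circular_month_distance(a, b, period=12):
    """Shortest number of steps between months a and b around the cycle."""
    d = abs(a - b) % period
    return min(d, period - d)

naive_jan_dec = abs(1 - 12)                        # plain difference between labels
circular_jan_dec = circular_month_distance(1, 12)  # distance around the cycle
circular_feb_aug = circular_month_distance(2, 8)   # opposite sides of the year
```

Here the plain difference between January and December is 11, while the circular distance is 1; February and August are 6 apart under either measure.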


Questions

As a beginner in machine learning, I am not very familiar with the available algorithms. Any help, including ideas about my problem or recommendations of related papers, threads, or websites, would be welcome.

One Answer

Let me attempt a partial answer, given the insight of @Eweler's comment.

  1. What you want is to treat this problem as a regression and not as a classification. Regression will capture some of your intuition about "class similarity" (though I would not call it that). Predicting 9 instead of the correct 10 is better than predicting 3 instead of the correct 7: in the first case the error is 1, and in the second it is 4. As you correctly understand, if you treat this as a classification, both errors "count the same". As a regression, they do not.

  2. Treating it as a regression does not, by itself, deal with the wrap-around problem. Here I suggest reading the answer to this question in CV. Basically, as your intuition suggested (though not with a single output), you want to predict two outputs: $x=\cos(2\pi \cdot \text{month}/12)$ and $y=\sin(2\pi \cdot \text{month}/12)$.

  3. But there may be a problem for which I do not have the answer (hence the partialness of my answer). I am not sure that predicting the two outputs independently does the right thing in terms of the intuitive cost of errors between predicted and correct outputs. Your prediction will be a point in the plane, where the correct values are the 12 points equally spaced on the unit circle. I am not sure that errors in this plane are a good match for your intuitions about errors in predicting the month, even if you assume that your predictions will be close to the unit circle, which I don't think you can, because the x and y predictions are independent.

  4. Regardless of whether you use this two-output prediction to capture the wrap-around effect (which may compromise your intuitions regarding the cost of errors) or keep it a simple regression on the numbers 1 to 12 (which does not capture the wrap-around), there is the lesser problem of transforming a real/floating-point output (or pair of outputs) into the appropriate integer that represents a month. I think that mapping the real output to the closest integer, or to the closest "month-point" on the unit circle, will work. But I am not 100% sure. There is also the concept of ordinal regression, which predicts integers rather than real/floating-point numbers, but I am not sure it will be worth your effort to delve into this topic.
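The two-output encoding from point 2 and the mapping back to a month from point 4 can be sketched as follows. This is an illustration under stated assumptions, not the answerer's code: the helper names `encode_month` and `decode_month` are hypothetical, and the regressor that would produce the predicted (x, y) pair is left out. Decoding rounds the angle of the predicted point to the nearest multiple of 30 degrees, which for any point other than the origin is the same as picking the closest of the 12 month-points on the unit circle.

```python
import numpy as np

def encode_month(month):
    """Map month labels 1..12 to (cos, sin) points on the unit circle."""
    angle = 2 * np.pi * np.asarray(month) / 12
    return np.cos(angle), np.sin(angle)

def decode_month(x, y):
    """Map predicted (x, y) points back to the nearest month in 1..12."""
    angle = np.arctan2(np.asarray(y), np.asarray(x))  # in [-pi, pi]
    month = np.rint(angle * 12 / (2 * np.pi)).astype(int) % 12
    # An angle of 0 corresponds to December (12), not month 0.
    return np.where(month == 0, 12, month)

months = np.arange(1, 13)
x, y = encode_month(months)
recovered = decode_month(x, y)

# A prediction that is off the unit circle still decodes to a month.
noisy = decode_month(np.array([0.9]), np.array([0.1]))
```

Because decoding depends only on the angle, a prediction does not need to lie on the unit circle to be mapped to a month, although a point very close to the origin has an unstable angle and could reasonably be treated as a highly uncertain prediction.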

Answered by Jacques Wainer on November 2, 2021
