TransWikia.com

Grouping high dimension Y-space to lower dimensions

Data Science Asked by prog_guy on February 25, 2021

I have an ML problem with 300 variables to predict: a multi-label binary classification problem with a Y space of dimension 300 and only about 2000 rows.
In principle, there are 2^300 possible label combinations. In practice, far fewer actually occur.

Let me explain with an example.
X1,….X400 | Y1,Y2,Y3………..Y299,Y300
‘healthy’,..’sad’| 0,1..1,0
‘unhealthy’…’sad’|1,0,….0

However, when I aggregate across the rows, there are only about 100 distinct combinations, far fewer than 2^300. For example:
Y1,Y2,Y100 => C1, or Y33,Y44,Y291,Y299,Y300 => C2, and so on up to C100.
Thus, I can convert it to a multi-class problem and predict these 100 classes instead.
However, more than 50% of these classes have a count of only 1 or 2, so this approach doesn't work well.
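
The aggregation described above can be checked with a short sketch. It assumes the labels are available as a list of 0/1 rows; the toy matrix `Y` and the helper names `combination_counts` and `rare_fraction` are made up for illustration and stand in for the real 2000 x 300 data.

```python
from collections import Counter


def combination_counts(Y):
    """Count how often each distinct label row (combination) occurs."""
    return Counter(tuple(row) for row in Y)


def rare_fraction(counts, max_count=2):
    """Fraction of distinct combinations seen at most `max_count` times."""
    rare = sum(1 for c in counts.values() if c <= max_count)
    return rare / len(counts)


# Toy label matrix: 6 rows, 4 binary labels (hypothetical data).
Y = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]

counts = combination_counts(Y)
print(len(counts))            # number of distinct combinations (classes)
print(rare_fraction(counts))  # share of classes with count <= 2
```

A high rare fraction is what makes the plain multi-class conversion hard to train on.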

Is there an algorithmic way to keep the multi-label formulation but group the variables in an optimal way, mapping the Y space to a Z space of lower dimension (5, for example)?
{Y1,Y10,Y222,Y232} => Z1 can be grouped together
{Y2…} => Z2 can be grouped together
Then we can predict on the Z space instead. At this point, I am not sure whether each Y variable needs to be exclusive to one Z group or not.

I am open to all suggestions at this point.

Cheers

One Answer

You may find this resource helpful: https://xang1234.github.io/multi-label/

One possibility is to cluster similar labels together so that each cluster is handled jointly by the multi-label classification algorithm. Community detection methods such as the Louvain algorithm can be used to cluster the label co-occurrence graph. This is implemented in scikit-multilearn's NetworkXLabelGraphClusterer with the parameter method='louvain'.
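
To make the idea of clustering a label graph concrete, here is a minimal standard-library sketch. It is not the scikit-multilearn API and it is not Louvain: as a simplified stand-in, it builds a label co-occurrence graph and groups labels by connected components over edges whose weight reaches a threshold. The names `cooccurrence`, `group_labels`, `min_cooccurrence`, and the toy matrix are assumptions made for illustration.

```python
from itertools import combinations


def cooccurrence(Y):
    """For each label pair (i, j), i < j, count rows where both are 1."""
    counts = {}
    for row in Y:
        active = [j for j, v in enumerate(row) if v]
        for pair in combinations(active, 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts


def group_labels(Y, min_cooccurrence=2):
    """Group labels linked by edges with weight >= min_cooccurrence.

    Uses union-find to compute connected components; this is a simplified
    stand-in for community detection such as Louvain.
    """
    n_labels = len(Y[0])
    parent = list(range(n_labels))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for (i, j), weight in cooccurrence(Y).items():
        if weight >= min_cooccurrence:
            parent[find(i)] = find(j)

    groups = {}
    for label in range(n_labels):
        groups.setdefault(find(label), []).append(label)
    return sorted(groups.values())


# Toy label matrix: 6 rows, 5 binary labels (hypothetical data).
Y = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
]

print(group_labels(Y))  # label indices grouped into candidate Z groups
```

Each resulting group is a disjoint set of Y columns, so a separate classifier could be trained per group. Raising or lowering the threshold trades fewer, larger groups against more, smaller ones.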

A resource for splitting your data into (hopefully) balanced train, validation, and test sets, which is not so easy in a multi-label setting: http://scikit.ml/stratification.html

Answered by mprouveur on February 25, 2021
