
Why does class weighting improve precision while under-sampling causes precision to drop?

Asked on Cross Validated, November 12, 2021

I have imbalanced data where the ratio of positive to negative samples is 3:1 (positive samples are three times more numerous than negative ones). For my use case it is important to have higher precision (and a lower FPR), even if it comes at the cost of lower recall (more FNs). I intend to reach this goal by training a random forest with the negative class weighted several times higher than the positive class. I observe that as I increase the weight of the negative class, precision increases (fewer FPs) and recall drops (more FNs), which is roughly what I expected. See the sample table below:

[Table: precision rises and recall falls as the negative-class weight increases]
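For reference, a minimal sketch of this first approach using scikit-learn's RandomForestClassifier and its class_weight parameter; the synthetic dataset and the specific weight values are purely illustrative, not the questioner's actual setup:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Illustrative data: ~75% positives (label 1), ~25% negatives (label 0), i.e. 3:1
X, y = make_classification(n_samples=10_000, weights=[0.25, 0.75], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Weight the negative class progressively higher: false positives are
# penalised more, so precision should rise while recall falls.
for w in [1, 2, 5, 10]:
    clf = RandomForestClassifier(class_weight={0: w, 1: 1}, random_state=0)
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    print(f"weight {w}: precision={precision_score(y_test, pred):.3f}, "
          f"recall={recall_score(y_test, pred):.3f}")
```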

Next I try to see whether I can train the model on intentionally imbalanced data to introduce a bias towards the negative class. I do this by under-sampling my positive class at different fractions, so 1:10 means my negative samples are ten times more numerous than my positive samples during training. What I observe now is that both precision and recall go down as I keep decreasing the number of positive samples in training (making the negative class the majority). Why does precision drop in this case even though the FPR is decreasing? Since precision = TP/(TP+FP) and FPR = FP/(FP+TN), should precision and FPR not move in opposite directions? Thanks

[Table: both precision and recall fall as the positive class is under-sampled more aggressively]
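Continuing the sketch above, the second approach might look something like the following; undersample_positives is a hypothetical helper written for this illustration, and the ratios mirror the ones in the question:

```python
import numpy as np

def undersample_positives(X, y, ratio, seed=0):
    """Keep every negative (label 0) and subsample positives (label 1)
    so the training set ends up with positives:negatives = 1:ratio."""
    rng = np.random.default_rng(seed)
    neg_idx = np.flatnonzero(y == 0)
    pos_idx = np.flatnonzero(y == 1)
    n_keep = min(len(pos_idx), max(1, len(neg_idx) // ratio))
    keep_pos = rng.choice(pos_idx, size=n_keep, replace=False)
    idx = np.concatenate([neg_idx, keep_pos])
    return X[idx], y[idx]

# Reusing X_train/X_test from the sketch above: the harsher the
# under-sampling, the less data the forest sees overall.
for ratio in [1, 3, 5, 10]:
    X_tr, y_tr = undersample_positives(X_train, y_train, ratio)
    clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
    pred = clf.predict(X_test)
    print(f"1:{ratio} ({len(y_tr)} rows): "
          f"precision={precision_score(y_test, pred):.3f}, "
          f"recall={recall_score(y_test, pred):.3f}")
```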

One Answer

The problem with the second approach is that you're heavily under-sampling your positive class.
Given that your starting ratio of positive:negative is 3:1, getting to 1:1 means removing exactly 50% of your starting dataset! Once you get to 1:10 you have removed about 72.5% of the starting data... you can see the trend here.
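To make the arithmetic explicit, here is a quick back-of-the-envelope check of those fractions (assuming all negatives are kept and only positives are dropped):

```python
neg = 1.0        # negatives, all kept
pos_start = 3.0  # positives before under-sampling (3:1 start)
for ratio in [1, 3, 10]:
    pos_kept = neg / ratio  # positives left at a 1:ratio split
    removed = (pos_start - pos_kept) / (pos_start + neg)
    print(f"1:{ratio}: {removed:.1%} of the full dataset removed")
# 1:1 -> 50.0%, 1:3 -> 66.7%, 1:10 -> 72.5%
```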

In general, yes, shifting the class balance towards the negative class will improve classification performance on that class. However, if in doing so you also shrink your sample size substantially, you will see an overall decrease in performance, so both your precision AND recall will drop.

Finally, you're better off using the first approach, as weighting the classes does not have this drawback: no data is discarded.

Answered by Davide ND on November 12, 2021
