TransWikia.com

How to set a class_weight Dictionary for Random Forest?

Data Science Asked by fega_zero on November 15, 2020

I’m dealing with an unbalanced dataset, so I decided to use a weight dictionary for classification.

The documentation says that a weight dict must be defined as shown below:
https://imbalanced-learn.org/stable/generated/imblearn.ensemble.BalancedRandomForestClassifier.html

     weight_dict = [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}]

Since I want to predict 12 classes, which are stored in the last column, I assume the setting would look like this:

     weight_dict = [{0: 1, 1: 5.77390289e-01}, {0: 1, 1: 6.48317326e-01},
                    {0: 1, 1: 1.35324885e-01}, {0: 1, 1: 2.92665797e+00},
                    {0: 1, 1: 5.77858906e+01}, {0: 1, 1: 1.73193507e+00},
                    {0: 1, 1: 9.27828244e+00}, {0: 1, 1: 1.18766082e+01},
                    {0: 1, 1: 8.99009985e+01}, {0: 1, 1: 6.39833279e+00},
                    {0: 1, 1: 2.55347077e+01}, {0: 1, 1: 9.47015372e+02}]

Honestly, I don’t fully understand what the entries in each dictionary mean, for example the:

     0: 1 of {0: 1, 1: 1}

or the:

     1: value

Do the keys represent a column position or the label order?

What is the right way to set it?

I’ll be grateful for your insights.

One Answer

If you're just doing multiclass classification, you should specify the weights as a single dictionary that maps each class label to its weight, e.g. {0: 1.0, 1: 1.5, 2: 3.2} for a three-class problem. (Or use the convenience modes "balanced" or "balanced_subsample".)
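
As a minimal sketch of the single-dictionary form, the snippet below takes the twelve weights from the question and maps them to labels 0 to 11. It assumes the labels are encoded as the integers 0..11 and that the weights are listed in label order; the data are synthetic placeholders, and scikit-learn's RandomForestClassifier is used here on the assumption that imbalanced-learn's BalancedRandomForestClassifier accepts the same class_weight forms.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Per-class weights from the question, ASSUMED to be ordered by label 0..11.
weights = [
    5.77390289e-01, 6.48317326e-01, 1.35324885e-01, 2.92665797e+00,
    5.77858906e+01, 1.73193507e+00, 9.27828244e+00, 1.18766082e+01,
    8.99009985e+01, 6.39833279e+00, 2.55347077e+01, 9.47015372e+02,
]

# One dictionary for the whole problem: key = class label, value = weight.
class_weight = dict(enumerate(weights))

# Synthetic stand-in data (hypothetical): 240 rows, 5 features, 20 rows per class.
rng = np.random.default_rng(0)
X = rng.normal(size=(240, 5))
y = np.arange(240) % 12

clf = RandomForestClassifier(
    n_estimators=20, class_weight=class_weight, random_state=0
)
clf.fit(X, y)

# Alternative: let scikit-learn derive the weights from the class frequencies.
clf_balanced = RandomForestClassifier(
    n_estimators=20, class_weight="balanced", random_state=0
)
clf_balanced.fit(X, y)
```

If the real labels are strings or start at 1, the dictionary keys would need to be those labels instead of 0..11.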

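The list of dictionaries is used for multilabel classification (where each row can have multiple true labels). In that case, each dictionary corresponds to one of the outputs (one column of y), with the classes of that output as keys and their weights as values. So in {0: 1, 1: 5}, the keys 0 and 1 are class labels of that output, not column positions.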
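
For contrast, here is a small sketch of the list-of-dictionaries form on a two-output multilabel target. The data and the weights are made up for illustration; the only point is that the list holds one dictionary per column of y, in column order.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic multilabel data (hypothetical): 200 rows, 4 features, 2 binary outputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
idx = np.arange(200)
y = np.column_stack([
    (idx % 5 == 0).astype(int),  # output 0: class 1 is rare (20% of rows)
    (idx % 2).astype(int),       # output 1: classes 0 and 1 are balanced
])

# One dictionary per output column; inside each, key = class label, value = weight.
class_weight = [
    {0: 1, 1: 5},  # output 0: upweight the rare positive class
    {0: 1, 1: 1},  # output 1: equal weights
]

clf = RandomForestClassifier(
    n_estimators=20, class_weight=class_weight, random_state=0
)
clf.fit(X, y)

pred = clf.predict(X[:3])  # one predicted label per output for each row
```

The length of the list has to match the number of output columns, which is why a list of twelve dictionaries does not fit a single 12-class target.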

Correct answer by Ben Reiniger on November 15, 2020
