# Not able to interpret decision tree when using class_weights

Data Science Asked by rahul on October 29, 2020

I’m working with an imbalanced dataset. I’m using a decision tree (scikit-learn) to build a model.
For explaining my problem I’ve taken iris dataset.

When I’m setting class_weight=None, I understood how the tree is assigning the probability scores when I use predict_proba.
When I’m setting class_weight='balanced', I know its using target value to calculate class weights but I’m not able to understand how the tree is assigning the probability scores.

import sklearn.datasets as datasets
import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

from sklearn.externals.six import StringIO
from IPython.display import Image
from sklearn.tree import export_graphviz
import pydotplus

df=pd.DataFrame(iris.data, columns=iris.feature_names)
y=iris.target

X_train, X_test, y_train, y_test = train_test_split(df, y, test_size=0.33, random_state=1)

# class_weight=None
dtree=DecisionTreeClassifier(max_depth=2)
dtree.fit(X_train,y_train)

dot_data = StringIO()
export_graphviz(dtree, out_file=dot_data, filled=True, rounded=True, special_characters=True, feature_names=X_train.columns)
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
Image(graph.create_png()) # I use jupyter-notebook for visualizing the image



# printing unique probabilities in each class
probas = dtree.predict_proba(X_train)
print(np.unique(probas[:,0]))
print(np.unique(probas[:,1]))
print(np.unique(probas[:,2]))

# ratio for calculating probabilities
print(0/33, 0/34, 33/33)
print(0/33, 1/34, 30/33)
print(0/33, 3/33, 33/34)


The probabilities assigned by the tree and my ratios (determined by looking at tree image) are matching.

When I use the option class_weights='balanced'. I get the below tree.

# class_weight='balanced'
dtree_balanced=DecisionTreeClassifier(max_depth=2, class_weight='balanced')
dtree_balanced.fit(X_train,y_train)

dot_data = StringIO()
export_graphviz(dtree_balanced, out_file=dot_data,filled=True, rounded=True, special_characters=True, feature_names=X_train.columns)
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
Image(graph.create_png())


I’m printing unique probabilities using below code

probas = dtree_balanced.predict_proba(X_train)
print(np.unique(probas[:,0]))
print(np.unique(probas[:,1]))
print(np.unique(probas[:,2]))


I’m not able to understand (come-up with a formula) how the tree is assigning these probabilities.

We should consider two points. First, class_weight='balanced' does not change the actual number of samples in a class, only the weight of class $$w_{c_i}$$ is changed. Second, the [un-normalized] probability of class $$c_i$$ in each node is calculated as

$$w_{c_i}$$ x (number of samples from $$c_i$$ in that node / size of $$c_i$$)

For example, in balanced mode, the [un-normalized] probability of $$c_3$$ in the green leaf is calculated as

$$33.bar{3}% times (3 / 36) ≈ 2.778%$$

compared to $$36% times (3 / 36) = 3%$$ in unbalanced mode.

The probability (normalized) in balanced mode would be:

$$100 times 2.778/(2.778+32.258) % = 7.9289%$$

Remark. The word "probability" is not applicable to each isolated node except for the root node. This is the un-normalized version of the probability used to classify a data point inside a leaf, though the normalization is not required for comparison. However, the notion is applicable to the aggregate of nodes at the same level and the leaves from upper levels (i.e. set of all samples).

Correct answer by Esmailian on October 29, 2020

## Related Questions

### Idenitity between TD(0) algorithm and Policy Evaluation in Dynamic Programming when alpha is equal to 1

2  Asked on February 26, 2021

### predict multiple letters in pixels matrix

0  Asked on February 26, 2021 by jxpython

### GPU memory requirement calculation from network architecture

1  Asked on February 26, 2021 by omroth

### Relationship Between Independent and Dependent variables

1  Asked on February 26, 2021 by vishesh-kumar

### A/B testing with non-Gaussian distributions

3  Asked on February 26, 2021 by anishtain4

### Keras models break when I add batch normalization

2  Asked on February 26, 2021 by axon

### Region Proposal Network – How to subsampling 256 fg/bg anchors

0  Asked on February 26, 2021 by ivan-phillips

### Scikit-learn SelectKBest is picking up obviously unwanted Features

1  Asked on February 25, 2021

### What is the suggested way to create features (Mel-Spectograms) from speech signal for classification with ResNet?

0  Asked on February 25, 2021 by 3r1c

### What is the difference between BERT architecture and vanilla Transformer architecture

1  Asked on February 25, 2021 by luong-minh-tam

### Help why to apply PCA here

0  Asked on February 25, 2021

### A good way to use facial landmark as model input

0  Asked on February 25, 2021 by hyelin

### Holdout vs K-fold

1  Asked on February 25, 2021 by fazla-rabbi-mashrur

### Is my feature normalization correct?

0  Asked on February 25, 2021 by mathematical-inutition

### Create a model that can extract only specific data out of receipts or invoices?

0  Asked on February 25, 2021

### Examples of using GANs to sort numbers?

0  Asked on February 25, 2021 by comp_sci5050

### What is the difference between GPT blocks and Transformer Decoder blocks?

1  Asked on February 25, 2021

### Grouping high dimension Y-space to lower dimensions

1  Asked on February 25, 2021 by prog_guy

### What is the difference between using numpy array images and using images files in deep learning?

2  Asked on February 24, 2021

### Maximum Likelihood estimation

2  Asked on February 24, 2021 by mahajna