# How does splitting occur at a node in a decision tree with non-categorical data?

Data Science Asked by A-ar on December 23, 2020

According to a website (http://dataaspirant.com/2017/01/30/how-decision-tree-algorithm-works/), these values are chosen randomly:

I don't think this is the case for any optimized way of building a decision tree. In this image (a different example), the value is 2.45 for the root node:

Was this value chosen randomly, as the website explains? If not, how is it calculated?

The values chosen to split the data at a node are determined so as to minimize the Gini impurity (or entropy), which measures the disorder in your data: the tree picks the value that separates your classes best.

Take an example:

You have a population of 10 people, and the only variable available is age. You are predicting whether a person has a certain disease; let's assume it's a binary classification. After doing some EDA, you notice that between ages 20 and 40 both classes are similarly present, while from age 56 upward you have 4 ones and no zeros. Below 56, the classes are roughly evenly distributed.

When considering which value to split the node on, your tree will calculate the information gain (or Gini impurity) for each candidate value, then split your population across two leaves so that the impurity in those two leaves is minimized as much as possible. Here you will end up with one leaf holding 4 ones and no zeros, and the other holding, say, 3 zeros and 3 ones.

In simpler terms, when splitting a population at a node, your tree tries to make the resulting leaves as pure as possible, where "pure" means containing only one class. That is how the split value is decided.
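The search described above can be sketched in a few lines. This is a minimal illustration, not any library's actual implementation: the ages and labels are made up to mirror the example (everyone aged 56 and up has the disease), and candidate thresholds are taken as midpoints between consecutive distinct values, CART-style. The lowest weighted Gini impurity picks the threshold.

```python
def gini(labels):
    """Gini impurity of a list of binary class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n                      # fraction of class 1
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2

def best_threshold(ages, labels):
    """Return the threshold with the lowest weighted Gini impurity.

    Candidates are midpoints between consecutive distinct sorted values."""
    pairs = list(zip(ages, labels))
    xs = sorted(set(ages))
    best_score, best_t = float("inf"), None
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2.0
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        n = len(pairs)
        # weighted average impurity of the two candidate leaves
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_score, best_t = score, t
    return best_t

ages   = [22, 25, 30, 35, 38, 40, 56, 60, 65, 70]
labels = [0,  1,  0,  1,  0,  0,  1,  1,  1,  1]   # 1 = has the disease
print(best_threshold(ages, labels))  # → 48.0, a cut between 40 and 56
```

The winning cut isolates the pure "all ones" group aged 56+, exactly as the answer describes; nothing is random about it.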

Hope it makes sense.

Answered by Blenz on December 23, 2020

No, I don't think the values according to which the branches are separated are chosen at random. Instead, the weighted average entropy of the subsets is calculated for each feature, and the feature with the highest resulting information gain is chosen as the root node. Consider this dataset:


In the above picture, outlook is chosen as the root node. How is outlook chosen as the root node?

First, we calculate the total entropy of the data; let's say it is 0.95. To pick the right root node, we then find the weighted average entropy of the subsets produced by splitting on each feature. There are four features here, so we obtain four weighted entropy averages; let's say they are 0.2, 0.3, 0.4, and 0.8, with outlook at 0.2. Now we subtract each weighted average from the total entropy, giving (0.95 - 0.2), (0.95 - 0.3), (0.95 - 0.4), and (0.95 - 0.8). These four values are the information gains of the features: whichever feature has the highest information gain (equivalently, the lowest weighted average entropy) is picked as the root node. In our case, that is the outlook feature. Hope it helps.
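The calculation above can be written out directly. In this sketch the feature names match the classic play-tennis dataset the answer alludes to, but the weighted child entropies are the made-up numbers from the answer, and the class counts (9 yes / 5 no) are assumed for the entropy demo.

```python
import math

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    out = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        out -= p * math.log2(p)
    return out

# Total entropy of an assumed 9-yes / 5-no target column.
print(round(entropy(["yes"] * 9 + ["no"] * 5), 3))  # ≈ 0.94

# Hypothetical numbers from the answer: total entropy and the weighted
# average entropy of the subsets produced by splitting on each feature.
total_entropy = 0.95
weighted_child_entropy = {
    "outlook": 0.2,
    "temperature": 0.3,
    "humidity": 0.4,
    "windy": 0.8,
}

# information gain = total entropy - weighted average entropy after the split
gains = {f: total_entropy - e for f, e in weighted_child_entropy.items()}
root = max(gains, key=gains.get)
print(root)  # → "outlook", the feature with the highest gain
```

The lowest weighted child entropy (0.2 for outlook) yields the highest gain (0.75), which is why outlook becomes the root.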


Answered by karthikeyan mg on December 23, 2020
