TransWikia.com

How does the max_features parameter work in DecisionTreeClassifier?

Data Science Asked by James Flash on January 2, 2021

What is the parameter max_features in DecisionTreeClassifier responsible for?

I thought it defines the number of features the tree uses to generate its nodes. But despite the different values of this parameter (1 and 2), my tree uses both of the features that I have. So what does it actually change?

max_features = 2: [image not preserved]

max_features = 1: [image not preserved]

You can see that x1 and x2 are used in both cases.

One Answer

max_features is the number of features considered each time a split is chosen. Say your data has 50 features and max_features is 10: at every node, the tree randomly selects 10 of the 50 features and picks the best split among those 10. At the next node it draws another random set of 10, and so on.
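The per-node selection described above can be sketched in a few lines. This is an illustrative toy, not scikit-learn's actual splitter: the helper names `gini` and `choose_split`, the midpoint thresholds, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def gini(y):
    """Gini impurity of a label array."""
    if len(y) == 0:
        return 0.0
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def choose_split(X, y, max_features, rng):
    """Pick the best (feature, threshold) among a random subset of features.

    Hypothetical helper for illustration; not scikit-learn's implementation.
    """
    n_samples, n_features = X.shape
    # Draw the candidate features for this node only, without replacement.
    candidates = rng.choice(n_features, size=max_features, replace=False)
    best = None  # (weighted impurity, feature index, threshold)
    for f in candidates:
        values = np.unique(X[:, f])
        # Candidate thresholds: midpoints between consecutive distinct values.
        for t in (values[:-1] + values[1:]) / 2:
            left = y[X[:, f] <= t]
            right = y[X[:, f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n_samples
            if best is None or score < best[0]:
                best = (score, int(f), float(t))
    return candidates, best

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))            # 50 features, as in the example above
y = (X[:, 3] + X[:, 17] > 0).astype(int)  # made-up target for the demo

for node in range(3):  # pretend we are at three different nodes
    candidates, best = choose_split(X, y, max_features=10, rng=rng)
    print(sorted(candidates.tolist()), "-> split on feature", best[1])
```

Each call draws a fresh set of 10 candidates, so different nodes can end up splitting on different features even though no single node ever looks at more than 10.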

This mechanism is used to control overfitting. It is similar to the technique used in random forests, except that a random forest also samples the training data for each tree and builds many trees.

So even if you set max_features to 10, a sufficiently deep tree will typically end up using all the features across its nodes; the limit of 10 applies only to each individual split.
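The same effect can be checked for the two-feature case from the question. The snippet below is a minimal sketch on synthetic data (the data, the seeds, and the helper name `used_features` are assumptions, not the asker's setup); it reads the fitted tree's `tree_.feature` array, where leaf nodes carry a negative placeholder.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def used_features(max_features, seed=0):
    """Return the sorted feature indices that appear in at least one split."""
    rng = np.random.default_rng(42)
    X = rng.normal(size=(300, 2))            # two features, like x1 and x2
    y = (X[:, 0] + X[:, 1] > 0).astype(int)  # both features matter for the label
    clf = DecisionTreeClassifier(max_features=max_features, random_state=seed)
    clf.fit(X, y)
    # tree_.feature holds the split feature of every node; leaves are negative.
    return sorted({int(f) for f in clf.tree_.feature if f >= 0})

for m in (1, 2):
    print(f"max_features={m}: features used in splits = {used_features(m)}")
```

With max_features=1 every node considers a single randomly drawn feature, but because the draw is repeated at each node, both features still show up somewhere in the tree.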

If you compare the definition of max_features in the decision tree and random forest documentation, you will see that they are the same.

https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html

Correct answer by Bashar Haddad on January 2, 2021

