
Efficient Decision Tree Pruning

Data Science Asked on July 14, 2021

Is there an efficient way to handle decision tree pruning in Python?

Currently I’m doing this:

from sklearn.tree import DecisionTreeClassifier
from tqdm import tqdm

def do_best_tree(Xtrain, ytrain, Xtest, ytest):
    # Fit an unpruned tree once to compute the cost-complexity pruning path
    clf = DecisionTreeClassifier()
    clf.fit(Xtrain, ytrain)
    path = clf.cost_complexity_pruning_path(Xtrain, ytrain)
    ccp_alphas = path.ccp_alphas
    # Refit one tree per candidate alpha
    clfs = []
    for ccp_alpha in tqdm(ccp_alphas):
        clf = DecisionTreeClassifier(ccp_alpha=ccp_alpha)
        clf.fit(Xtrain, ytrain)
        clfs.append(clf)
    # Keep the tree that scores best on the held-out set
    return max(clfs, key=lambda x: x.score(Xtest, ytest))

But it’s super slow (as I create and fit a lot of trees).

Is there a more efficient way to do this with scikit-learn, or another library that handles this?
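
The best I’ve come up with so far is parallelizing the fits, since each candidate tree is independent. Here is a minimal sketch using joblib (which scikit-learn already depends on); do_best_tree_parallel and fit_one are just hypothetical names, and the selection logic is the same as above:

from joblib import Parallel, delayed
from sklearn.tree import DecisionTreeClassifier

def do_best_tree_parallel(Xtrain, ytrain, Xtest, ytest, n_jobs=-1):
    # Compute the pruning path once on an unpruned tree
    path = DecisionTreeClassifier().fit(Xtrain, ytrain) \
        .cost_complexity_pruning_path(Xtrain, ytrain)

    def fit_one(alpha):
        # Hypothetical helper: fit a single pruned tree for one alpha
        return DecisionTreeClassifier(ccp_alpha=alpha).fit(Xtrain, ytrain)

    # Fit candidate trees across all CPU cores instead of sequentially
    clfs = Parallel(n_jobs=n_jobs)(
        delayed(fit_one)(a) for a in path.ccp_alphas
    )
    return max(clfs, key=lambda c: c.score(Xtest, ytest))

Thinning the alpha grid (e.g. path.ccp_alphas[::5]) would cut the number of fits further, since neighbouring alphas tend to produce very similar trees, but I’d still like something smarter.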

One Answer

You might benefit from random forests instead, which aim at the same objective you are pursuing with pruning: better generalization by reducing overfitting.

scikit-learn's RandomForestClassifier lets you control how many features, or what proportion of them, each split considers, and averages the predictions of many trees for even better generalization performance.
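
For instance, a minimal sketch on synthetic data; max_features is the relevant parameter, and "sqrt" here is just one common choice:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, random_state=0)

# max_features sets how many features each split considers:
# an int, a float proportion, or "sqrt" / "log2"
clf = RandomForestClassifier(n_estimators=100, max_features="sqrt", n_jobs=-1)
clf.fit(Xtrain, ytrain)
print(clf.score(Xtest, ytest))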

Answered by Nitin on July 14, 2021
