TransWikia.com

Overview of the main methods to prune decision trees

Cross Validated Asked on December 13, 2021

Could someone explain the main pruning techniques for decision trees? Something like the three most common techniques, with a short explanation of how each works.

I have looked online, but surprisingly this doesn't seem to have been covered anywhere. I think a canonical answer would be useful here.

One Answer

Before Random Forests and other Decision Tree ensemble methods became common, single decision trees were often over-grown, or grown to maximum depth, and then pruned back based on different criteria. As far as I'm aware, there are two main approaches.

Reduced error pruning works bottom-up: two leaves are fused at their parent node (the subtree is replaced by a single leaf) whenever doing so does not increase the prediction error, typically measured on a held-out validation set.
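The idea can be sketched with a toy implementation. The `Node` class and helper names below are illustrative, not from any library; the sketch assumes a binary tree with numeric threshold splits and a separate validation set:

```python
# Sketch of reduced-error pruning on a hand-built binary decision tree.
# All names here are illustrative, not from any library.

class Node:
    def __init__(self, feature=None, threshold=None, left=None, right=None, label=None):
        self.feature = feature      # feature index to split on (internal nodes)
        self.threshold = threshold  # split threshold (internal nodes)
        self.left = left            # subtree for feature <= threshold
        self.right = right          # subtree for feature > threshold
        self.label = label          # class label (leaf nodes)

    def is_leaf(self):
        return self.label is not None

def predict(node, x):
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label

def error(node, X, y):
    """Number of validation points the (sub)tree misclassifies."""
    return sum(predict(node, x) != t for x, t in zip(X, y))

def majority_label(y):
    return max(set(y), key=y.count)

def prune(node, X_val, y_val):
    """Bottom-up reduced-error pruning: replace a subtree with a
    majority-class leaf whenever that does not increase validation error."""
    if node.is_leaf() or not X_val:
        return node
    # Route validation points to the children and prune them first.
    left_idx = [i for i, x in enumerate(X_val) if x[node.feature] <= node.threshold]
    right_idx = [i for i in range(len(X_val)) if i not in left_idx]
    node.left = prune(node.left, [X_val[i] for i in left_idx], [y_val[i] for i in left_idx])
    node.right = prune(node.right, [X_val[i] for i in right_idx], [y_val[i] for i in right_idx])
    # Try collapsing this subtree into a single majority-class leaf.
    leaf = Node(label=majority_label(y_val))
    if error(leaf, X_val, y_val) <= error(node, X_val, y_val):
        return leaf
    return node
```

For example, a subtree whose two leaves predict the same class is always collapsed, since fusing them cannot change any prediction.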

Cost-complexity pruning removes subtrees based on a cost-complexity function that balances the error rate against the complexity of the tree. (You might think of this as a sort of regularization.) One method of cost-complexity pruning is Minimum Description Length (MDL), an information-theoretic cost function that counts the number of bits needed to encode the decision tree plus the number of bits needed to encode its errors. This method was used by J. Ross Quinlan in C4.5.
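scikit-learn ships a variant of this, minimal cost-complexity pruning, through the `ccp_alpha` parameter of `DecisionTreeClassifier`; larger alphas penalize complexity more and yield smaller trees. A short sketch (the breast-cancer dataset is just a convenient built-in example):

```python
# Minimal cost-complexity pruning with scikit-learn's ccp_alpha parameter.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Grow a full tree, then compute the effective alphas at which
# successive subtrees would be pruned away.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
path = full_tree.cost_complexity_pruning_path(X_train, y_train)

# Refit at each alpha and record the tree size; in practice you would
# pick the alpha that maximizes validation (or cross-validation) accuracy.
sizes = []
for alpha in path.ccp_alphas:
    pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    sizes.append(pruned.tree_.node_count)
```

At the largest alpha on the path, the tree is pruned all the way back to the root node.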

You can find a brief description of decision tree pruning, along with some additional references, here. A search for "decision tree pruning" will turn up many references discussing the need for it, and with a bit of digging you can find more technical, methodological explanations as well.

Answered by KirkD_CO on December 13, 2021
