Random forest after cross validation

Asked on Cross Validated by Steven Niggebrugge on August 1, 2020

I have been wondering for some time now how random forests (or AdaBoost, it doesn't matter) are built when using cross-validation.
Let's say we're using 5-fold cross-validation, so we train random forests on 5 different training sets and test them on 5 different test sets.
What does the 'final' random forest look like when we are basically building 5 random forests (one for each fold of the cross-validation)? How are these forests combined into a final model?

I have never understood this step and I really hope someone can help me with this!

Thanks in advance,
Steven

One Answer

I am not sure why you are using cross-validation with random forest. Random forest does not need cross-validation. When you train an RF model, each tree is fit on a bootstrap sample drawn from the original data, which leaves out roughly 1/3 of the rows for that tree; these are its out-of-bag (OOB) data. Each observation is then predicted only by the trees that did not see it during training (so the data is tested on the forest, not on a single tree), those per-tree predictions are aggregated by vote or average, and comparing that aggregate with the true labels gives a built-in validation estimate, the OOB error.
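
As a concrete illustration, here is a minimal sketch, assuming scikit-learn's RandomForestClassifier and a synthetic dataset, of how this OOB estimate comes for free while the forest is being trained:

```python
# Minimal sketch (assumed: scikit-learn, synthetic data) of the built-in
# out-of-bag estimate: each sample is scored only by the trees whose
# bootstrap sample did not contain it.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

rf = RandomForestClassifier(
    n_estimators=500,
    bootstrap=True,   # each tree trains on a bootstrap sample (the default)
    oob_score=True,   # collect out-of-bag predictions while fitting
    random_state=0,
)
rf.fit(X, y)

# Accuracy estimated from the out-of-bag predictions; no separate test set needed
print("OOB accuracy estimate:", rf.oob_score_)
```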

Random forest does not need cross-validation to avoid overfitting. It uses bootstrapping plus averaging, known as bagging, to deal with overfitting.
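
To connect this back to your 5-fold setup, the sketch below (same assumptions as above: scikit-learn and synthetic data) computes a 5-fold cross-validation estimate you can compare with the OOB score. The final refit on all of the data reflects the usual practice when cross-validation is used only to estimate performance or tune hyperparameters; it is not something the forest itself requires.

```python
# Hedged sketch (assumed: scikit-learn, synthetic data): a 5-fold CV estimate
# plays the same role as the OOB estimate above; in both cases every
# prediction comes from trees/models that never saw that sample.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

cv_scores = cross_val_score(
    RandomForestClassifier(n_estimators=500, random_state=0), X, y, cv=5
)
print("5-fold CV accuracy estimate:", cv_scores.mean())

# Common practice if you do use CV: the five per-fold forests are only used to
# estimate performance; the model you keep is a single forest refit on all data.
final_rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
```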

Answered by Vivek on August 1, 2020
