Random forest after cross validation

Cross Validated Asked by Steven Niggebrugge on August 1, 2020

I have been wondering for some time now how random forests (or AdaBoost, it doesn’t matter) are built when using cross-validation.
Let’s say we’re using 5-fold cross-validation to train random forests on 5 different training sets and therefore test on 5 different test sets.
What does the ‘final’ random forest look like when we are basically building 5 random forests (one for each fold of the cross-validation)? How are these forests combined into a final model?

I have never understood this step and I really hope someone can help me with this!

Thanks in advance,

One Answer

I am not sure why you are using cross-validation with a random forest. A random forest does not need cross-validation to estimate its error. When you train an RF model, each tree is fit on a bootstrap sample of the original data, which leaves out about 1/3 of the observations; these are that tree's out-of-bag (OOB) data. Each observation is then validated against the forest rather than a single tree: every tree for which the observation was out of bag votes on it, and the votes are averaged to give its OOB prediction. Comparing these predictions with the true values gives the OOB error, which serves as a built-in validation estimate.
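The out-of-bag estimate described above can be sketched with scikit-learn. This is only an illustration under assumptions: the dataset is synthetic (`make_classification`), and the number of trees and the random seeds are arbitrary choices, not anything taken from the question.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, used purely for illustration (an assumption, not the asker's data).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# oob_score=True asks the forest to score each observation using only
# the trees whose bootstrap sample did not contain that observation.
forest = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
forest.fit(X, y)

# Accuracy computed from the out-of-bag predictions.
oob_accuracy = forest.oob_score_

# Per-observation class probabilities averaged over the out-of-bag trees.
oob_votes = forest.oob_decision_function_

print(f"OOB accuracy: {oob_accuracy:.3f}")
```

Here a single forest is fit once on all the data, and `oob_score_` plays the role that a held-out fold would play in cross-validation.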

A random forest does not need cross-validation to avoid overfitting. It relies on bagging (bootstrapping plus averaging the trees' predictions) to deal with overfitting.
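As a rough sketch of what bootstrapping plus averaging means, the loop below bags plain decision trees by hand. It is not the exact random forest algorithm, which also samples a subset of features at each split; the synthetic regression data, the 50 trees, and the seeds are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic data, used purely for illustration.
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
n_samples, n_trees = len(X), 50

trees = []
oob_fractions = []
for _ in range(n_trees):
    # Bootstrap: draw n_samples row indices with replacement.
    idx = rng.integers(0, n_samples, size=n_samples)

    # Rows never drawn are out of bag for this tree.
    oob_mask = np.ones(n_samples, dtype=bool)
    oob_mask[idx] = False
    oob_fractions.append(oob_mask.mean())

    trees.append(DecisionTreeRegressor(random_state=0).fit(X[idx], y[idx]))

# Averaging: the bagged prediction is the mean of the individual tree predictions.
bagged_prediction = np.mean([tree.predict(X) for tree in trees], axis=0)

# Share of rows left out of each bootstrap sample, averaged over trees.
mean_oob_fraction = float(np.mean(oob_fractions))
print(f"Mean out-of-bag fraction: {mean_oob_fraction:.3f}")
```

The printed fraction should land near the "about 1/3" mentioned in the answer, since a bootstrap sample of size n omits each row with probability (1 - 1/n)^n.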

Answered by Vivek on August 1, 2020
