Random forest after cross validation

Asked on Cross Validated by Steven Niggebrugge on August 1, 2020

I have been wondering for some time now how random forests (or AdaBoost, it doesn't matter) are built when using cross-validation.
Let's say we're using 5-fold cross-validation, so we train random forests on 5 different training sets and test them on 5 different test sets.
What does the 'final' random forest look like when we are basically building 5 random forests (one for each fold of the cross-validation)? How are these forests combined into a final model?

I have never understood this step and I really hope someone can help me with this!

Thanks in advance,
Steven

One Answer

I am not sure why you are using cross-validation with random forest. Random forest does not need cross-validation. When you train an RF model, each tree is fit on a bootstrap sample drawn from the original data, which leaves out roughly 1/3 of the rows for that tree; these are its out-of-bag (OOB) data. Each observation is then predicted only by the trees that did not see it during training (so the data is tested on the forest, not on a single tree), those per-tree predictions are aggregated by vote or average, and comparing that aggregate with the true labels gives a built-in validation estimate, the OOB error.
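
As a concrete illustration, here is a minimal sketch, assuming scikit-learn's RandomForestClassifier and a synthetic dataset, of how this OOB estimate comes for free while the forest is being trained:

```python
# Minimal sketch (assumed: scikit-learn, synthetic data) of the built-in
# out-of-bag estimate: each sample is scored only by the trees whose
# bootstrap sample did not contain it.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

rf = RandomForestClassifier(
    n_estimators=500,
    bootstrap=True,   # each tree trains on a bootstrap sample (the default)
    oob_score=True,   # collect out-of-bag predictions while fitting
    random_state=0,
)
rf.fit(X, y)

# Accuracy estimated from the out-of-bag predictions; no separate test set needed
print("OOB accuracy estimate:", rf.oob_score_)
```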

Random forest does not need cross-validation to avoid overfitting. It uses bootstrapping plus averaging, known as bagging, to deal with overfitting.
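
To connect this back to your 5-fold setup, the sketch below (same assumptions as above: scikit-learn and synthetic data) computes a 5-fold cross-validation estimate you can compare with the OOB score. The final refit on all of the data reflects the usual practice when cross-validation is used only to estimate performance or tune hyperparameters; it is not something the forest itself requires.

```python
# Hedged sketch (assumed: scikit-learn, synthetic data): a 5-fold CV estimate
# plays the same role as the OOB estimate above; in both cases every
# prediction comes from trees/models that never saw that sample.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

cv_scores = cross_val_score(
    RandomForestClassifier(n_estimators=500, random_state=0), X, y, cv=5
)
print("5-fold CV accuracy estimate:", cv_scores.mean())

# Common practice if you do use CV: the five per-fold forests are only used to
# estimate performance; the model you keep is a single forest refit on all data.
final_rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
```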

Answered by Vivek on August 1, 2020
