TransWikia.com

Random forest parameter tuning in R (caret) and Python (scikit-learn)?

Data Science Asked on August 6, 2020

Question: is it possible, or even necessary, to perform cross-validation to tune the parameters of a Python random forest implementation (e.g. scikit-learn) when training a new model, as can be done with R's caret?

Background R: When training a random forest through R's caret package, one can tune the parameters via repeated n-fold cross-validation, e.g.

train_control <- trainControl(method = "repeatedcv",
                              number = 10,
                              repeats = 5,
                              verboseIter = TRUE,
                              allowParallel = TRUE,
                              summaryFunction = multiClassSummary)
rf1 <- train(Class ~ .,
             data = train_transformed,
             method = "rf",
             metric = "Accuracy",
             tuneGrid = my_grid1,
             trControl = train_control)

This outputs:

## Aggregating results
## Selecting tuning parameters
## Fitting mtry = 1 on full training set

or whichever mtry value is best for the dataset.

Background Python: Using scikit-learn, one can instantiate a random forest regressor and perform a cross-validation check on that regressor:

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_validate

RFRegModel = RandomForestRegressor(random_state=42)
cv = cross_validate(RFRegModel, X, y, cv=5, verbose=1)
print(cv['test_score'])

The difference here is that this cross-validation doesn't appear to influence the tuning of the random forest parameters, as I think it does in R. I think all that is happening here is that the training data is being split into folds and the model, with its default parameters, is being scored on each held-out fold.
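As an aside, the evaluation side of caret's repeated 10-fold setup can be reproduced in scikit-learn with RepeatedKFold, though, as the paragraph above notes, this only scores the model with fixed parameters and does not tune anything. A minimal sketch, using synthetic data for illustration:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RepeatedKFold, cross_validate

# Synthetic stand-in for the real training data.
X, y = make_regression(n_samples=200, n_features=8, random_state=42)

# Analogue of caret's trainControl(method = "repeatedcv", number = 10, repeats = 5):
cv = RepeatedKFold(n_splits=10, n_repeats=5, random_state=42)

# Evaluates the default-parameter model on all 50 fold splits; no tuning occurs.
scores = cross_validate(RandomForestRegressor(random_state=42), X, y, cv=cv)
print(scores["test_score"].mean())
```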

Question again: is it possible to train the model in Python in a similar manner to caret, i.e. by forcing an n-fold cross-validation during parameter tuning? Or is this even necessary? Is the caret implementation overconstrained by its methodology?

One Answer

I guess what you are looking for is sklearn.model_selection.GridSearchCV(), or a similar function depending on the type of search (grid, random, ...) you would like to conduct. This function performs hyperparameter tuning using cross-validation, and its cv parameter allows you to specify the number of folds.
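A minimal sketch of what this looks like, roughly mirroring caret's tuneGrid over mtry (whose closest scikit-learn analogue is max_features); the data and grid values here are illustrative, not recommendations:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the real training data.
X, y = make_regression(n_samples=200, n_features=8, random_state=42)

param_grid = {
    "max_features": [2, 4, 8],    # closest analogue of caret's mtry
    "n_estimators": [100, 300],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    cv=5,        # 5-fold cross-validation on each parameter combination
    n_jobs=-1,
)
search.fit(X, y)

print(search.best_params_)  # parameters selected by cross-validation
print(search.best_score_)   # mean CV score for those parameters
```

By default GridSearchCV refits the best parameter combination on the full training set (refit=True), so search.best_estimator_ plays the role of caret's "Fitting mtry = ... on full training set" step.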

Answered by haci on August 6, 2020
