TransWikia.com

Different values of mean absolute error when using GridSearchCV for max_leaf_nodes vs manually optimising max_leaf_nodes

Data Science Asked by spectre on August 6, 2021

manually optimizing parameter

using GridSearchCV to optimize parameter

I am comparing hyperparameter tuning with GridSearchCV against manually selecting the best value of max_leaf_nodes for a decision tree model, using mean absolute error (MAE) as the score. In theory, both approaches should give me the same MAE and the same max_leaf_nodes, but they give different values. Also, if I change the value of cv in GridSearchCV, I get different results. So I have two questions:

  1. Why am I getting different max_leaf_nodes and MAE in both cases?

  2. How do I choose the value of cv in GridSearchCV, given that I get different results for cv = 3, cv = 5, and cv = 10?

One Answer

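Your manual approach gives the MAE on a single test set. Because you've set an integer for the parameter cv, GridSearchCV performs k-fold cross-validation (see the parameter description in the grid search docs), so the score .best_score_ is the MAE averaged over the k test folds, not the MAE on your one test set. Each value of cv also partitions the data into different folds, which is why the results change between cv = 3, cv = 5, and cv = 10.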
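To make the difference concrete, here is a minimal sketch of the two procedures side by side. The synthetic data, the candidate grid, the random_state values, and the scoring string "neg_mean_absolute_error" are all assumptions for illustration, since the original code was only posted as screenshots.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data stands in for the asker's dataset (assumption).
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

candidates = [5, 25, 50, 100]  # hypothetical grid of max_leaf_nodes values

# Manual approach: one fit per candidate, scored once on the held-out test set.
manual_mae = {}
for n in candidates:
    tree = DecisionTreeRegressor(max_leaf_nodes=n, random_state=0)
    tree.fit(X_train, y_train)
    manual_mae[n] = mean_absolute_error(y_test, tree.predict(X_test))

# Grid search: each candidate is fit cv times, each time scored on a different
# fold of the training data; the test set is never used.
n_folds = 5
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"max_leaf_nodes": candidates},
    scoring="neg_mean_absolute_error",  # scores are negated so that higher is better
    cv=n_folds,
)
search.fit(X_train, y_train)

# best_score_ is the mean of the per-fold scores for the winning candidate.
fold_scores = np.array(
    [search.cv_results_[f"split{i}_test_score"][search.best_index_]
     for i in range(n_folds)]
)
cv_mae = -search.best_score_

print("manual MAE per candidate:", manual_mae)
print("grid search best params:", search.best_params_)
print("grid search mean CV MAE:", cv_mae)
```

The two numbers measure different things: the manual MAE comes from one particular split, while the grid search MAE is an average over several validation folds drawn from the training data, so there is no reason to expect them to agree exactly.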

If you really want a single train/test split, GridSearchCV can do that as well; see e.g. this SO post.
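One possible way to do this (not necessarily the one described in the linked post) is to pass a PredefinedSplit as cv, so that the grid search evaluates every candidate on the same single held-out fold. The data, the grid, and the choice of the last 75 rows as the test fold are assumptions for this sketch.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, PredefinedSplit
from sklearn.tree import DecisionTreeRegressor

# Synthetic data stands in for the asker's dataset (assumption).
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

# test_fold[i] == -1 keeps row i in the training set for every split;
# test_fold[i] == 0 puts row i in the one and only test fold.
test_fold = np.full(len(X), -1)
test_fold[225:] = 0  # hypothetical choice: last 75 rows form the test set
split = PredefinedSplit(test_fold)
train_idx, test_idx = next(split.split())

candidates = [5, 25, 50, 100]  # hypothetical grid of max_leaf_nodes values

search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"max_leaf_nodes": candidates},
    scoring="neg_mean_absolute_error",
    cv=split,
    refit=False,  # skip the final refit on all rows; only the scores are needed
)
search.fit(X, y)
grid_mae = -search.cv_results_["mean_test_score"]

# The same evaluation done by hand on the identical split.
manual_mae = np.array([
    mean_absolute_error(
        y[test_idx],
        DecisionTreeRegressor(max_leaf_nodes=n, random_state=0)
        .fit(X[train_idx], y[train_idx])
        .predict(X[test_idx]),
    )
    for n in candidates
])

print("grid search MAE:", grid_mae)
print("manual MAE:     ", manual_mae)
```

With a single predefined split and a fixed random_state, the grid search and the manual loop train and score on exactly the same rows, so the two sets of MAE values should match.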

Answered by Ben Reiniger on August 6, 2021
