TransWikia.com

Confusion in applying k-fold cross validation to dataset

Data Science Asked by Mr. NLP on December 16, 2020

I have a data set that is already divided into 10 folds, with each fold having training, validation and test sets. I'm not able to understand how to apply 10-fold cross validation to this data set.

In general, if we want to apply k-fold cross validation to a data set, the procedure is as follows:

[Image: diagram of the k-fold cross validation procedure]

In my case, the data set is already divided into 10 folds, and each fold contains validation and test sets in addition to a training set. It would be helpful if someone could guide me on how to do 10-fold cross validation for this kind of data set.

One Answer

In 10-fold cross-validation, you split your dataset into 10 sections. In each iteration, 9 of them are used for training and 1 for testing (there is no separate validation set). For example, if your dataset has 100 samples, then in the first fold (the first loop iteration) the model is trained on 90 samples and the remaining 10 are used to test it. The loop continues, holding out a different section each time, until every sample has been used for both training and testing.
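To make the 90/10 arithmetic concrete, here is a minimal sketch in plain Python. The helper name `k_fold_indices` is made up for illustration and is not part of any library; it only produces the index splits and does not train a model.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for plain, unshuffled k-fold CV."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        # Everything outside the held-out slice is training data.
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


splits = list(k_fold_indices(n_samples=100, k=10))
for fold, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"fold {fold}: train={len(train_idx)} test={len(test_idx)}")
```

Each sample lands in the test set exactly once and in the training set in the other 9 iterations. In practice you would usually shuffle the data before splitting; this sketch leaves that out to keep the idea visible.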

For more, see here.

In Python, you can implement 10-fold cross-validation using the scikit-learn (sklearn) library.
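As a rough sketch of what that can look like with scikit-learn, the example below uses `KFold` and `cross_val_score` from `sklearn.model_selection`. The synthetic data and the choice of `LogisticRegression` are assumptions for illustration only; swap in your own data and model.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data: 100 samples, 5 features (an assumption for illustration).
X, y = make_classification(n_samples=100, n_features=5, random_state=0)

# 10 folds, shuffled once with a fixed seed so the split is reproducible.
cv = KFold(n_splits=10, shuffle=True, random_state=0)
model = LogisticRegression(max_iter=1000)

# One accuracy score per fold; each fold trains on 90 samples and tests on 10.
scores = cross_val_score(model, X, y, cv=cv)
print(f"mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```

`cross_val_score` refits a fresh copy of the model for every fold, so no information leaks from one iteration into the next.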

Now, because your dataset is already split into 10 folds, you have two choices:

1- The easiest way is to combine your data into a single set and then use a library to do the 10-fold cross-validation for you.

2- Write the code yourself to loop over your 10 folds. In the first iteration, use the first section for testing and the other 9 for training; in the second iteration, use the second section for testing and the first plus the remaining 8 for training. The loop should run 10 times, so that every section is used once for testing and 9 times for training.
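The second option could be sketched roughly as follows. Everything here is hypothetical: the function name `cross_validate_presplit`, the layout of `folds` as a list of `(X, y)` pairs, and the toy mean predictor used as a placeholder model. It assumes each of your 10 folds can be loaded as one section of features and labels.

```python
def cross_validate_presplit(folds, fit, score):
    """Run k-fold CV over sections that are already split.

    folds: list of (X, y) pairs, one per pre-made section (hypothetical layout).
    fit:   callable (X_train, y_train) -> model
    score: callable (model, X_test, y_test) -> float
    """
    scores = []
    for i, (X_test, y_test) in enumerate(folds):
        # Training data is every section except the i-th.
        X_train, y_train = [], []
        for j, (X_j, y_j) in enumerate(folds):
            if j != i:
                X_train.extend(X_j)
                y_train.extend(y_j)
        model = fit(X_train, y_train)
        scores.append(score(model, X_test, y_test))
    return scores


# Toy stand-in data: 10 sections of 10 samples each, with y = 2 * x.
folds = [([float(10 * f + n) for n in range(10)],
          [2.0 * (10 * f + n) for n in range(10)]) for f in range(10)]


def fit_mean(X_train, y_train):
    # Placeholder "model": always predict the mean training target.
    return sum(y_train) / len(y_train)


def mean_abs_error(model, X_test, y_test):
    # Average absolute gap between the constant prediction and the true targets.
    return sum(abs(model - y) for y in y_test) / len(y_test)


scores = cross_validate_presplit(folds, fit_mean, mean_abs_error)
print(f"{len(scores)} folds, mean MAE = {sum(scores) / len(scores):.2f}")
```

For the first option, you would instead concatenate all sections into one set and hand it to a library splitter. The manual loop has the advantage of preserving the fold boundaries you were given, which matters if results need to be comparable with others who used the same split.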

This is the idea behind 10-fold cross-validation. If it is not applicable to your dataset, I think 10-fold cross-validation is not a good fit for your case.

Answered by Hunar on December 16, 2020
