Data Science: Asked by Tarun Pratap on December 24, 2020
In data pre-processing, a stratified shuffle is used to ensure that the class distribution of the original dataset is reflected in the training, validation and test sets.
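For concreteness, here is a minimal sketch of such a stratified split using scikit-learn's train_test_split (the data, split sizes and random seeds are arbitrary choices for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                    # hypothetical features
y = rng.choice([0, 1], size=1000, p=[0.9, 0.1])   # imbalanced labels

# First carve off the test set, then split the rest into train/validation;
# stratify=... keeps the class proportions of y in every part.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=42)

print(y_train.mean(), y_val.mean(), y_test.mean())  # all roughly 0.1
```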
Mini-batch gradient descent relies on random shuffling so that each mini-batch is a random sample of the training data.
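A sketch of that per-epoch shuffling (with stand-in NumPy data; the helper name and the placeholder update step are hypothetical):

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, rng):
    """Yield mini-batches in a fresh random order (hypothetical helper)."""
    idx = rng.permutation(len(X))      # new shuffle every call, i.e. every epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]

rng = np.random.default_rng(0)
X_train = rng.normal(size=(600, 5))    # stand-in training data
y_train = rng.integers(0, 2, size=600)

for epoch in range(3):                 # the batches differ between epochs
    for X_batch, y_batch in iterate_minibatches(X_train, y_train, 32, rng):
        pass                           # compute gradients and update weights here
```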
My doubt is: why should we apply a stratified shuffle to our dataset if it is going to be shuffled randomly later during training?
It doesn't; the workflow when training a model looks like this:

1. Split the dataset into a train set, validation set and test set, using a stratified shuffle.
2. Train the model on the train set over a number of epochs.
3. Before each epoch, shuffle the train set and split it into mini-batches.
If we skip the stratified shuffling in step 1, the classes won't be distributed in the same proportions across the train set, validation set and test set.
If we skip the shuffling before each epoch in step 3, the mini-batches will be identical in every epoch.
The relative sizes of the train set, validation set and test set can of course vary.
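To see both points in one place, here is a quick check (a sketch using scikit-learn's StratifiedShuffleSplit, with arbitrary data and an arbitrary 80/20 split): the class frequencies come out nearly identical in both parts, while the part sizes themselves are a free choice.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = rng.choice([0, 1, 2], size=500, p=[0.7, 0.2, 0.1])

splitter = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(X, y))

for name, idx in [("train", train_idx), ("test", test_idx)]:
    _, counts = np.unique(y[idx], return_counts=True)
    print(name, (counts / counts.sum()).round(2))   # ~[0.7, 0.2, 0.1] in both
```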
Correct answer by Tim von Känel on December 24, 2020