
Why does the test set need to remain consistent across multiple runs?

Data Science Asked on July 30, 2021

In the book "Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems", more specifically in chapter 2, the author teaches us how to create a test set. He mentions that we need to keep the test set consistent across multiple runs, and suggests the following:

A common solution is to use each instance’s identifier to decide
whether or not it should go in the test set (assuming instances have a
unique and immutable identifier). For example, you could compute a
hash of each instance’s identifier and put that instance in the test
set if the hash is lower or equal to 20% of the maximum hash value.
This ensures that the test set will remain consistent across multiple
runs, even if you refresh the dataset. The new test set will contain
20% of the new instances, but it will not contain any instance that
was previously in the training set.
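A minimal sketch of the hash-based split described above, using Python's `zlib.crc32` (the function names here are illustrative, not the book's exact code):

```python
from zlib import crc32

def in_test_set(identifier: int, test_ratio: float = 0.2) -> bool:
    """Deterministically assign an instance to the test set: it goes in
    if the CRC32 hash of its id falls in the lowest `test_ratio`
    fraction of the 32-bit hash range."""
    return crc32(str(identifier).encode()) < test_ratio * 2**32

# The split is stable: each decision depends only on the id itself,
# so refreshing the dataset never moves an old instance between sets.
ids = range(1000)
test_ids = [i for i in ids if in_test_set(i)]
```

Because the assignment is a pure function of the identifier, re-running the script, shuffling the data, or appending new rows leaves every previously assigned instance in the same set.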
My question is: why is it important to do so? Why don't we want to generate a different test set for each run?

2 Answers

You will judge the performance of the trained model based on certain performance metrics. If you keep updating your test set, then you will not know whether one run is better than the other.

For example, suppose you are trying to predict the value of a house, you have only one data point in your test set, and you are using mean absolute error (MAE) as your performance metric.

Scenario 1: the actual value of the house is 100K; your first model predicts 101K. MAE = 1000.

Scenario 2: the actual value of the house is 25K; your second model predicts 26K. MAE = 1000.

In reality, the prediction of model 1 (1% off) is better than that of model 2 (4% off). But because you changed the test set between runs, the identical MAE values hide this. Had you kept the same test set in scenario 2, model 2's prediction on the 100K house would likely have been much further off, making its MAE much higher and clearly indicating that model 1 is better than model 2.
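The two scenarios can be checked with a few lines of Python (the `mae` helper here is just for illustration):

```python
def mae(y_true, y_pred):
    """Mean absolute error over paired true/predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Scenario 1: model 1 evaluated on a 100K house.
print(mae([100_000], [101_000]))  # 1000.0  -> 1% relative error
# Scenario 2: model 2 evaluated on a *different*, 25K house.
print(mae([25_000], [26_000]))    # 1000.0  -> 4% relative error
```

The two MAE values are identical, yet the relative errors differ by a factor of four. Only evaluating both models on the same test instances makes the comparison meaningful.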

Hope this helps.

Correct answer by Deepika Kalra on July 30, 2021

The whole setup of training, validation, and testing makes sense only if the samples are independently and identically distributed (i.i.d.). Have a look at Foundations of Machine Learning (Definition 2.1, "Generalization error", and Definition 2.2, "Empirical error").

Thus it is very important that the distribution of the test set remains the same; otherwise, you cannot deduce anything from the results.

Answered by Graph4Me Consultant on July 30, 2021
