
Why does the test set need to remain consistent across multiple runs?

Data Science Asked on July 30, 2021

In the book "Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems", more specifically in chapter 2, the author teaches us how to create a test set. He mentions that we need to keep the test set consistent across multiple runs, and suggests the following:

A common solution is to use each instance’s identifier to decide
whether or not it should go in the test set (assuming instances have a
unique and immutable identifier). For example, you could compute a
hash of each instance’s identifier and put that instance in the test
set if the hash is lower or equal to 20% of the maximum hash value.
This ensures that the test set will remain consistent across multiple
runs, even if you refresh the dataset. The new test set will contain
20% of the new instances, but it will not contain any instance that
was previously in the training set.
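A minimal sketch of the hash-based split described above, using Python's `zlib.crc32` (the function names here are illustrative, not the book's exact code):

```python
from zlib import crc32

def in_test_set(identifier: int, test_ratio: float = 0.2) -> bool:
    """Deterministically assign an instance to the test set: it goes in
    if the CRC32 hash of its id falls in the lowest `test_ratio`
    fraction of the 32-bit hash range."""
    return crc32(str(identifier).encode()) < test_ratio * 2**32

# The split is stable: each decision depends only on the id itself,
# so refreshing the dataset never moves an old instance between sets.
ids = range(1000)
test_ids = [i for i in ids if in_test_set(i)]
```

Because the assignment is a pure function of the identifier, re-running the script, shuffling the data, or appending new rows leaves every previously assigned instance in the same set.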
My question is: why is it important to do so? Why don't we want to generate a different test set for each run?

2 Answers

You will judge the performance of the trained model based on certain performance metrics. If you keep updating your test set, then you will not know whether one run is better than the other.

For example, suppose you are trying to predict the value of a house, you have only one data point in your test set, and you are using mean absolute error (MAE) as your performance metric.

Scenario 1: the actual value of the house is 100K; your first model predicts 101K. MAE = 1000.

Scenario 2: the actual value of the house is 25K; your second model predicts 26K. MAE = 1000.

In reality, the prediction of model 1 (1% off) is better than that of model 2 (4% off). But because you changed the test set between runs, the identical MAE values hide this. Had you kept the same test set in scenario 2, model 2's prediction on the 100K house would likely have been much further off, making its MAE much higher and clearly indicating that model 1 is better than model 2.
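The two scenarios can be checked with a few lines of Python (the `mae` helper here is just for illustration):

```python
def mae(y_true, y_pred):
    """Mean absolute error over paired true/predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Scenario 1: model 1 evaluated on a 100K house.
print(mae([100_000], [101_000]))  # 1000.0  -> 1% relative error
# Scenario 2: model 2 evaluated on a *different*, 25K house.
print(mae([25_000], [26_000]))    # 1000.0  -> 4% relative error
```

The two MAE values are identical, yet the relative errors differ by a factor of four. Only evaluating both models on the same test instances makes the comparison meaningful.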

Hope this helps.

Correct answer by Deepika Kalra on July 30, 2021

The whole setup of training, validation, and testing makes sense only if the samples are independently and identically distributed (i.i.d.). Have a look at Foundations of Machine Learning (Definition 2.1, "Generalization error", and Definition 2.2, "Empirical error").

Thus it is very important that the distribution of the test set remains the same; otherwise, you cannot deduce anything from the results.

Answered by Graph4Me Consultant on July 30, 2021
