Hyperparameter searching when there is no development set

Question

I have a train set and a test set but no development (dev) set. I'm training a model on the train set and searching for the hyperparameters that maximize accuracy on the test set (pretty much a normal machine learning scenario). Here is my confusion: we usually tune hyperparameters on the dev set (not the test set), then use the best hyperparameters to train the model, and finally evaluate it on the test set. I have two questions about the case where there is no dev set:

1. Is it problematic if we do hyperparameter searching on the test set? One may say, it is obviously problematic, but I say the hyperparameter searching is like every time training a model from scratch using a combination of hyperparameters and pick the best combination, and it's not like that the model is learning from previous hyperparameter searches, so is this still problematic?
2. If the first option is problematic, should I just break my train set into train+dev set and then use the dev for hyperparameter searching?

Ben Reiniger · Answer

This is very nearly a duplicate of "Is a test set necessary after cross validation on training set?", but I think it's worth specifically addressing this part of your question:

the hyperparameter searching is like every time training a model from scratch using a combination of hyperparameters and pick the best combination, and it's not like that the model is learning from previous hyperparameter searches

Indeed, the various models don't gain any direct information from each other, whether from training or from the dev set. However, you are choosing the hyperparameters that perform best on the dev set, so the pipeline as a whole can become "overfit" to the dev set. The same applies if you search on the test set: it then effectively plays the role of the dev set. That effect is usually not very pronounced, but you certainly cannot report the chosen hyperparameters' score on that same set as an unbiased estimate of future performance.
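To make the selection effect concrete, here is a toy simulation (my own illustration, not anything from your setup). It assumes an extreme case: every candidate "model" is a coin flip with true accuracy 0.5, so any difference between candidates on the evaluation set is pure noise. The names `one_trial` and `average_gap` and all the sizes are arbitrary choices for the sketch.

```python
import random

def one_trial(rng, n_candidates=50, n_eval=200):
    """Pick the best of several coin-flip 'models' on one evaluation set."""
    def measured_accuracy():
        # Every candidate has true accuracy 0.5; the measured value is noisy
        # because the evaluation set is finite.
        return sum(rng.random() < 0.5 for _ in range(n_eval)) / n_eval

    # Score each candidate on the set used for selection and keep the best.
    selection_score = max(measured_accuracy() for _ in range(n_candidates))
    # The winner is still a coin flip, so on fresh data it scores like one.
    fresh_score = measured_accuracy()
    return selection_score, fresh_score

def average_gap(n_trials=100, seed=0):
    """Average both scores over many repetitions of the selection."""
    rng = random.Random(seed)
    results = [one_trial(rng) for _ in range(n_trials)]
    mean_selected = sum(s for s, _ in results) / n_trials
    mean_fresh = sum(f for _, f in results) / n_trials
    return mean_selected, mean_fresh

mean_selected, mean_fresh = average_gap()
print(f"score on the set used for selection: {mean_selected:.3f}")
print(f"score of the winner on fresh data:   {mean_fresh:.3f}")
```

The first number should come out noticeably above 0.5 and the second close to 0.5, even though no model ever "learned" from another. With real candidates that genuinely differ in quality, the gap is typically much smaller than in this worst case, but it has the same origin.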
The answer to your second question is then "yes"; and if your training set is small, consider a repeated-split scheme such as k-fold cross-validation instead of a single train/dev split.
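A minimal sketch of that workflow, using only the standard library. The toy data, the 1-D k-nearest-neighbours classifier, and the names `make_data`, `k_fold_score`, and `candidates` are all made up for illustration; in practice a library routine would do the same job. The point is only that the hyperparameter is chosen using folds of the train set, and the test set is touched once at the end.

```python
import random

def make_data(n, rng):
    # Toy problem (assumption): label is 1 when x > 0, with 20% label noise.
    data = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        y = int(x > 0)
        if rng.random() < 0.2:
            y = 1 - y
        data.append((x, y))
    return data

def knn_predict(train, x, k):
    # Majority vote among the k training points closest to x.
    neighbours = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in neighbours)
    return int(2 * votes > k)

def accuracy(train, eval_set, k):
    return sum(knn_predict(train, x, k) == y for x, y in eval_set) / len(eval_set)

def k_fold_score(data, k_neighbours, n_folds=5):
    # The data is already in random order, so strided slices give random folds.
    folds = [data[i::n_folds] for i in range(n_folds)]
    scores = []
    for i, dev in enumerate(folds):
        # Train on every fold except the one held out as the dev fold.
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        scores.append(accuracy(train, dev, k_neighbours))
    return sum(scores) / n_folds

rng = random.Random(0)
train_set = make_data(200, rng)
test_set = make_data(200, rng)

# Hyperparameter search: only the train set is used here.
candidates = [1, 5, 15, 31]
cv_scores = {k: k_fold_score(train_set, k) for k in candidates}
best_k = max(cv_scores, key=cv_scores.get)

# Final evaluation: fit on the full train set, score once on the test set.
test_accuracy = accuracy(train_set, test_set, best_k)
print("cross-validated scores:", cv_scores)
print("chosen k:", best_k, "test accuracy:", test_accuracy)
```

The cross-validated score of the chosen `k` is still slightly optimistic for the reason given above; the test accuracy is the number to report, because the test set played no part in the choice.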
