Is it advisable to use a model which is underfit but gives very high accuracy?

Question

I am training a model for a single-label image classification task. I oversample all the classes and use MixUp, along with rotation and dihedral transformations, to augment the data.
After training for 20 epochs, the model achieves a validation loss (cross-entropy) of $<8\%$ and $98\%$ accuracy in predicting the labels of the images in the validation set.
The problem is that the model underfits. While the accuracy is extremely high and the validation loss is extremely low, the training loss is quite high, $\approx 75\%$.
Should I use this model in production? Although the model underfits the training data, it achieves very high accuracy in predicting the labels in the validation set, and the validation loss is also extremely low.
Should I work with an underfit model in production?
Here is what the last two epochs look like:

| epoch | train_loss | valid_loss | accuracy |
|---|---|---|---|
| 18 | $0.764258$ | $0.150605$ | $0.963151$ |
| 19 | $0.763108$ | $0.152006$ | $0.961245$ |

You might ask why I am adding augmentations if the model is underfitting. If I don't add them, the model starts to overfit, and the validation loss and accuracy get worse.
I am not asking how to make the underfitting go away; I can do that by running, say, 20 more epochs of training. I am asking whether it is okay to use such a model in production.

Nicolas M · Answer

In other words, your model doesn't learn the training data very well, but despite that, it makes good predictions on the test data, right?
The short answer is no, because there is a big risk of biased results in production.
The long answer is you have to know whether the test data is representative enough of the production data or not.
Do you also use the same augmentations for the test data?

Ben Reiniger · Answer

In general, a gap like this would indicate a problem. In your context, where you're confident that the test set is representative of the intended production setting, and the lower scores on the training set may be due to the augmentation, I think you're probably fine to proceed.
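One way to see why augmentation alone could keep the training loss high: if the reported training loss is computed against the mixed MixUp targets (an assumption about the training setup), then a target that blends two different classes with weight `lam` cannot be fit below its own entropy. The sketch below computes that floor. The names `mixup_loss_floor` and `expected_floor` are illustrative, and `alpha=1.0` is an arbitrary choice for the Beta distribution, not the asker's setting.

```python
import math
import random


def mixup_loss_floor(lam: float) -> float:
    """Lowest achievable cross-entropy (in nats) against the soft target
    [lam, 1 - lam] over two different classes: the entropy of that target."""
    if lam <= 0.0 or lam >= 1.0:
        return 0.0  # an unmixed (one-hot) target can be fit with zero loss
    return -(lam * math.log(lam) + (1.0 - lam) * math.log(1.0 - lam))


def expected_floor(alpha: float, n_samples: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the average floor when lam ~ Beta(alpha, alpha).

    Assumption: every mixed pair comes from two different classes; pairs
    from the same class contribute no floor, so the real average is lower.
    """
    rng = random.Random(seed)
    total = sum(
        mixup_loss_floor(rng.betavariate(alpha, alpha)) for _ in range(n_samples)
    )
    return total / n_samples


if __name__ == "__main__":
    for lam in (0.5, 0.7, 0.9, 1.0):
        print(f"lam={lam:.1f}  floor={mixup_loss_floor(lam):.3f}")
    print(f"average floor for alpha=1.0: {expected_floor(1.0):.3f}")
```

For `lam = 0.5` the floor is $\ln 2 \approx 0.693$ nats, so a sizeable part of a training loss near $0.76$ could come from the mixed targets rather than from a poor fit.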
To be a little more confident, I'd want to evaluate the hypothesis that the training scores are low because of the augmentations; can you evaluate the original training set?
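A minimal sketch of that check, assuming the model can be wrapped as a function that returns class probabilities for a single input: score the un-augmented training images with the same metric code used for validation. `evaluate`, `toy_model`, and the sample data are hypothetical stand-ins, not part of any particular library.

```python
import math
from typing import Callable, Iterable, Sequence, Tuple


def evaluate(
    predict_proba: Callable[[object], Sequence[float]],
    samples: Iterable[Tuple[object, int]],
) -> Tuple[float, float]:
    """Return (mean cross-entropy in nats, accuracy) over (input, label) pairs.

    Labels are hard class indices, i.e. no MixUp and no other augmentation.
    """
    total_loss, correct, count = 0.0, 0, 0
    for x, label in samples:
        probs = predict_proba(x)
        # Clamp to avoid log(0) when the model assigns zero probability.
        total_loss += -math.log(max(probs[label], 1e-12))
        predicted = max(range(len(probs)), key=lambda k: probs[k])
        correct += int(predicted == label)
        count += 1
    if count == 0:
        raise ValueError("no samples to evaluate")
    return total_loss / count, correct / count


if __name__ == "__main__":
    # Toy stand-in for a trained two-class classifier (hypothetical).
    def toy_model(x: float) -> Sequence[float]:
        return [0.9, 0.1] if x < 0 else [0.2, 0.8]

    # Hypothetical un-augmented training samples: (input, true class index).
    clean_train = [(-1.0, 0), (2.0, 1), (0.5, 1), (-3.0, 0)]
    loss, acc = evaluate(toy_model, clean_train)
    print(f"clean train loss={loss:.4f}  accuracy={acc:.2f}")
```

If the loss on the clean training set comes out close to the validation loss, that supports the augmentation explanation; if it stays high, the model is genuinely underfitting.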
