
Expected behaviour of loss and accuracy when using data augmentation

Data Science Asked by roding on July 31, 2021

I have implemented a convolutional neural network in Keras, and I use off-line data augmentation on the training set. Because the augmented data are not generated in Python, I create the batches of training data in separate files, load one batch/file at a time, and run a single epoch of training on it, so the exact same data is never seen twice during training:

for current_batch in range(1, number_of_batches + 1):
    # Load the next (X_training, Y_training) pair from file.

    # Fit for a single epoch on this freshly augmented batch.
    model.fit(X_training, Y_training, epochs=1, batch_size=32,
              validation_data=(X_validation, Y_validation))

My understanding is that this is what Keras would do internally if I used its native (image) data augmentation tools, namely generate a brand-new set of data for each epoch. As can be seen above, I don't use a validation split; instead, I supply (non-augmented) validation data manually, which is currently the same for every training batch.
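
For reference, I believe the on-line equivalent using Keras's built-in tools would look roughly like the sketch below; the specific augmentation parameters (rotation_range, horizontal_flip) are illustrative assumptions, not part of my actual setup:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# The generator applies fresh random transformations every epoch,
# so no two epochs see exactly the same samples.
datagen = ImageDataGenerator(rotation_range=20, horizontal_flip=True)

# One pass over the generator per epoch; number_of_batches epochs here
# would expose the model to roughly as much distinct data as my loop above.
model.fit(datagen.flow(X_training, Y_training, batch_size=32),
          epochs=number_of_batches,
          validation_data=(X_validation, Y_validation))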

My question is this: what is the expected behaviour of the (binary cross-entropy) loss and accuracy as a function of the number of batches in a setting like this? If I trained for $N \gg 1$ epochs on the same training set, I would have a chance of obtaining a very small training loss, whereas in this case my guess is that I cannot expect that. Indeed, from what I have seen so far, the training loss and accuracy are quite noisy, and it is not obvious how to apply a stopping criterion (see the sketch below for the kind of manual criterion I have in mind).
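
To make the difficulty concrete, here is a rough sketch of a manual stopping criterion for the outer loop; the patience value and the moving-average window are arbitrary assumptions on my part:

import numpy as np

best_loss = np.inf
patience, wait = 5, 0
val_losses = []

for current_batch in range(1, number_of_batches + 1):
    # Load the next (X_training, Y_training) pair from file, as above.

    history = model.fit(X_training, Y_training, epochs=1, batch_size=32,
                        validation_data=(X_validation, Y_validation))
    val_losses.append(history.history['val_loss'][0])

    # Smooth the noisy per-batch validation loss with a short moving average.
    smoothed = np.mean(val_losses[-3:])
    if smoothed < best_loss:
        best_loss = smoothed
        wait = 0
    else:
        wait += 1
        if wait >= patience:
            break  # No improvement for `patience` consecutive batches.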
