What is the use of width/height shift in data augmentation?

Question

I'm not sure I understand the purpose of data augmentation using width shift and height shift.
Say I have limited image data and I want to create new data using Keras' ImageDataGenerator. To classify the images, I use CNNs. Since CNNs are translation invariant, aren't the translational shifts from Keras useless, as those shifts will not result in new images per se? (I know new images will be created by the generator, but won't the CNN fail to learn new features from these pictures, and might the shifts instead cause overfitting?)

10xAI · Accepted Answer

If the shift is applied standalone, then that is correct.
But another goal of augmentation is randomness, which is achieved by applying multiple augmentation techniques together. In that sense, these two shifts can also become effective.
For example, on a zoomed image, adding a vertical shift crops the image further and can eventually result in a new (random) image.
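A minimal sketch of this idea in plain Python, assuming images are small 2-D lists of pixel values. The helpers `zoom_in`, `shift_vertical` and `random_zoom_then_shift` are hypothetical simplifications (centre crop with nearest-neighbour resize, zero fill for vacated rows), not how Keras's ImageDataGenerator implements these transforms. It shows that a vertical shift applied after a zoom gives an image that neither transform produces alone, and that drawing both parameters at random multiplies the number of distinct samples.

```python
import random

def zoom_in(img, factor):
    """Hypothetical helper: centre-crop by `factor` (>= 1), resize back with nearest neighbour."""
    h, w = len(img), len(img[0])
    ch, cw = max(1, round(h / factor)), max(1, round(w / factor))
    top, left = (h - ch) // 2, (w - cw) // 2
    return [[img[top + (r * ch) // h][left + (c * cw) // w] for c in range(w)]
            for r in range(h)]

def shift_vertical(img, dy, fill=0):
    """Hypothetical helper: move rows down by dy (up if negative); new rows get `fill`."""
    h, w = len(img), len(img[0])
    return [list(img[r - dy]) if 0 <= r - dy < h else [fill] * w for r in range(h)]

def random_zoom_then_shift(img, rng, max_zoom=2.0, max_dy=1):
    """Compose a random zoom with a random vertical shift."""
    return shift_vertical(zoom_in(img, rng.uniform(1.0, max_zoom)),
                          rng.randint(-max_dy, max_dy))

img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
zoomed = zoom_in(img, 2)              # only the central 2x2 block survives
combined = shift_vertical(zoomed, 1)  # the shift then crops the zoomed view further
print(zoomed)    # [[6, 6, 7, 7], [6, 6, 7, 7], [10, 10, 11, 11], [10, 10, 11, 11]]
print(combined)  # [[0, 0, 0, 0], [6, 6, 7, 7], [6, 6, 7, 7], [10, 10, 11, 11]]
```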

prashant0598 · Answer

Since CNNs are translation invariant, aren't the translational shifts from Keras useless, as those shifts will not result in new images per se?

Invariance to translation means that if we translate the input, the CNN will still be able to detect the class to which it belongs. This invariance is only approximate and comes largely from the pooling operation (the convolution itself is translation equivariant: shifting the input roughly shifts the feature map by the same amount). In a pooling operation, we replace the output of the convolutional layer at a certain location with a summary statistic of the nearby outputs, such as the maximum in the case of max pooling. Because max pooling keeps only the maximum, slightly shifting the input will not change the values of most pooled outputs.
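A small worked example, assuming a hand-made 4x4 feature map and hypothetical plain-Python helpers `max_pool_2x2` and `shift_right`: a one-pixel shift that stays inside a 2x2 pooling window leaves the pooled output unchanged, while a two-pixel shift moves an activation into a different window and pushes another off the edge. So the invariance covers only small shifts.

```python
def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 (height and width must be even)."""
    return [[max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
             for c in range(0, len(fmap[0]), 2)]
            for r in range(0, len(fmap), 2)]

def shift_right(fmap, dx, fill=0):
    """Translate every row dx pixels to the right, padding with `fill`."""
    w = len(fmap[0])
    return [[row[c - dx] if 0 <= c - dx < w else fill for c in range(w)]
            for row in fmap]

# Hypothetical 4x4 feature map with two activations.
fmap = [[9, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 5, 0],
        [0, 0, 0, 0]]

print(max_pool_2x2(fmap))                  # [[9, 0], [0, 5]]
print(max_pool_2x2(shift_right(fmap, 1)))  # [[9, 0], [0, 5]]  unchanged
print(max_pool_2x2(shift_right(fmap, 2)))  # [[0, 9], [0, 0]]  changed
```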
The major issue with max pooling is that the network fails to capture the spatial relationships between different features, and can therefore give a false positive when all the features are present but in the wrong positions relative to one another. This is a problem for tasks such as image segmentation, where position matters.
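A toy illustration of the false-positive problem, taking the extreme case of global max pooling (a single max over the whole map for each feature). The 3x3 "eye" and "mouth" activation maps are made up for illustration: a correctly arranged face and a scrambled one produce different activation maps but identical pooled features.

```python
def global_max_pool(channels):
    """Reduce each 2-D activation map to its largest value, discarding position."""
    return [max(max(row) for row in ch) for ch in channels]

# Hypothetical 3x3 activation maps from an "eye" detector and a "mouth" detector.
top = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]     # feature found in the top row
bottom = [[0, 0, 0], [0, 0, 0], [0, 1, 0]]  # feature found in the bottom row

face = [top, bottom]       # [eye map, mouth map]: eye above mouth
scrambled = [bottom, top]  # mouth above eye: wrong arrangement

print(face == scrambled)                                    # False: the maps differ
print(global_max_pool(face) == global_max_pool(scrambled))  # True: [1, 1] for both
```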
So, data augmentation comes in handy to handle heavy distortions in images and to make the model more robust to them when we have fewer images.
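A minimal sketch of random width/height shift augmentation in plain Python, assuming images are 2-D lists. `shift`, `random_shift` and `augment` are hypothetical helpers; the parameter names echo Keras's `width_shift_range`/`height_shift_range` (read here as a fraction of the image size), but this is not Keras's implementation. Each original image yields several shifted copies that keep its label.

```python
import random

def shift(img, dx, dy, fill=0):
    """Translate a 2-D image dx pixels right and dy pixels down, padding with `fill`."""
    h, w = len(img), len(img[0])
    return [[img[r - dy][c - dx] if 0 <= r - dy < h and 0 <= c - dx < w else fill
             for c in range(w)]
            for r in range(h)]

def random_shift(img, width_shift_range, height_shift_range, rng):
    """Hypothetical helper: draw offsets uniformly from +/- range * size, in whole pixels."""
    h, w = len(img), len(img[0])
    dx = round(rng.uniform(-width_shift_range, width_shift_range) * w)
    dy = round(rng.uniform(-height_shift_range, height_shift_range) * h)
    return shift(img, dx, dy)

def augment(dataset, n_copies, rng, width_shift_range=0.25, height_shift_range=0.25):
    """Hypothetical helper: n_copies randomly shifted versions of each (image, label) pair."""
    return [(random_shift(img, width_shift_range, height_shift_range, rng), label)
            for img, label in dataset
            for _ in range(n_copies)]

rng = random.Random(42)
img = [[r * 8 + c + 1 for c in range(8)] for r in range(8)]  # toy 8x8 "image", all pixels > 0
augmented = augment([(img, "digit")], n_copies=10, rng=rng)
print(len(augmented))  # 10
print(len({tuple(map(tuple, a)) for a, _ in augmented}), "distinct shifted copies")
```

This sketch fills vacated pixels with zeros; in Keras's ImageDataGenerator the `fill_mode` parameter controls that choice.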
Reference:
Translation Invariance
