What are the consequences of not freezing layers in transfer learning?

Data Science · Asked on April 16, 2021

I am trying to fine-tune some code from a Kaggle kernel. The model uses pretrained VGG16 weights (via ‘imagenet’) for transfer learning. However, I notice that no layers are frozen, as is recommended in a Keras blog post. One approach would be to freeze all but the last few layers before compiling, for example:

# freeze everything except the last five layers
for layer in model.layers[:-5]:
    layer.trainable = False

Supposedly, this keeps the ImageNet weights fixed in the earlier layers and trains only the last five layers. What are the consequences of not freezing the VGG16 layers?

from keras.models import Sequential, Model, load_model
from keras import applications
from keras import optimizers
from keras.layers import Dropout, Flatten, Dense

img_rows, img_cols, img_channel = 224, 224, 3

# VGG16 convolutional base with ImageNet weights and no classification head
base_model = applications.VGG16(weights='imagenet', include_top=False,
                                input_shape=(img_rows, img_cols, img_channel))

# New classification head for the binary task
add_model = Sequential()
add_model.add(Flatten(input_shape=base_model.output_shape[1:]))
add_model.add(Dense(256, activation='relu'))
add_model.add(Dense(1, activation='sigmoid'))

# Combined model; every layer is trainable by default
model = Model(inputs=base_model.input, outputs=add_model(base_model.output))
model.compile(loss='binary_crossentropy', optimizer=optimizers.SGD(lr=1e-4, momentum=0.9),
              metrics=['accuracy'])

model.summary()
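
For comparison, here is a minimal sketch of the freezing approach mentioned above applied to this model, keeping the question's older Keras API (newer versions use learning_rate instead of lr). Note that changes to trainable only take effect after the model is (re)compiled:

# Freeze everything except the last five entries in model.layers
# (roughly the block-5 layers of VGG16 plus the new Sequential head).
for layer in model.layers[:-5]:
    layer.trainable = False

# Re-compile so the new trainable settings take effect.
model.compile(loss='binary_crossentropy', optimizer=optimizers.SGD(lr=1e-4, momentum=0.9),
              metrics=['accuracy'])

model.summary()  # "Trainable params" should now be much smaller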

3 Answers

I think that the main consequences are the following:

  • Computation time: if you freeze all the layers but the last 5, you only need to backpropagate the gradient and update the weights of those last 5 layers. Compared with backpropagating through and updating all the layers of the network, this is a huge decrease in computation time. Conversely, if you unfreeze the whole network, each epoch is more expensive, so the same compute budget covers fewer epochs than if you only updated the last layers' weights.
  • Accuracy: of course, by not updating the weights of most of the network you are only optimizing over a small subset of the model's parameters. If your dataset is similar to some subset of the ImageNet data, this should not matter much; but if it is very different from ImageNet, freezing will cost you accuracy. If you can afford the computation time, unfreezing everything lets you optimize over the full parameter space and potentially find better optima.

To wrap up, I think the main point is to check whether your images are comparable to the ones in ImageNet. If they are, I would not unfreeze many layers. Otherwise, unfreeze everything, but be prepared for a long training time.
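
To see roughly what freezing buys in this setup, one quick check is to compare trainable parameter counts before and after freezing; n_trainable_params below is a small helper defined for illustration, not a Keras utility:

import numpy as np
from keras import backend as K

def n_trainable_params(m):
    # total number of weights that would be updated during training
    return int(np.sum([K.count_params(w) for w in m.trainable_weights]))

print('Trainable params, nothing frozen:    ', n_trainable_params(model))

for layer in model.layers[:-5]:
    layer.trainable = False

print('Trainable params, most layers frozen:', n_trainable_params(model))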

Correct answer by David Masip on April 16, 2021

The reason freezing can save computation time is that the pretrained network can already extract generic features from your data; it does not have to learn to extract them from scratch.

A neural network works by abstracting and transforming information in stages. In the initial layers, the extracted features are fairly generic and largely independent of the particular task; it is the later layers that are tuned much more specifically to it. So by freezing the initial stages, you get a network that can already extract meaningful general features, and you unfreeze only the last few stages (or just the new, untrained layers) so that they can be tuned for your particular task.

Also, I would not recommend unfreezing all layers if you have any new, untrained layers in your model. These untrained layers will produce large gradients in the first few epochs, and your model will train as if it had been initialized with random (rather than pretrained) weights. A common way around this is to fine-tune in two phases, as sketched below.
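
A minimal sketch of that two-phase schedule under the question's setup (train_data and train_labels, the epoch counts and the learning rates are placeholders for your own data and choices):

# Phase 1: train only the new head; the pretrained VGG16 base stays frozen.
for layer in base_model.layers:
    layer.trainable = False
model.compile(loss='binary_crossentropy', optimizer=optimizers.SGD(lr=1e-3, momentum=0.9),
              metrics=['accuracy'])
model.fit(train_data, train_labels, epochs=5, batch_size=32)

# Phase 2: unfreeze the base and fine-tune end to end with a much lower
# learning rate, once the head no longer produces large random gradients.
for layer in base_model.layers:
    layer.trainable = True
model.compile(loss='binary_crossentropy', optimizer=optimizers.SGD(lr=1e-5, momentum=0.9),
              metrics=['accuracy'])
model.fit(train_data, train_labels, epochs=5, batch_size=32)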

Answered by vivek on April 16, 2021

If you do not freeze the pretrained layers, the weight updates made during subsequent training rounds will destroy the information those layers contain.

See the Transfer learning and fine-tuning guide from TensorFlow.
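
The core pattern in that guide is to mark the whole base model as non-trainable before compiling, and to unfreeze it only later for a careful, low-learning-rate fine-tuning pass, roughly like this (the guide uses tf.keras, where setting trainable on a model propagates to all of its layers; in older Keras versions you may prefer to loop over base_model.layers instead):

base_model.trainable = False  # freezes every layer inside the VGG16 base

model.compile(loss='binary_crossentropy', optimizer=optimizers.SGD(lr=1e-4, momentum=0.9),
              metrics=['accuracy'])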

Answered by Yacine Rouizi on April 16, 2021
