
Overfitting in Transfer Learning with a small dataset

Data Science Asked by deepguy on May 11, 2021

I am using Transfer Learning to perform image classification.

Base model: ResNet50 pre-trained on the ImageNet dataset.
class_1 and class_2 are the two classes, each with 1000 samples (a small dataset). The dataset is not similar to ImageNet.
Three FC layers are used, with [1024, 512, 256] units.
I have used a dropout of 0.5 to reduce overfitting.

When I trained the model for 100 epochs, I could clearly see it overfit, with a training accuracy of 0.9985 and a testing accuracy of 0.875.

Am I using too many FC layers, and is that what causes the overfitting?
How can I make the model more generalised?

The code used is as given below:

from keras.applications.resnet50 import ResNet50, preprocess_input
from keras.preprocessing.image import ImageDataGenerator
from keras.layers import Dense, Activation, Flatten, Dropout
from keras.models import Sequential, Model 
from keras.optimizers import SGD, Adam
from keras.callbacks import TensorBoard
import keras
import matplotlib.pyplot as plt

HEIGHT = 300
WIDTH = 300
TRAIN_DIR = "/home/ubuntu/dataset/training_set/"
TEST_DIR = "/home/ubuntu/dataset/test_set/"
BATCH_SIZE = 8
class_list = ["class_1", "class_2"]
FC_LAYERS = [1024, 512, 256]
dropout = 0.5
NUM_EPOCHS = 100

def build_finetune_model(base_model, dropout, fc_layers, num_classes):
    for layer in base_model.layers:
        layer.trainable = False

    x = base_model.output
    x = Flatten()(x)
    for fc in fc_layers:
        print(fc)
        x = Dense(fc, activation='relu')(x)
        x = Dropout(dropout)(x)
    predictions = Dense(num_classes, activation='softmax')(x)
    finetune_model = Model(inputs = base_model.input, outputs = predictions)
    return finetune_model

base_model = ResNet50(weights = 'imagenet',
                       include_top = False,
                       input_shape = (HEIGHT, WIDTH, 3))

train_datagen = ImageDataGenerator(preprocessing_function = preprocess_input,
                                   rotation_range = 90,
                                   horizontal_flip = True,
                                   vertical_flip = False)

test_datagen = ImageDataGenerator(preprocessing_function = preprocess_input,
                                  rotation_range = 90,
                                  horizontal_flip = True,
                                  vertical_flip = False)

train_generator = train_datagen.flow_from_directory(TRAIN_DIR,
                                                    target_size = (HEIGHT, WIDTH),
                                                    batch_size = BATCH_SIZE)

test_generator = test_datagen.flow_from_directory(TEST_DIR,
                                                  target_size = (HEIGHT, WIDTH),
                                                  batch_size = BATCH_SIZE)

finetune_model = build_finetune_model(base_model,
                                      dropout = dropout,
                                      fc_layers = FC_LAYERS,
                                      num_classes = len(class_list))

adam = Adam(lr = 0.00001)
finetune_model.compile(adam, loss="categorical_crossentropy", metrics=["accuracy"])

filepath = "./checkpoints" + "RestNet50" + "_model_weights.h5"
# monitor expects a single metric name (a string), not a list
checkpoint = keras.callbacks.ModelCheckpoint(filepath, monitor = "acc", verbose= 1, mode = "max")
cb=TensorBoard(log_dir=("/home/ubuntu/"))
callbacks_list = [checkpoint, cb]

print(train_generator.class_indices)

history = finetune_model.fit_generator(generator = train_generator, epochs = NUM_EPOCHS, steps_per_epoch = 100, 
                                       shuffle = True, callbacks=callbacks_list, validation_data = test_generator)

Update:

  1. The weight file generated after training is 2.7 GB. Is that normal given the complexity of the model?

  2. How should I select the steps_per_epoch value? Is there a standard?

3 Answers

First of all:

  • I think you should reduce the number of FC layers and the number of units in each, for example one FC layer with 256 or 512 units, or two FC layers with 256 and 512 units. Try this.

  • Try a batch size of 30, and decrease the number of epochs to around 10 or 20. 100 epochs is too many for a dataset this small.
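
The two suggestions above can be checked with some quick arithmetic. The sketch below assumes that ResNet50 without its top produces a 10 x 10 x 2048 feature map for 300 x 300 inputs (worth confirming with base_model.output_shape), that weights are stored as 4-byte floats, and that all 2000 images are used for training. The helper names are made up for illustration, and the pooled variant is only a point of comparison, not something from the question.

```python
import math

def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

def head_params(n_features, fc_layers, num_classes):
    """Total parameters of a stack of Dense layers ending in a classifier."""
    total, n_in = 0, n_features
    for n_out in list(fc_layers) + [num_classes]:
        total += dense_params(n_in, n_out)
        n_in = n_out
    return total

# Assumption: 300x300 input -> 10x10x2048 feature map from the frozen base.
flattened = 10 * 10 * 2048          # what Flatten() feeds to the first Dense
pooled = 2048                       # what global average pooling would feed

big_head = head_params(flattened, [1024, 512, 256], 2)
small_head = head_params(pooled, [256], 2)

print(big_head)                     # 210372866
print(small_head)                   # 525058
print(round(big_head * 4 / 1e6))    # 841 (MB of float32 weights)

# One full pass over the data per epoch: ceil(samples / batch size).
steps_per_epoch = math.ceil(2000 / 30)
print(steps_per_epoch)              # 67
```

Almost all of the roughly 210 million head parameters sit in the first Dense layer after Flatten(), so shrinking the input to that layer reduces capacity far more than trimming the later layers. If the checkpoint also stores the optimizer state, Adam keeps two extra values per trainable weight, which would be consistent with a file of a few gigabytes.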

Secondly, there is more than one way to reduce overfitting:

1- Enlarge your dataset by using augmentation techniques such as flipping, scaling, etc.

2- Use regularization techniques like dropout (you already do), and experiment with the dropout rate: try values above or below 0.5.

3- A good technique in your case is early stopping: as soon as you see the model start to overfit (validation performance stops improving), stop training.

4- Use cross-validation to train/test your model.

and many more.
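
The early-stopping rule from point 3 can be written down in a few lines. The sketch below is framework-free; the helper name and the loss values are made up, and the patience parameter (how many epochs without improvement to tolerate) is an assumption rather than something from the question. In Keras the same idea is available as the EarlyStopping callback.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch), both 0-based.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    # Never triggered: ran through every epoch that was supplied.
    return len(val_losses) - 1, best_epoch

# Made-up validation losses: improving at first, then drifting upwards.
history = [0.70, 0.52, 0.41, 0.38, 0.39, 0.42, 0.45, 0.50, 0.57]
stop, best = early_stopping_epoch(history, patience=3)
print(stop, best)  # 6 3
```

With these numbers training would stop at epoch 6 and the weights from epoch 3 would be the ones to keep, which is why saving the best checkpoint alongside early stopping is common.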

Feel free to ask any further questions.

Correct answer by Hunar on May 11, 2021

Be careful with Keras Batch Normalization. You can try this code:

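from keras import backend as K
from keras.applications.resnet50 import ResNet50
from keras.layers import Dense, Input
from keras.models import Model

# Build the pre-trained base in inference mode (learning phase 0).
# img_size is your input image size.
K.set_learning_phase(0)
input_tensor = Input(shape=(img_size, img_size, 3))
base_model = ResNet50(input_tensor=input_tensor, include_top=False, weights="imagenet", pooling="avg")
x = base_model.output
# Define your own top layers in training mode (learning phase 1)
K.set_learning_phase(1)
x = Dense(...)(x)
...
x = Dense(...)(x)
model = Model(input_tensor, x)
# Freeze the pre-trained layers
for layer in base_model.layers:
    layer.trainable = False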

Alternatively, you can try unfreezing the last few convolutional layers, which might help. Still, be careful with Batch Normalization: there are many discussions of this problem with transfer learning in Keras.

Answered by bobo yang on May 11, 2021

I implemented various architectures for transfer learning and observed that models containing BatchNorm layers (e.g. Inception, ResNet, MobileNet) perform a lot worse (~30% compared to >95% test accuracy) during evaluation (validation/test) than models without BatchNorm layers (e.g. VGG) on my custom dataset. Furthermore, this problem does not occur when saving bottleneck features and using them for classification. There are already a few blog entries, forum threads, issues and pull requests on this topic, and it turns out that a frozen BatchNorm layer uses the original dataset's (ImageNet) statistics rather than the new dataset's:

Assume you are building a Computer Vision model but you don’t have enough data, so you decide to use one of the pre-trained CNNs of Keras and fine-tune it. Unfortunately, by doing so you get no guarantees that the mean and variance of your new dataset inside the BN layers will be similar to the ones of the original dataset. Remember that at the moment, during training your network will always use the mini-batch statistics either the BN layer is frozen or not; also during inference you will use the previously learned statistics of the frozen BN layers. As a result, if you fine-tune the top layers, their weights will be adjusted to the mean/variance of the new dataset. Nevertheless, during inference they will receive data which are scaled differently because the mean/variance of the original dataset will be used.

cited from http://blog.datumbox.com/the-batch-normalization-layer-of-keras-is-broken/
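
The mismatch described in the quote can be reproduced with plain arithmetic. The sketch below is not Keras code: it is a hand-written batch normalization without the learned scale and shift, and both the activations and the "original dataset" moving statistics are made-up numbers chosen for illustration.

```python
import random
import statistics

def batch_norm(values, mean, var, eps=1e-3):
    """Normalize values with the given mean and variance (gamma=1, beta=0)."""
    return [(v - mean) / (var + eps) ** 0.5 for v in values]

random.seed(0)

# Hypothetical activations from the NEW dataset entering a BN layer.
new_batch = [random.gauss(5.0, 2.0) for _ in range(1000)]

# Hypothetical moving statistics learned on the ORIGINAL dataset.
moving_mean, moving_var = 0.0, 1.0

# Training mode (as described in the quote): mini-batch statistics are used.
batch_mean = statistics.fmean(new_batch)
batch_var = statistics.pvariance(new_batch)
train_out = batch_norm(new_batch, batch_mean, batch_var)

# Inference mode: the frozen moving statistics are used instead.
infer_out = batch_norm(new_batch, moving_mean, moving_var)

print("train mode:", statistics.fmean(train_out), statistics.pstdev(train_out))
print("infer mode:", statistics.fmean(infer_out), statistics.pstdev(infer_out))
```

In training mode the following layers see inputs centred near 0 with a spread near 1, while in inference mode the same data arrives centred near 5 with a spread near 2. That is the scaling difference the quote refers to.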

What fixed the problem for me was to freeze all layers and then unfreeze all BatchNormalization layers, so that they use the new dataset's statistics instead of the original statistics:

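from keras.applications import inception_v3
from keras.layers import Dense, Dropout, Input
from keras.models import Model

# build model (assumes train_generator already exists, e.g. from flow_from_directory)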
input_tensor = Input(shape=train_generator.image_shape)
base_model = inception_v3.InceptionV3(input_tensor=input_tensor,
                                      include_top=False,
                                      weights='imagenet',
                                      pooling='avg')
x = base_model.output

# freeze all layers in the base model
base_model.trainable = False

# un-freeze the BatchNorm layers
for layer in base_model.layers:
    if "BatchNormalization" in layer.__class__.__name__:
        layer.trainable = True

# add custom layers
x = Dense(1024, activation='relu')(x)
x = Dropout(0.5)(x)
x = Dense(train_generator.num_classes, activation='softmax')(x)

# define new model
model = Model(inputs=input_tensor, outputs=x)

This also explains the difference in performance between two approaches: training the model with frozen layers and evaluating it on a validation/test set, versus saving bottleneck features and training a classifier on the cached features. In the second case the features come from model.predict, for which the internal backend flag set_learning_phase is set to 0.

More information here:

Pull request to change this behavior (not accepted): https://github.com/keras-team/keras/pull/9965

Similar thread: https://stackoverflow.com/questions/50364706/massive-overfit-during-resnet50-transfer-learning

Answered by mattseibl on May 11, 2021
