Why are results without Transfer Learning better than with Transfer Learning?

Data Science Asked by Tobitor on December 23, 2020

I developed a neural network for license plate recognition and used the EfficientNet architecture ( with and without pretrained weights on ImageNet and with and without data augmentation. I only had 10.000 training images and 3.000 validation images. That was the reason I applied Transfer learning and image augmentation (AdditiveGaussianNoise).

I created this model:

efnB0_model = efn.EfficientNetB0(include_top=False, weights="imagenet", input_shape=(224, 224, 3))
efnB0_model.trainable = False

def create_model(input_shape = (224, 224, 3)):
    input_img = Input(shape=input_shape)
    model = efnB0_model (input_img)
    model = GlobalAveragePooling2D(name='avg_pool')(model)
    model = Dropout(0.2)(model)
    backbone = model

    branches = []
    for i in range(7):
            branches[i] = Dense(360, name="branch_"+str(i)+"_Dense_360")(branches[i])
            branches[i] = BatchNormalization()(branches[i])
            branches[i] = Activation("relu") (branches[i])
            branches[i] = Dropout(0.2)(branches[i])                        
            branches[i] = Dense(35, activation = "softmax", name="branch_"+str(i)+"_output")(branches[i])
    output = Concatenate(axis=1)(branches)
    output = Reshape((7, 35))(output)
    model = Model(input_img, output)

    return model

I compiled the model:

opt = keras.optimizers.Adam(learning_rate=0.0001)
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=["accuracy"])

And used this code to fit it:

hist =
    x=training_generator, epochs=10, verbose=1, callbacks=None,
    validation_data=validation_generator, steps_per_epoch=num_train_samples // 16,
    validation_steps=num_val_samples // 16,
    max_queue_size=10, workers=6, use_multiprocessing=True)

My hypotheses were:

H1: The EfficientNet architecture is applicable to license plate recognition.

H2: Transfer learning will improve accuracy in license plate recognition (compared to the situation without Transfer Learning).

H3: Image augmentation will improve accuracy in license plate recognition (compared to the situation without it).

H4: Transfer Learning combined with Image augmentation will bring the best results.

I now got this results:

enter image description here

So, H1 seems to be correct. But H2, H3 and H4 seem to wrong.

I was thinking about it and got an explanation for H3 and H4, which seem to be logical for me. That is, that image augmentation is too heavy and deteriorates the quality of images by a degree which makes it very hard for the network to recognize the characters.

1. Is this a suitable explanation and are there other ones additionally?

It seems to be the case, that image augmentation was too strong. So, first question is solved.

Regarding H2 I am little confued to be honest. The network seems to overfit but stagnates completely regarding validation accuracy. So, the conclusion that the Imagenet weights are not applicable seems not logical to me because the network learnt something for the training data. I also excluded the possibility that the data volume is to small since we had that good recognition rates without using Transfer learning or image augmentation…

2. Is there any logical explanation for this?

2 Answers

As @fuwiak mentioned, transfer learning may not work if pre-trained model has been fitted on a "very different" dataset. Typically if the pre-trained network extract information that is not relevant for your problem.

Moreover, in the paper License Plate Recognition System Based on Transfer Learning (that you shared with me), they have tried to freeze some layers of a pretrained Xception (based on ImageNet weights) to see the impact on the training. They conclude that ImageNet data and license plate data are too different to freeze layers. So your results are confirmed.

Now changing efnB0_model.trainable = False to True would allow the pre-trained network to update and to be more relevant to your problem. Generally, if you don't have time issues, it seems to be always better (see this post). Will it give better results than initialize the weights randomly ? I think one can presume but cannot know.

Correct answer by etiennedm on December 23, 2020

At least two issues:

Negative transfer

  • Transfer learning working if the initial and our problem are similar. Unfortunately, we think that there are similar enough, but its just illusion.

Data greedy

  • Often model start working well, if we provide much more data.

Answered by fuwiak on December 23, 2020

Add your own answers!

Related Questions

what is the training phase in N-gram model?

1  Asked on July 4, 2021 by black-sheep-369


Sales person performance prediction

1  Asked on July 3, 2021 by optimizor


Number of Parameters in Multidimensional rnn

0  Asked on July 3, 2021 by strickban


How do I space out the grouped bar plots in plotly?

0  Asked on July 3, 2021 by vraka0723


Ask a Question

Get help from others!

© 2023 All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP