Why does my model sometimes not learn well from same data?

Data Science Asked by ManInMoon on November 30, 2020

I have a dataset of 2 classes, both containing 2K images. I have split that into 1500 images for training and 500 images for validation.

This is a simple structure for testing purposes, and each image is classified depending on the colour of a particular pixel. Either Green or Red.

I have run this model many times and I find that sometimes the models gets low loss/ high accuracy within a few epochs, but other times it gets stuck at accuracy 50%.

The datasets are exactly the same each time with only difference coming from model.fit “shuffle” option.

I tested the LR Range first:

and I “cycle” the learning rate through an appropriate range.

model = keras.Sequential([
keras.layers.Dense(112, activation=tf.nn.relu, input_shape=(224, 224, 3)),
keras.layers.Dropout(0.4),
keras.layers.Flatten(),
keras.layers.Dense(3, activation=tf.nn.softmax)
])

LRS = CyclicLR(base_lr=0.000005, max_lr=0.0003, step_size=200.)

model.fit(train_images, train_labels, shuffle=True, epochs=10,
callbacks=[checkpoint,
LRS],
validation_data = (test_images, test_labels)
)


Why does the model sometimes NOT get a good fit?

EDIT 1

Re Serali’s suggestion:

myOpt = keras.optimizers.Adam(lr=0.001,decay=0.01)
model.compile(optimizer=myOpt, loss='categorical_crossentropy',  metrics=['accuracy'])

reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.2,verbose=1,patience=5, min_lr=0.00001)


Have you tried without the "CycleLR" part? It is obviously a good idea but as far as I know, not a standard Keras function - needs to be implemented separately. Any small error in this implementation might cause problems.

Answered by serali on November 30, 2020

Shuffle = True should give better results particularly when you are running for more epochs. But I am not clear why there is so huge difference in the accuracy when shuffle changes . One thing you can try is , to increase the number of epochs and see if the accuracy improves. For shuffle = False

Shuffle set to false, allows you to use the previously trained data. Setting this to true means that you either want to retrain or set the epoch to some value greater than 10. To learn, but this increases the chances of memorization (over fitting)

Answered by Bala_Science on November 30, 2020

Related Questions

Decomposing R squared or VIF

1  Asked on December 17, 2020 by jun-jang

Finding relationships between multiple readings from an IoT sensor network time-series data

0  Asked on December 17, 2020 by prabath

HR employee attrition modeling – making a balanced sample question

0  Asked on December 16, 2020 by nimrod-ets

Sample Space of Longest Run

1  Asked on December 16, 2020 by a-may

When to choose character instead of factor in R?

2  Asked on December 16, 2020 by lupi5

Reporting descriptive statistics in case of missing data

0  Asked on December 16, 2020 by jishan

Hyperparameter search for LSTM-RNN using Keras (Python)

4  Asked on December 16, 2020 by wacax

Product classification according to description

1  Asked on December 16, 2020 by huy

How to choose appropriate epsilon value while approximating gradients to check training?

0  Asked on December 16, 2020 by david-tth

How to build a classification pipeline that will pass to another model?

0  Asked on December 16, 2020 by samay-lakhani

Data storage for Intrusion Detection System

0  Asked on December 16, 2020 by user2071938

What is the exact definition of VC dimension?

2  Asked on December 16, 2020 by kaushal28

How to use fresh data when target prediction period is long?

2  Asked on December 16, 2020 by haffi112

Confusion in applying k-fold cross validation to dataset

1  Asked on December 16, 2020 by mr-nlp

How would you describe cluster 2 from this output of a run of the EM program?

1  Asked on December 16, 2020 by shroomy

Demand Forecasting – Error in predictions

0  Asked on December 16, 2020 by gopik-anand

How does attention mechanism learn?

3  Asked on December 16, 2020 by user2790103

Does standardization result in normal distribution?

1  Asked on December 15, 2020 by scarlett-rouge

Representing geospatial information

1  Asked on December 15, 2020 by exi_le