# Expected performance of training tf.keras.Sequential model with model.fit, model.fit_generator and model.train_on_batch

Data Science Asked by Tuukka Nieminen on October 10, 2020

I am using Keras with the TensorFlow backend to train a simple 1D CNN that detects specific events in sensor data. While the data, with tens of millions of samples, easily fits in RAM as a 1D float array, it obviously takes a huge amount of memory to store it as an N x inputDim array that can be passed to model.fit for training. I can use model.fit_generator or model.train_on_batch to generate the required mini-batches on the fly, but for some reason I am observing a huge performance gap between model.fit and model.fit_generator & model.train_on_batch, even though everything is stored in memory and mini-batch generation is fast, as it basically only consists of reshaping the data. Therefore, I’m wondering whether I am doing something terribly wrong or whether this kind of performance gap is to be expected. I am using the CPU version of TensorFlow 2.0 with a 3.2 GHz Intel Core i7 processor (4 cores with multithreading support) and Python 3.6.3 on macOS Mojave.
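As a rough illustration of the memory gap described above, the following sketch compares the footprint of the 1D source array with that of the windowed N x inputDim array. The sizes are taken from the dummy script; the 8 bytes per value is an assumption based on NumPy's default float64.

```python
# Sizes taken from the dummy script in the question.
N = 1_049_344        # total number of samples
input_dim = 104      # window length fed to the network
bytes_per_value = 8  # assumption: NumPy's default float64; use 4 for float32

flat_bytes = N * bytes_per_value                  # the 1D source array
windowed_bytes = N * input_dim * bytes_per_value  # the (N, inputDim) array passed to model.fit

print(f"1D array:       {flat_bytes / 2**20:8.1f} MiB")
print(f"windowed array: {windowed_bytes / 2**20:8.1f} MiB")
print(f"ratio:          {windowed_bytes // flat_bytes}x")
```

The windowed array is always inputDim times larger than the source, so with tens of millions of samples it quickly outgrows what the 1D array needs.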

In short, I created a dummy Python script to recreate the issue. It reveals that with a batch size of 64, it takes 407 seconds to run 10 epochs with model.fit, 1852 seconds with model.fit_generator, and 1985 seconds with model.train_on_batch. CPU loads are ~220%, ~130%, and ~120% respectively. It seems especially odd that model.fit_generator & model.train_on_batch are practically on par, since model.fit_generator should be able to parallelise mini-batch creation and model.train_on_batch definitely does not. That is, model.fit (with huge memory requirements) beats the other candidates, whose memory requirements are easily manageable, by more than a factor of four. Obviously, CPU loads increase and total training times decrease as the batch size increases, but model.fit is always fastest by a factor of at least two up to a batch size of 8096. In that case, model.fit takes 99 seconds to run 10 epochs with a CPU load of ~860% (or pretty much everything I have got), model.fit_generator takes 179 seconds with a CPU load of ~700%, and model.train_on_batch takes 198 seconds with a CPU load of ~680%.
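To make the size of the gap explicit, here is a small sketch that turns the timings quoted above into slowdown factors relative to model.fit. The dictionary layout and the `slowdown` helper are only illustrative; the numbers are the ones reported in the question.

```python
# Wall-clock times (seconds, 10 epochs) quoted above, keyed by batch size.
timings = {
    64:   {"fit": 407, "fit_generator": 1852, "train_on_batch": 1985},
    8096: {"fit": 99,  "fit_generator": 179,  "train_on_batch": 198},
}

def slowdown(batch_size, method):
    """Return how many times slower `method` is than model.fit."""
    row = timings[batch_size]
    return row[method] / row["fit"]

for batch_size in timings:
    for method in ("fit_generator", "train_on_batch"):
        print(f"batch {batch_size:5d}  {method:15s} {slowdown(batch_size, method):.2f}x")
```

With a batch size of 64 the two alternatives come out at about 4.6x and 4.9x the model.fit time; at 8096 the factors shrink to about 1.8x and 2.0x.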

Is this kind of behaviour normal (when there is no GPU involved), or what could/should be done to improve the computational performance of the less memory-intensive options with sensible batch sizes? model.fit_generator in particular fails to provide decent performance. There also seems to be no option to divide all the data into manageable pieces and then run model.fit iteratively with constantly changing training data.
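One way the "manageable pieces" idea could be approximated by hand is sketched below, assuming that repeated model.fit calls simply continue training from the current weights. The helper `iter_window_chunks` and the chunk size are hypothetical and not part of Keras; only the NumPy part is executed here, and the Keras call is shown as a comment.

```python
import numpy as np

def iter_window_chunks(signal, targets, window, chunk_size):
    """Yield (X, y) pieces of at most `chunk_size` windows each.

    Windows wrap around the end of `signal`, matching the modulo
    indexing used in the dummy script.
    """
    n = signal.size
    offsets = np.arange(window)                       # positions within one window
    for start in range(0, n, chunk_size):
        idx = np.arange(start, min(start + chunk_size, n))
        rows = (idx[:, None] + offsets[None, :]) % n  # shape (chunk, window)
        X = signal[rows][..., None]                   # shape (chunk, window, 1)
        yield X, targets[idx]

# Hypothetical usage with a compiled Keras model (not executed here):
# for epoch in range(epoch_count):
#     for X, y in iter_window_chunks(inputData, outputData, inputDim, 65536):
#         model.fit(X, y, batch_size=64, epochs=1, shuffle=True, verbose=0)

if __name__ == "__main__":
    demo_signal = np.arange(10, dtype=np.float64)
    demo_targets = np.eye(2)[np.arange(10) % 2]
    for X, y in iter_window_chunks(demo_signal, demo_targets, window=4, chunk_size=6):
        print(X.shape, y.shape)
```

Only one chunk of windows is held in memory at a time. The trade-off is that shuffling happens within a chunk rather than across the whole data set, which may or may not matter for a given problem.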

Please note that the provided dummy script is just what the name suggests, and the amount of data has been trimmed so that all three options are feasible. The model, however, is similar to the one I am actually using (to provide a realistic situation).

    from tqdm import tqdm

    import numpy as np
    import tensorflow as tf

    import time
    import sys
    import argparse

    # Module-level state shared between main() and DataGenerator
    inputData    = None
    outputData   = None
    batchIndices = None
    opts         = None

    class DataGenerator(tf.keras.utils.Sequence):
        'Generates data for Keras'

        # Reads the module-level inputData, outputData and batchIndices set in main()

        def __init__(self, batchSize, shuffle):
            'Initialization'
            self.batchIndices = batchIndices
            self.batchSize    = batchSize
            self.shuffle      = shuffle
            self.on_epoch_end()

        def __len__(self):
            'Denotes the number of batches per epoch'
            return int( np.floor( inputData.size / self.batchSize ) )

        def __getitem__(self, index):
            'Generate one batch of data'

            # Generate data
            X, y = self.__data_generation(self.indexes[index*self.batchSize:(index+1)*self.batchSize])

            return X, y

        def on_epoch_end(self):
            'Rebuilds (and optionally shuffles) the sample indices after each epoch'
            self.indexes = np.arange(inputData.size)
            if self.shuffle:
                np.random.shuffle(self.indexes)

        def __data_generation(self, INDX):
            'Generates data containing batch_size samples'

            # Generate data
            X = np.expand_dims( inputData[ np.mod( batchIndices + np.reshape(INDX,(INDX.size,1)) , inputData.size ) ], axis=2)
            y = outputData[INDX,:]

            return X, y

    def main( ):

        global inputData
        global outputData
        global batchIndices
        global opts

        # Data generation

        print(' ')
        print('Generating data...')

        np.random.seed(0) # For reproducible results

        inputDim  = int(104)                      # Input  dimension
        outputDim = int(  2)                      # Output dimension
        N         = int(1049344)                  # Total number of samples
        M         = int(5e4)                      # Number of anomalies
        trainINDX = np.arange(N, dtype=np.uint32)

        inputData = np.sin(trainINDX) + np.random.normal(loc=0.0, scale=0.20, size=N) # Source data stored in a single array

        anomalyLocations = np.random.choice(N, M, replace=False)

        inputData[anomalyLocations] += 0.5

        outputData = np.zeros((N,outputDim)) # One-hot encoded target array without ones

        for i in range(N):
            if np.any( np.logical_and( anomalyLocations >= i, anomalyLocations < np.mod(i+inputDim,N) ) ):
                outputData[i,1] = 1 # set class #2 to one if there is at least a single anomaly within range [i,i+inputDim)
            else:
                outputData[i,0] = 1 # set class #1 to one if there are no anomalies within range [i,i+inputDim)

        print('...completed')
        print(' ')

        # Create a model for anomaly detection

        model = tf.keras.Sequential([
            tf.keras.layers.Conv1D(filters=24, kernel_size=9, strides=1, padding='valid', dilation_rate=1, activation='relu', use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', input_shape=(inputDim,1)),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(20, activation='relu', use_bias=True),
            tf.keras.layers.Dense(outputDim, activation='softmax')
        ])

        # NOTE: the opening line of this call (including any optimizer argument) is missing
        # from the source; without it Keras falls back to its default optimizer.
        model.compile(
            loss=tf.keras.losses.CategoricalCrossentropy(),
            metrics=[tf.keras.metrics.CategoricalAccuracy()])

        print(' ')

        relativeIndices = np.arange(inputDim)                            # Indices belonging to a single sample relative to current position
        batchIndices    = np.tile( relativeIndices, (opts.batchSize,1) ) # Relative indices tiled into an array of size ( batchSize , inputDim )
        stepsPerEpoch   = int( np.floor( N / opts.batchSize ) )          # Steps per epoch

        # Create an instance of the DataGenerator class
        generator = DataGenerator(batchSize=opts.batchSize, shuffle=True)

        # Solve by gathering data into a large float array of size ( N , inputDim ) and feeding it to model.fit

        startTime = time.time()

        X = np.expand_dims( inputData[ np.mod( np.tile(relativeIndices,(N,1)) + np.reshape(trainINDX,(N,1)) , N ) ], axis=2)
        y = outputData[trainINDX, :]

        history = model.fit(x=X, y=y, sample_weight=None, batch_size=opts.batchSize, verbose=1, callbacks=None, validation_split=None, shuffle=True, epochs=opts.epochCount)

        referenceTime = time.time() - startTime
        print(' ')
        print('Total solution time with model.fit: %6.3f seconds' % referenceTime)
        print(' ')

        # Solve with model.fit_generator (here: model.fit fed with the generator)

        startTime = time.time()

        history = model.fit(x=generator, steps_per_epoch=stepsPerEpoch, verbose=1, callbacks=None, epochs=opts.epochCount, max_queue_size=1024, use_multiprocessing=False)

        generatorTime = time.time() - startTime
        print(' ')
        print('Total solution time with model.fit_generator: %6.3f seconds (%6.2f %% of the model.fit time)' % (generatorTime, 100.0 * generatorTime/referenceTime))
        print(' ')

        # Solve by gathering data into batches of size ( batchSize , inputDim ) and feeding it to model.train_on_batch

        startTime = time.time()

        for epoch in range(opts.epochCount):

            print(' ')
            print('Training epoch # %2d ...' % (epoch+1))
            print(' ')

            np.random.shuffle(trainINDX)

            epochStartTime = time.time()

            for step in tqdm( range( stepsPerEpoch ) ):

                INDX = trainINDX[ step*opts.batchSize : (step+1)*opts.batchSize ]

                X = np.expand_dims( inputData[ np.mod( batchIndices + np.reshape(INDX,(opts.batchSize,1)) , N ) ], axis=2)
                y = outputData[INDX,:]

                # Returns [loss, accuracy] accumulated over the epoch so far
                history = model.train_on_batch(x=X, y=y, sample_weight=None, class_weight=None, reset_metrics=False)

            print(' ')
            print('...completed with loss = %9.6e, accuracy = %6.2f %%, %6.2f ms/step' % (history[0], 100.0*history[1], (1000*(time.time() - epochStartTime)/np.floor(trainINDX.size / opts.batchSize))))
            print(' ')

        batchTime = time.time() - startTime
        print(' ')
        print('Total solution time with model.train_on_batch: %6.3f seconds (%6.2f %% of the model.fit time)' % (batchTime, 100.0 * batchTime/referenceTime))
        print(' ')

    parser = argparse.ArgumentParser()

    # NOTE: the add_argument(...) opening lines are missing from the source; the flag names
    # and type=int are inferred from opts.batchSize and opts.epochCount used in main().
    parser.add_argument('--batchSize', type=int,
                        default=128,
                        help='Batch size')
    parser.add_argument('--epochCount', type=int,
                        default=5,
                        help='Epoch count')

    opts, unparsed = parser.parse_known_args()

    if __name__ == "__main__":
        main( )

To answer the question myself: I recently updated to Python 3.7.7 and TensorFlow 2.2.0 rc2, and suddenly all my issues vanished. Now, running for 5 epochs with the default batch size of 128, model.fit with explicitly formed NumPy arrays takes 126.162 seconds, model.fit with the provided generator takes 149.053 seconds, and model.train_on_batch takes 240.698 seconds. This is with the default build of TensorFlow, without support for the AVX2 and FMA instructions that my CPU supports.

Answered by Tuukka Nieminen on October 10, 2020
