
Hyperparameter search for LSTM-RNN using Keras (Python)

Asked by wacax on Data Science, December 16, 2020

From Keras RNN Tutorial: “RNNs are tricky. Choice of batch size is important, choice of loss and optimizer is critical, etc. Some configurations won’t converge.”

So this is more a general question about tuning the hyperparameters of a LSTM-RNN on Keras. I would like to know about an approach to finding the best parameters for your RNN.

I began with the IMDB example on Keras’ Github.

The main model looks like this:

from keras.datasets import imdb
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.layers.embeddings import Embedding
from keras.layers.recurrent import LSTM

max_features = 20000  # vocabulary size
maxlen = 100  # cut texts after this number of words (among top max_features most common words)
batch_size = 32

# note: this snippet uses the older Keras API (nb_words, nb_epoch, show_accuracy, class_mode)
(X_train, y_train), (X_test, y_test) = imdb.load_data(nb_words=max_features,
                                                      test_split=0.2)

# pad the variable-length reviews to a fixed length (as in the original Keras IMDB example)
X_train = sequence.pad_sequences(X_train, maxlen=maxlen)
X_test = sequence.pad_sequences(X_test, maxlen=maxlen)

model = Sequential()
model.add(Embedding(max_features, 128, input_length=maxlen))
model.add(LSTM(128))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))

# try using different optimizers and different optimizer configs
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              class_mode="binary")

print("Train...")
model.fit(X_train, y_train, batch_size=batch_size, nb_epoch=3,
          validation_data=(X_test, y_test), show_accuracy=True)
score, acc = model.evaluate(X_test, y_test,
                            batch_size=batch_size,
                            show_accuracy=True)

print('Test accuracy:', acc)
Test accuracy:81.54321846

81.5% is a fair score and, more importantly, it means that the model works even though it is not fully optimized.

My data is a time series and the task is binary classification, the same as in the example. My code now looks like this:

import os
from numpy import genfromtxt
from sklearn import metrics

# Training data
train = genfromtxt(os.getcwd() + "/Data/trainMatrix.csv", delimiter=',', skip_header=1)
validation = genfromtxt(os.getcwd() + "/Data/validationMatrix.csv", delimiter=',', skip_header=1)

# Targets
miniTrainTargets = [int(x) for x in genfromtxt(os.getcwd() + "/Data/trainTarget.csv", delimiter=',', skip_header=1)]
validationTargets = [int(x) for x in genfromtxt(os.getcwd() + "/Data/validationTarget.csv", delimiter=',', skip_header=1)]

#LSTM
model = Sequential()
model.add(Embedding(train.shape[0], 64, input_length=train.shape[1]))
model.add(LSTM(64)) 
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))

# try using different optimizers and different optimizer configs
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              class_mode="binary")

batch_size = 32  # as in the IMDB example
model.fit(train, miniTrainTargets, batch_size=batch_size, nb_epoch=5,
          validation_data=(validation, validationTargets), show_accuracy=True)
valid_preds = model.predict_proba(validation, verbose=0)
roc = metrics.roc_auc_score(validationTargets, valid_preds)
print("ROC:", roc)
ROC:0.5006526

The model is basically the same as the IMDB one, but the result means it is not learning anything. However, when I use a vanilla MLP I don't have the same problem: the model learns and the score increases. I tried increasing the number of epochs and increasing and decreasing the number of LSTM units, but the score won't improve.

So I would like to know a standard approach to tuning the network, because in theory the algorithm should perform better than a multilayer perceptron, especially for this time series data.

4 Answers

An embedding layer turns positive integers (indexes) into dense vectors of fixed size. For instance, [[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]. This representation conversion is learned automatically with the embedding layer in Keras (see the documentation).

However, it seems that your data does not need any such embedding layer to perform a conversion. Having an unnecessary embedding layer is likely why you cannot get your LSTM to work properly. If that is the case then you should simply remove the embedding layer.

The first layer in your network should then have the input_shape argument added with information on the dimensions of your data (see examples). Note that you can add this argument to any layer - it will not be present in the documentation for any specific layer.
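As a minimal sketch of that advice (assuming each row of train is one sequence of scalar values, which may not match the actual data), removing the embedding layer and passing input_shape to the LSTM would look roughly like this:

import numpy as np
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.layers.recurrent import LSTM

# an LSTM expects 3D input of shape (samples, timesteps, features);
# here each row is treated as a sequence of scalar features (an assumption)
timesteps = train.shape[1]
X_train = np.reshape(train, (train.shape[0], timesteps, 1))
X_valid = np.reshape(validation, (validation.shape[0], timesteps, 1))

model = Sequential()
model.add(LSTM(64, input_shape=(timesteps, 1)))  # no Embedding layer
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')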


By the way, hyperparameters are often tuned using random search or Bayesian optimization. I would use RMSProp and focus on tuning batch size (sizes like 32, 64, 128, 256 and 512), gradient clipping (on the interval 0.1-10) and dropout (on the interval of 0.1-0.6). The specifics of course depend on your data and model architecture.
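A rough random-search sketch along those lines (build_model is a hypothetical helper that assembles and compiles the LSTM; X_train, y_train, X_valid, y_valid stand in for your reshaped data):

import random
from keras.optimizers import RMSprop
from sklearn import metrics

def sample_hyperparameters():
    # draw one configuration from the ranges suggested above
    return {
        'batch_size': random.choice([32, 64, 128, 256, 512]),
        'clipnorm': 10 ** random.uniform(-1, 1),   # gradient clipping in [0.1, 10]
        'dropout': random.uniform(0.1, 0.6),
    }

best_auc, best_params = 0.0, None
for trial in range(20):  # number of random trials
    params = sample_hyperparameters()
    optimizer = RMSprop(clipnorm=params['clipnorm'])
    model = build_model(dropout=params['dropout'], optimizer=optimizer)  # hypothetical helper
    model.fit(X_train, y_train, batch_size=params['batch_size'], nb_epoch=5,
              validation_data=(X_valid, y_valid))
    preds = model.predict_proba(X_valid, verbose=0)
    auc = metrics.roc_auc_score(y_valid, preds)
    if auc > best_auc:
        best_auc, best_params = auc, params

print("Best AUC:", best_auc, "with", best_params)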

Answered by pir on December 16, 2020

I would recommend Bayesian optimization for hyperparameter search; I have had good results with Spearmint. You might have to use an older version for commercial use.

Answered by Mutian Zhai on December 16, 2020

I would suggest using hyperopt, which uses a kind of Bayesian optimization to search for optimal hyperparameter values given an objective function. It is more intuitive to use than Spearmint.

PS: There is a wrapper of hyperopt specifically for Keras, hyperas. You can also use it.
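A minimal hyperopt sketch (the search space and the train_and_score helper are illustrative placeholders, not something from the original post):

from hyperopt import fmin, tpe, hp, Trials

# search space over the kinds of hyperparameters discussed above
space = {
    'batch_size': hp.choice('batch_size', [32, 64, 128, 256]),
    'dropout': hp.uniform('dropout', 0.1, 0.6),
    'lstm_units': hp.choice('lstm_units', [32, 64, 128]),
}

def objective(params):
    # train_and_score(params) is a hypothetical helper that builds the LSTM,
    # fits it, and returns validation AUC; hyperopt minimizes, so negate it
    return -train_and_score(params)

trials = Trials()
best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=50, trials=trials)
print(best)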

Answered by SHASHANK GUPTA on December 16, 2020

Talos is exactly what you're looking for: an automated solution for searching hyperparameter combinations for Keras models. I might not be objective as I'm the author, but the intention has been to provide an alternative with the lowest possible learning curve while exposing Keras functionality entirely.

Alternatively, as has already been mentioned, you can look into Hyperas, or else scikit-learn or AutoKeras. To my knowledge, at the time of writing, these four are the options for Keras users specifically.
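For reference, a Talos scan is typically set up roughly along these lines (a sketch only; the parameter grid is illustrative, X_train and y_train stand in for your data, and argument names such as experiment_name may differ between Talos versions):

import talos
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout

# the model-building function receives the data and one parameter combination
def lstm_model(x_train, y_train, x_val, y_val, params):
    model = Sequential()
    model.add(LSTM(params['lstm_units'], input_shape=(x_train.shape[1], 1)))
    model.add(Dropout(params['dropout']))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['acc'])
    history = model.fit(x_train, y_train, batch_size=params['batch_size'],
                        epochs=5, validation_data=(x_val, y_val), verbose=0)
    return history, model

p = {'lstm_units': [32, 64, 128],
     'dropout': [0.25, 0.5],
     'batch_size': [32, 64, 128]}

scan = talos.Scan(x=X_train, y=y_train, params=p, model=lstm_model,
                  experiment_name='lstm_tuning')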

Answered by mikkokotila on December 16, 2020
