
How can I explain the performance difference between two LSTM models, and how can I improve it?

Data Science Asked on May 12, 2021

I’ve built two different models for load forecasting. The dataset has six features. The performance evaluation metric is the Mean Absolute Percentage Error (MAPE). Both models are based on LSTMs. Here is the first model and its performance.
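(For reference, MAPE is defined as

    MAPE = (100 / n) * sum_{i=1..n} |y_i - yhat_i| / |y_i|

where y_i is the actual load and yhat_i the forecast for sample i, so the losses below are percentages.)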

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
# One time step with six features; 'init' was renamed to 'kernel_initializer' in Keras 2
model.add(LSTM(20, input_shape=(1, 6), kernel_initializer='uniform', return_sequences=True))  # input shape changed from (1, 5) to (1, 6)
model.add(LSTM(20, kernel_initializer='uniform', activation='relu'))
model.add(Dense(256, kernel_initializer='uniform', activation='relu'))
model.add(Dense(256, kernel_initializer='uniform', activation='relu'))
model.add(Dense(1, kernel_initializer='uniform', activation='relu'))

For 30 epochs:

  • Train loss: 3.7549, test loss: 3.5419
  • Time required: 5.9089 minutes

The second model is a stacked LSTM. Here is the code:

model = Sequential()
model.add(LSTM(20, input_shape=(1, 6), kernel_initializer='uniform', return_sequences=True))  # input shape changed from (1, 5) to (1, 6)
model.add(LSTM(20, kernel_initializer='uniform', activation='relu', return_sequences=True))
model.add(LSTM(20, kernel_initializer='uniform', activation='relu', return_sequences=True))
model.add(LSTM(20, kernel_initializer='uniform', activation='relu', return_sequences=True))
model.add(LSTM(20, kernel_initializer='uniform', activation='relu'))
model.add(Dense(1, kernel_initializer='uniform', activation='relu'))

For 30 epochs:

  • Train loss: 5.0825, test loss: 5.1821
  • Time required: 5.7295 minutes

In both cases I compiled the models the same way:

model.compile(loss='mean_absolute_percentage_error', optimizer='adam', metrics=['acc'])
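(The question does not show the training call; presumably it was along these lines, where X_train, y_train, X_test, y_test, and the batch size are assumptions:)

model.fit(X_train, y_train, epochs=30, batch_size=32,
          validation_data=(X_test, y_test))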

Clearly, the first model works better. But why is the second model (the stacked LSTM) not working well, and how can I improve on it? For the first model, after 150 epochs the gradient vanishes when the loss is around 3.01. I’m aiming for a MAPE below 2.

One Answer

LSTMs are designed for time-series data. I'm assuming these features are not temporally dependent, in which case an LSTM (or any other RNN variant) is not an ideal choice.

Remove the LSTM layers, reduce the number of neurons, and add regularization such as dropout. Also, using ReLU in the last layer makes no sense: since this is a regression problem, use a linear activation instead, as in the sketch below.
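For example, a minimal sketch of the suggested simplification (the layer sizes and dropout rate are illustrative choices, not tuned values; note the input loses its time-step dimension once the LSTM layers are gone):

from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(64, input_shape=(6,), activation='relu'))  # six features, no time-step dimension
model.add(Dropout(0.2))                                    # dropout for regularization
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1, activation='linear'))                   # linear output for regression
model.compile(loss='mean_absolute_percentage_error', optimizer='adam')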

To reduce the error further, tune the hyperparameters with a randomized search or a grid search, as sketched below.
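A minimal hand-rolled randomized search over such a model might look like this (the search space is illustrative, and X_train, y_train, X_val, y_val are assumed to exist):

import random
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import Adam

def build_model(units, dropout, lr):
    model = Sequential()
    model.add(Dense(units, input_shape=(6,), activation='relu'))
    model.add(Dropout(dropout))
    model.add(Dense(1, activation='linear'))
    model.compile(loss='mean_absolute_percentage_error',
                  optimizer=Adam(learning_rate=lr))
    return model

# Hypothetical search space
space = {'units': [32, 64, 128, 256],
         'dropout': [0.0, 0.2, 0.4],
         'lr': [1e-2, 1e-3, 1e-4]}

best_params, best_mape = None, float('inf')
for _ in range(20):  # 20 random draws from the space
    params = {k: random.choice(v) for k, v in space.items()}
    model = build_model(**params)
    model.fit(X_train, y_train, epochs=30, batch_size=32, verbose=0)
    mape = model.evaluate(X_val, y_val, verbose=0)  # scalar loss = validation MAPE
    if mape < best_mape:
        best_params, best_mape = params, mape

print(best_params, best_mape)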

Answered by Zabir Al Nazi on May 12, 2021
