
How can I explain the performance difference between two LSTM models, and how can I improve it?

Data Science Asked on May 12, 2021

I’ve built two different models for load forecasting. The dataset has six features. The performance evaluation metric is the Mean Absolute Percentage Error (MAPE). Both models are based on LSTMs. Here is the first model and its performance.
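(For reference, MAPE is defined as

    MAPE = (100 / n) * sum_{i=1..n} |y_i - yhat_i| / |y_i|

where y_i is the actual load and yhat_i the forecast for sample i, so the losses below are percentages.)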

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
# One time step with six features; 'init' was renamed to 'kernel_initializer' in Keras 2
model.add(LSTM(20, input_shape=(1, 6), kernel_initializer='uniform', return_sequences=True))  # input shape changed from (1, 5) to (1, 6)
model.add(LSTM(20, kernel_initializer='uniform', activation='relu'))
model.add(Dense(256, kernel_initializer='uniform', activation='relu'))
model.add(Dense(256, kernel_initializer='uniform', activation='relu'))
model.add(Dense(1, kernel_initializer='uniform', activation='relu'))

For 30 epochs:

  • Train loss: 3.7549, test loss: 3.5419
  • Time required: 5.9089 minutes

The second model is a stacked LSTM. Here is the code:

model = Sequential()
model.add(LSTM(20, input_shape=(1, 6), kernel_initializer='uniform', return_sequences=True))  # input shape changed from (1, 5) to (1, 6)
model.add(LSTM(20, kernel_initializer='uniform', activation='relu', return_sequences=True))
model.add(LSTM(20, kernel_initializer='uniform', activation='relu', return_sequences=True))
model.add(LSTM(20, kernel_initializer='uniform', activation='relu', return_sequences=True))
model.add(LSTM(20, kernel_initializer='uniform', activation='relu'))
model.add(Dense(1, kernel_initializer='uniform', activation='relu'))

For 30 epochs:

  • Train loss: 5.0825, test loss: 5.1821
  • Time required: 5.7295 minutes

In both cases I compiled the models the same way:

model.compile(loss='mean_absolute_percentage_error', optimizer='adam', metrics=['acc'])
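(The question does not show the training call; presumably it was along these lines, where X_train, y_train, X_test, y_test, and the batch size are assumptions:)

model.fit(X_train, y_train, epochs=30, batch_size=32,
          validation_data=(X_test, y_test))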

Clearly, the first model works better. But why is the second model (the stacked LSTM) not working well, and how can I improve on it? For the first model, after 150 epochs the gradient vanishes when the loss is around 3.01. I’m aiming for a MAPE below 2.

One Answer

LSTMs are designed for time-series data. I'm assuming these features are not temporally dependent, in which case an LSTM (or any other RNN variant) is not an ideal choice.

Remove the LSTM layers, reduce the number of neurons, and add regularization such as dropout. Also, using ReLU in the last layer makes no sense: since this is a regression problem, use a linear activation instead, as in the sketch below.
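For example, a minimal sketch of the suggested simplification (the layer sizes and dropout rate are illustrative choices, not tuned values; note the input loses its time-step dimension once the LSTM layers are gone):

from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(64, input_shape=(6,), activation='relu'))  # six features, no time-step dimension
model.add(Dropout(0.2))                                    # dropout for regularization
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1, activation='linear'))                   # linear output for regression
model.compile(loss='mean_absolute_percentage_error', optimizer='adam')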

To reduce the error further, tune the hyperparameters with a randomized search or a grid search, as sketched below.
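A minimal hand-rolled randomized search over such a model might look like this (the search space is illustrative, and X_train, y_train, X_val, y_val are assumed to exist):

import random
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import Adam

def build_model(units, dropout, lr):
    model = Sequential()
    model.add(Dense(units, input_shape=(6,), activation='relu'))
    model.add(Dropout(dropout))
    model.add(Dense(1, activation='linear'))
    model.compile(loss='mean_absolute_percentage_error',
                  optimizer=Adam(learning_rate=lr))
    return model

# Hypothetical search space
space = {'units': [32, 64, 128, 256],
         'dropout': [0.0, 0.2, 0.4],
         'lr': [1e-2, 1e-3, 1e-4]}

best_params, best_mape = None, float('inf')
for _ in range(20):  # 20 random draws from the space
    params = {k: random.choice(v) for k, v in space.items()}
    model = build_model(**params)
    model.fit(X_train, y_train, epochs=30, batch_size=32, verbose=0)
    mape = model.evaluate(X_val, y_val, verbose=0)  # scalar loss = validation MAPE
    if mape < best_mape:
        best_params, best_mape = params, mape

print(best_params, best_mape)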

Answered by Zabir Al Nazi on May 12, 2021
