
LSTM predicting constant value throughout

Asked by HarryS on Stack Overflow on December 22, 2021

I understand that this is a long post, but help with any of the sections is appreciated.
I have some queries about the prediction method of my LSTM model. Here is a general summary of my approach:

  • I used a dataset of 50 time series for training. Each series starts at a value of 1.09 and decays down to 0.82, with between 570 and 2000 data points per series (i.e., each time series has a different length, but a similar trend).
  • I converted them to the windowed format accepted by Keras’ LSTM/Bi-LSTM layers (a sketch of this windowing step follows the list):
    [1.00, 0.99, 0.98, 0.97, 0.96] ==Output==> [0.95]
    [0.99, 0.98, 0.97, 0.96, 0.95] ==Output==> [0.94]
    and so on.
  • Shapes of the input and output arrays: input (39832, 5, 1) and output (39832,)
  • Training runs without errors
  • Prediction starts from an initial window of data with shape (1, 5, 1), taken from the actual data
  • Each predicted value is appended to a separate list (for plotting) and also appended to the window, while the first value of the window is dropped out; the updated window is then fed back into the model to generate the next prediction point
  • This continues until the whole curve is generated, for both models (LSTM and Bi-LSTM)

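The windowing step looks roughly like this (a minimal sketch; `series` stands in for one normalized time series, and the helper name is my own):

import numpy as np

def make_windows(series, timesteps=5):
    # Slide a window of `timesteps` consecutive values over the series;
    # each window is paired with the value that immediately follows it.
    X = np.array([series[i:i + timesteps] for i in range(len(series) - timesteps)])
    y = np.array([series[i + timesteps] for i in range(len(series) - timesteps)])
    return X.reshape(-1, timesteps, 1), y
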
However, the prediction is not even close to the actual data. It flatlines at a fixed value, whereas it should follow something like the black curve (the actual data).

Figure:
https://i.stack.imgur.com/Ofw7m.png

Model (the Bi-LSTM model is built analogously):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense
from tensorflow.keras import optimizers

timesteps = 5  # window length

model_lstm = Sequential()
model_lstm.add(LSTM(128, input_shape=(timesteps, 1), return_sequences=True))
model_lstm.add(Dropout(0.2))
model_lstm.add(LSTM(128, return_sequences=False))
model_lstm.add(Dropout(0.2))
model_lstm.add(Dense(1))  # single-value regression output
model_lstm.compile(loss='mean_squared_error', optimizer=optimizers.Adam(0.001))

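Training is a plain fit call (a sketch; `X_train`/`y_train` are the windowed arrays described above, with shapes (39832, 5, 1) and (39832,), and the batch size here is a placeholder rather than my exact setting):

model_lstm.fit(X_train, y_train, epochs=25, batch_size=64)  # 25 epochs, as noted below
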
Curve prediction initialization:

# cell_to_test is the actual series being predicted; seed the rollout
# with its first `timesteps` values
start = cell_to_test[0:timesteps].reshape(1, timesteps, 1)
y_curve_lstm = list(start.flatten())
y_window = start

Curve prediction:

import numpy as np

while len(y_curve_lstm) <= len(cell_to_test):
  # Predict one step ahead from the current window
  yhat = float(model_lstm.predict(y_window)[0, 0])
  y_curve_lstm.append(yhat)
  # Slide the window: drop the oldest value, append the new prediction
  y_window = np.append(y_window.flatten()[1:], yhat).reshape(1, timesteps, 1)

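For what it's worth, one way to tell whether the flatline comes from the model itself or from error accumulating through the feedback loop is to compare this rollout against one-step-ahead predictions on ground-truth windows (a sketch, reusing the hypothetical make_windows helper from above):

# Build test windows from the actual series (same format as the training windows)
X_test, y_test = make_windows(cell_to_test, timesteps)
# One-step-ahead predictions: every step sees true history, not fed-back predictions.
# If these track the black curve while the recursive rollout flatlines,
# the problem is error accumulation in the feedback loop.
yhat_onestep = model_lstm.predict(X_test).flatten()
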
Model summary:

Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_5 (LSTM)                (None, 5, 128)            66560     
_________________________________________________________________
dropout_5 (Dropout)          (None, 5, 128)            0         
_________________________________________________________________
lstm_6 (LSTM)                (None, 128)               131584    
_________________________________________________________________
dropout_6 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_5 (Dense)              (None, 1)                 129       
=================================================================
Total params: 198,273
Trainable params: 198,273
Non-trainable params: 0
_________________________________________________________________
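
(For reference, these parameter counts follow the usual Keras LSTM formula 4 × ((input_dim + units) × units + units): 4 × ((1 + 128) × 128 + 128) = 66,560 for the first layer, 4 × ((128 + 128) × 128 + 128) = 131,584 for the second, and 128 + 1 = 129 for the Dense layer.)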

In addition to diagnosing the problem, I am trying to find answers to the following questions (I have looked at other sources, but in vain):

  1. Is my data enough to train the LSTM model? I have been told that LSTMs require thousands of data points, so I feel that my current dataset (nearly 40,000 windows) more than meets that requirement.
  2. Is my model less or more complex than it needs to be?
  3. Does increasing the number of epochs, layers, and neurons per layer always lead to a ‘better’ model, or are there optimal values? If the latter, is there a method to find this optimum, or is trial and error the only way?
  4. I trained for 25 epochs, which gave me a loss of 1.25 × 10^-4. Should the loss be lower for the model to predict the trend? (I am focused on getting the shape right first and accuracy later, because training takes too long with more epochs.)
  5. Following on from the previous question, does the loss have the same units as the data? I ask because the data has a resolution down to 10^-7. (See the loss definition after this list.)
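
For reference (question 5), the compiled loss is mean squared error:

    MSE = (1/N) * sum_{i=1..N} (y_i - yhat_i)^2

so the reported loss is in squared units of the (normalized) data.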

Once again, I understand that this is a long post, but help with any of the sections is appreciated.
