# Why does the error of my LSTM not decrease after 10 epochs?

Artificial Intelligence · Asked by K. Do on December 13, 2020

Despite the problem being very simple, I was wondering why an LSTM network was not able to converge to a decent solution.

```python
import numpy as np
import keras

# Toy task: learn the identity mapping y = x on random scalars
X_train = np.random.rand(1000)
y_train = X_train
X_train = X_train.reshape((len(X_train), 1, 1))  # (samples, timesteps, features)

model = keras.models.Sequential()
model.add(keras.layers.LSTM(1))  # single-unit LSTM producing one output per sequence

optimizer = keras.optimizers.SGD(lr=1e-1)

model.build(input_shape=(None, 1, 1))
model.compile(loss=keras.losses.mean_squared_error, optimizer=optimizer, metrics=['mae'])
history = model.fit(X_train, y_train, batch_size=16, epochs=100)
```


After 10 epochs, the network seems to have reached its best solution (around 1e-4 RMSE) and is not able to improve the results any further.

A simple Flatten + Dense network with similar parameters is, however, able to achieve an RMSE of about 1e-13.
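
For reference, a minimal sketch of such a Flatten + Dense baseline (the layer sizes here are illustrative, not the exact network) looks like this:

```python
# Illustrative Flatten + Dense baseline (layer sizes are assumed, not the exact original network)
dense_model = keras.models.Sequential()
dense_model.add(keras.layers.Flatten(input_shape=(1, 1)))  # collapse the (1, 1) window to one feature
dense_model.add(keras.layers.Dense(1))                     # linear output, no activation by default

dense_model.compile(loss='mse', optimizer=keras.optimizers.SGD(lr=1e-1), metrics=['mae'])
dense_model.fit(X_train, y_train, batch_size=16, epochs=100)
```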

I’m surprised the LSTM cell does not simply let the value through. Is there something I’m missing in my parameters? Is an LSTM only good for classification problems?

I think there are some problems with your approach.

Firstly, looking at the Keras documentation, LSTM expects an input of shape (batch_size, timesteps, input_dim). You're passing an input of shape (1000, 1, 1), which means you are feeding "sequences" of a single timestep each.

RNNs have been proposed to capture temporal dependencies, but it's impossible to capture such dependencies when the length of each series is 1 and the numbers are randomly generated. If you want a more realistic scenario, I would suggest generating a sine wave, since it has a smooth periodic oscillation. Then increase the number of timesteps beyond 1 and test the model by predicting the following timesteps, as in the sketch below.
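
As a rough sketch of that setup (the window length, layer sizes and optimizer below are arbitrary choices for illustration, not a prescribed configuration):

```python
import numpy as np
import keras

# Smooth periodic signal instead of random numbers
t = np.arange(0, 100, 0.1)
wave = np.sin(t)

# Sliding windows: use `window` past values to predict the next one
window = 10  # arbitrary window length
X = np.array([wave[i:i + window] for i in range(len(wave) - window)])
y = wave[window:]
X = X.reshape((len(X), window, 1))  # (samples, timesteps, features)

model = keras.models.Sequential()
model.add(keras.layers.LSTM(16, input_shape=(window, 1)))  # the LSTM now sees real temporal context
model.add(keras.layers.Dense(1))

model.compile(loss='mse', optimizer='adam', metrics=['mae'])
model.fit(X, y, batch_size=16, epochs=20)
```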

For the second part: if you think about a plain RNN (I will explain it for a simple RNN, but you can imagine a similar flow for an LSTM) and a Dense layer applied to a single timestep, there are not many differences. The Dense layer computes $Y = f(XW + b)$, where $X$ is the input, $W$ is the weight matrix, $b$ is the bias and $f$ is the activation function. The RNN computes $Y = f(XW_1 + W_2 h_0 + b)$; since this is the first timestep, $h_0$ is $0$, so it reduces to $Y = f(XW_1 + b)$, which is identical to the Dense layer. I suspect the difference in results is caused by the activation functions: by default the Dense layer has no activation function, while the LSTM uses tanh and sigmoid.
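
One way to check that suspicion (the script below is only an illustrative sketch, not code from the question): train a Dense layer, a SimpleRNN with a linear activation, and a SimpleRNN with the default tanh on the same single-timestep data and compare the final losses. The linear RNN should behave essentially like the Dense layer.

```python
import numpy as np
import keras

X = np.random.rand(1000, 1, 1)  # 1000 "sequences" of a single timestep
y = X.reshape(len(X), 1)

def final_loss(model):
    # Train each model the same way and return the last training loss
    model.compile(loss='mse', optimizer=keras.optimizers.SGD(lr=1e-1))
    return model.fit(X, y, batch_size=16, epochs=100, verbose=0).history['loss'][-1]

# Dense layer: linear activation by default
dense = keras.models.Sequential([
    keras.layers.Flatten(input_shape=(1, 1)),
    keras.layers.Dense(1)
])

# SimpleRNN with a linear activation: on a single timestep (h_0 = 0) it reduces to the Dense case
rnn_linear = keras.models.Sequential([
    keras.layers.SimpleRNN(1, activation=None, input_shape=(1, 1))
])

# SimpleRNN with the default tanh activation
rnn_tanh = keras.models.Sequential([
    keras.layers.SimpleRNN(1, input_shape=(1, 1))
])

for name, m in [('dense', dense), ('rnn (linear)', rnn_linear), ('rnn (tanh)', rnn_tanh)]:
    print(name, final_loss(m))
```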

Answered by razvanc92 on December 13, 2020
