
Clarify recurrent neural networks

Data Science | Asked on May 4, 2021

I'm just beginning to learn and understand recurrent neural networks. As far as I can imagine, it's multiple feed-forward neural networks with one neuron at each layer, put next to each other and connected from left to right, where each neuron is connected not just to the neuron below it, but also to the one on its left from the previous timestep. I'm not sure if that's the right way to think about it, but so far it's my first impression.

Some things are unclear though.

  1. As far as I understood, the final output of each timestep is supposed to predict the input of the next timestep. Is this true? What if I would just like to show the network two images of, for example, a horse, and depending on them, predict how far it went and in which direction? Is this possible?

From https://towardsdatascience.com/the-most-intuitive-and-easiest-guide-for-recurrent-neural-network-873c29da73c7

  2. In the illustration above there's $A_0$. Where does it come from? I would assume at least two timesteps are needed to make a prediction, so in my understanding an $x_0$ is missing from the left side of the diagram. Am I right?

  3. I've been reading through an article which says "Let's train a 2-layer LSTM with 512 hidden nodes". Does it mean two layers of activations and 512 timesteps?

One Answer

As far as I can imagine, it's multiple feed-forward neural networks with one neuron at each layer, put next to each other and connected from left to right, where each neuron is connected not just to the neuron below it, but also to the one on its left from the previous timestep.

Not really. Each cyan box in your image represents the exact same cell. This cell can be a lot of things; just take a look at an LSTM cell ($h$ and $c$ together play the role of your $A$), but it can also be a plain network which takes $A_i$ and $X_{i+1}$ as input and returns $A_{i+1}$ and $Y_{i+1}$ as output.
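For a concrete picture, here is a minimal sketch (plain NumPy, made-up sizes, and a vanilla tanh cell rather than an LSTM) of how one and the same cell is reused at every timestep:

```python
import numpy as np

# Hypothetical sizes: input, hidden state, output, sequence length
n_in, n_hidden, n_out, T = 8, 16, 4, 10
rng = np.random.default_rng(0)

# One set of weights -- these same three matrices are used at EVERY step
W_x = rng.normal(size=(n_hidden, n_in))      # input  -> state
W_a = rng.normal(size=(n_hidden, n_hidden))  # state  -> state
W_y = rng.normal(size=(n_out, n_hidden))     # state  -> output

X = rng.normal(size=(T, n_in))  # one input sequence, one vector per timestep
A = np.zeros(n_hidden)          # A_0: the initial state, here all zeros

outputs = []
for x_t in X:                          # the loop *is* the recurrence
    A = np.tanh(W_x @ x_t + W_a @ A)   # A_{i+1} from A_i and X_{i+1}
    outputs.append(W_y @ A)            # Y_{i+1}
```

Unrolling this loop side by side gives exactly the row of cyan boxes in the diagram; no new weights appear as the sequence gets longer.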

  1. It may be true if the RNN tries to predict e.g. a time series. To train such a net you would provide the series as the training input and the same series shifted one step ahead as the output, so it would try to predict $X_{i+1}$ from $X_j$ for all $j \in [1; i]$ (see the last lines of the sketch after this list). But in general it's not true. The output may be in a completely different format and represent a completely different thing, exactly like in your example: there, $X_i$ is the encoded $i$-th frame and $Y_i$ is the network's estimate of the distance the horse has traveled up to the $i$-th frame.
  2. $A_0$ is the starting state of the RNN. What it is exactly depends on the architecture used, but it's common to just set it all to zeros. We need this starting state because, as I mentioned, the same cell is used at each recurrent step, so there has to be something to provide as the network state in the beginning. There is no $X_0$ missing. Also, nothing stops you from making a prediction based on a sequence of length 1; it's just that an RNN isn't very useful in that situation.
  3. The number of timesteps taken depends on the data, not the network. You can use the same network on sequences of length 512, 2, or 2 million; that's why RNNs are commonly used for problems with varying lengths, like speech recognition. LSTMs have weights just like a normal neural network does, and you can think of those 512 hidden nodes as the size of a hidden layer in the cell. Using two layers of LSTM means using two LSTMs with 512 nodes each, feeding the output of the first as the input of the second; the output of the second LSTM is the output of the 2-layer LSTM (sketched below).
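To make points 1 and 3 concrete, here is a short sketch in PyTorch (hypothetical sizes; `nn.LSTM` is one off-the-shelf implementation of such a cell). `num_layers=2` and `hidden_size=512` fix the weight shapes, while the sequence length is carried entirely by the data, and when no initial state is passed it defaults to zeros, which also matches point 2:

```python
import torch
import torch.nn as nn

# "2-layer LSTM with 512 hidden nodes": weight shapes, not timesteps
lstm = nn.LSTM(input_size=64, hidden_size=512, num_layers=2,
               batch_first=True)

# The same module handles sequences of any length
short = torch.randn(1, 2, 64)        # batch of 1, 2 timesteps
long = torch.randn(1, 1000, 64)      # batch of 1, 1000 timesteps
out_short, _ = lstm(short)           # shape (1, 2, 512)
out_long, _ = lstm(long)             # shape (1, 1000, 512)

# Next-step prediction setup from point 1: inputs are the series,
# targets are the same series shifted one step ahead
series = torch.randn(1, 100, 64)
inputs, targets = series[:, :-1], series[:, 1:]
```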

Correct answer by YuseqYaseq on May 4, 2021
