What is the difference between "adding more LSTM layers" and "adding more units to existing layers"?

Question

What is the difference between adding more LSTM layers and just increasing the units of existing layers? Which one is preferred and in which situation?

Escachator · Answer

When you add layers you are increasing the depth of the neural network. If you add more units to the existing layers, you are increasing the width.
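To make the depth/width distinction concrete, here is a rough sketch that counts trainable parameters for a wider and a deeper variant of the same baseline. It assumes the standard LSTM cell with four gates and one bias vector per gate (the count Keras reports; some libraries keep two bias vectors and report slightly more). The function names and the input size of 10 features are made up for illustration.

```python
def lstm_layer_params(input_dim, units):
    """Trainable parameters of one standard LSTM layer.

    Assumes 4 gates (input, forget, cell candidate, output), each with an
    input kernel, a recurrent kernel, and a single bias vector.
    """
    return 4 * (units * (input_dim + units) + units)


def stacked_lstm_params(input_dim, units_per_layer):
    """Total parameters of stacked LSTM layers, where each layer feeds the next."""
    total = 0
    for units in units_per_layer:
        total += lstm_layer_params(input_dim, units)
        input_dim = units  # the next layer receives this layer's hidden state
    return total


n_features = 10  # hypothetical number of input features per time step

baseline = stacked_lstm_params(n_features, [64])    # one layer, 64 units
wider = stacked_lstm_params(n_features, [128])      # same depth, double the width
deeper = stacked_lstm_params(n_features, [64, 64])  # same width, double the depth

print("baseline:", baseline)
print("wider:   ", wider)
print("deeper:  ", deeper)
```

Under these assumptions, doubling the width grows the recurrent kernel roughly quadratically, while adding a second layer of the same width adds one more layer's worth of parameters. So the two options differ in cost as well as in the kind of capacity they add.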

In terms of hyperparameter selection, the best I can recommend is to try both and see which one gives you the best performance. Keep overfitting in mind, since it becomes more likely as you increase the complexity of the model.

Savinay_ · Answer

The usual intuition for neural networks carries over to networks with LSTMs:

If you add more units, you are intuitively adding more nodes to the hidden layer. This allows the model to capture a "wider" variety of implicit relationships among the inputs it receives (maybe more than necessary), as derived information.
This may or may not improve the model's accuracy.

If you add more layers, the model holds relationships not only among the inputs but also among the derived information, which increases the "depth" of the relationships and combinations available to the model.

Which approach is better? It depends on your dataset and your modelling approach. Hyperparameter tuning is a way to push your accuracy limit, and it might help you here. Also try a grid search to find out whether adding layers or adding nodes helps more.
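A grid search over depth and width can be sketched in a few lines. This is only a skeleton: `grid_search` and `toy_validation_score` are hypothetical names, and the scoring function below is a stand-in that does not train anything. In practice you would replace it with code that builds an LSTM with the given number of layers and units, trains it, and returns a validation metric where higher is better.

```python
from itertools import product


def grid_search(layer_options, unit_options, evaluate):
    """Score every (num_layers, units) pair and return the best pair plus all scores."""
    scores = {}
    for num_layers, units in product(layer_options, unit_options):
        scores[(num_layers, units)] = evaluate(num_layers, units)
    best = max(scores, key=scores.get)  # highest score wins
    return best, scores


def toy_validation_score(num_layers, units):
    # Placeholder, NOT a real model: it simply pretends that 2 layers of
    # 64 units is the sweet spot. Swap in real training and validation here.
    return -abs(num_layers - 2) - abs(units - 64) / 64


best, scores = grid_search([1, 2, 3], [32, 64, 128], toy_validation_score)
print("best (layers, units):", best)
for config, score in sorted(scores.items()):
    print(config, round(score, 2))
```

Since every combination means one full training run, the grid grows quickly. Keeping the lists short, or comparing configurations with a similar parameter count, keeps the search affordable.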

Hope you get me!
Cheers!

Allohvk · Answer

With the layers - you are trying to generate higher and higher level features as you proceed from layer to layer. Depending on the problem at hand, two or more layers could serve the purpose. Since LSTMs operate on sequence data, adding layers adds levels of abstraction of the input observations over time, in effect chunking observations over time or representing the problem at different time scales. This works!
With the units - you are trying to capture the function in a better way. Too few and you could underfit. Too many and you could overfit.
By adding more layers, you can reduce the number of units in each layer and get greater 'depth' instead of 'width'. This simply means that you can approximate the function better using the 'right' combination of width and depth. To arrive at the 'right' one there is no option other than to play around for hours with different combinations.
One option is the 'stretch pants' approach: take much larger sizes than you actually need, then use regularisation techniques to get a better fit.
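The point about trading width for depth can be illustrated with a small calculation: fix a parameter budget and ask how many units per layer fit into it at each depth. This is a sketch under assumptions, not a recipe. It uses the standard LSTM parameter count with one bias vector per gate, gives every layer the same width, and the helper names and the input size of 10 features are invented for the example.

```python
def lstm_stack_params(input_dim, units, num_layers):
    """Parameters of num_layers stacked LSTM layers that all have `units` units.

    Assumes the standard 4-gate cell with one bias vector per gate.
    """
    total = 0
    for _ in range(num_layers):
        total += 4 * (units * (input_dim + units) + units)
        input_dim = units  # deeper layers take the previous hidden state as input
    return total


def widest_units_within_budget(input_dim, num_layers, budget):
    """Largest per-layer width whose total parameter count stays within budget."""
    units = 0
    while lstm_stack_params(input_dim, units + 1, num_layers) <= budget:
        units += 1
    return units


n_features = 10  # hypothetical number of input features per time step
budget = lstm_stack_params(n_features, 128, 1)  # cost of one 128-unit layer

for depth in (1, 2, 3):
    units = widest_units_within_budget(n_features, depth, budget)
    print(f"{depth} layer(s): up to {units} units per layer")
```

Matching parameter counts does not mean matching accuracy. It only gives comparable starting points, so that a deep-and-narrow model and a shallow-and-wide one can be compared fairly while you experiment.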
