TransWikia.com

Number of parameters in an LSTM model

Data Science Asked by wabbit on May 4, 2021

How many parameters does a single stacked LSTM have? The number of parameters imposes a lower bound on the number of training examples required and also influences the training time. Hence knowing the number of parameters is useful for training models using LSTMs.

5 Answers

The LSTM has a set of 2 matrices: U and W for each of the (3) gates. The (.) in the diagram indicates multiplication of these matrices with the input $x$ and output $h$.

  • U has dimensions $n times m$
  • W has dimensions $n times n$
  • there is a different set of these matrices for each of the three gates(like $U_{forget}$ for the forget gate etc.)
  • there is another set of these matrices for updating the cell state S
  • on top of the mentioned matrices, you need to count the biases (not in the picture)

Hence total # parameters = $4(nm+n^{2} + n)$

LSTM abstract block

Answered by wabbit on May 4, 2021

According to this:

LSTM cell structure

LSTM cell structure

LSTM equations

LSTM equations

Ingoring non-linearities

Ingoring non-linearities

If the input x_t is of size n×1, and there are d memory cells, then the size of each of W∗ and U∗ is d×n, and d×d resp. The size of W will then be 4d×(n+d). Note that each one of the dd memory cells has its own weights W∗ and U∗, and that the only time memory cell values are shared with other LSTM units is during the product with U∗.

Thanks to Arun Mallya for great presentation.

Answered by ichernob on May 4, 2021

Following previous answers, The number of parameters of LSTM, taking input vectors of size $m$ and giving output vectors of size $n$ is:

$$4(nm+n^2)$$

However in case your LSTM includes bias vectors, (this is the default in keras for example), the number becomes:

$$4(nm+n^2 + n)$$

Answered by Adam Oudad on May 4, 2021

To make it clearer , I annotate the diagram from http://colah.github.io/posts/2015-08-Understanding-LSTMs/.

ot-1 : previous output , dimension , n (to be exact, last dimension's units is n )

i: input , dimension , m

fg: forget gate

ig: input gate

update: update gate

og: output gate

Since at each gate, the dimension is n, so for ot-1 and i to get to each gate by matrix multiplication(dot product), need nn+mn parameters, plus n bias .so total is 4(nn+mn+n).

Answered by Ben2018 on May 4, 2021

to completely receive you'r answer and to have a good insight visit : https://towardsdatascience.com/counting-no-of-parameters-in-deep-learning-models-by-hand-8f1716241889

g, no. of FFNNs in a unit (RNN has 1, GRU has 3, LSTM has 4)

h, size of hidden units

i, dimension/size of input

Since every FFNN(feed forward neural network) has h(h+i) + h parameters, we have

num_params = g × [h(h+i) + h]

Example 2.1: LSTM with 2 hidden units and input dimension 3.

enter image description here

g = 4 (LSTM has 4 FFNNs)

h = 2

i = 3

num_params

= g × [h(h+i) + h]

= 4 × [2(2+3) + 2]

= 48

    input = Input((None, 3))
    lstm = LSTM(2)(input)
    model = Model(input, lstm)

thanks to RAIMI KARIM

Answered by Ali Alipoury on May 4, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP