The memorisation capacity of an LSTM (real numbers)

Data Science: Asked by Sean Lee on August 12, 2021

My question is the following:

It is known that an LSTM can remember sequences of one-hot encodings which represent integers (i.e. output $x_1, \dots, x_n$ after receiving $x_1, \dots, x_n$ as inputs, where $x_k \in \{0,1\}^m$ and $m$ is the number of distinct integers).

Is it theoretically possible for an LSTM to learn to remember sequences of real numbers instead (ones that can be expressed in a finite number of bits), i.e. with $x_t \in \mathbb{R}$?

The task I’m concerned with is much simpler: I just want to output the first input $x_1$ after reading the entire sequence $x_1, \dots, x_n$. I have done some small experiments with $x_t \in \mathbb{R}$, using square loss (a minimal sketch of the setup appears after the questions below). There seems to be some level of success, but the results aren’t very interpretable when I look at the weights. Can anyone shed some light on this, specifically:

  1. Does such a configuration of weights exist? (The assignment question quoted below suggests that it does.)
  2. If so, what are the weights, and if not, why not?
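
A minimal sketch of the kind of experiment described above (written here in Keras; the sequence length, dataset size and hyperparameters are arbitrary placeholders, not necessarily the exact setup used in the question):

```python
# Sketch of the experiment: train an LSTM with square loss to output the
# first element x_1 of each real-valued input sequence.
import numpy as np
from tensorflow import keras

n, hidden = 10, 32
X = np.random.uniform(-1, 1, size=(5000, n, 1))   # sequences of real numbers
y = X[:, 0, 0]                                    # target: the first input x_1

model = keras.Sequential([
    keras.layers.LSTM(hidden, input_shape=(n, 1)),
    keras.layers.Dense(1),                        # map the final hidden state to one real output
])
model.compile(optimizer="adam", loss="mse")       # square loss
model.fit(X, y, epochs=20, batch_size=64, verbose=0)
```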

The LSTM model is specified by:

The input, forget and outputs gates:

$$f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)$$
$$i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)$$
$$o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)$$

And the internal state $c_t$ and hidden state $h_t$:

$$c_t = f_t * c_{t-1} + i_t * \tanh(W_c[h_{t-1}, x_t] + b_c)$$
$$h_t = o_t * \tanh(c_t)$$
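
For concreteness, the recurrence above can be written out directly. The following is a minimal NumPy sketch of a single LSTM cell following the equations as stated; the weight matrices are placeholders supplied by the caller, not a solution to the task:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(x_seq, W_f, b_f, W_i, b_i, W_o, b_o, W_c, b_c, hidden_dim):
    """Run the recurrence above over a 1-D input sequence.

    Each W_* has shape (hidden_dim, hidden_dim + 1), since it acts on the
    concatenation [h_{t-1}, x_t] with a scalar input x_t.
    """
    h = np.zeros(hidden_dim)
    c = np.zeros(hidden_dim)
    for x_t in x_seq:
        z = np.concatenate([h, [x_t]])          # [h_{t-1}, x_t]
        f = sigmoid(W_f @ z + b_f)              # forget gate
        i = sigmoid(W_i @ z + b_i)              # input gate
        o = sigmoid(W_o @ z + b_o)              # output gate
        c = f * c + i * np.tanh(W_c @ z + b_c)  # internal state c_t
        h = o * np.tanh(c)                      # hidden state h_t
    return h, c
```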

As requested, this is the assignment question:

Memory Task Description

Consider the following task: given an input sequence of $n$ numbers, we would like a system which, after reading this sequence, will return the first number in the sequence. That is, given an input sequence $(x_1, x_2, \cdots, x_n)$, $x_i \in \mathbb{R}$, the system has to return, at time $t=n$ after ‘reading’ the last input $x_n$, the first input $x_1$.

  1. Given the task above, consider the above recurrent models (RNNs/LSTMs/GRUs). Which of these architectures can (theoretically) perform the task above? In answering this question, please consider a simple one-layer RNN/GRU/LSTM model with a one-dimensional input $x_t$, a $32$-dimensional hidden and output layer, followed by a transformation to a one-dimensional final output which should predict $x_1$.
    Whenever the answer is positive, give the gates’ activations and weights that will produce the desired behaviour. Whenever the answer is negative, prove that there exist no parameters for which an arbitrary input sequence can be transformed to produce the first symbol read.
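
As a partial illustration of what such a construction needs to achieve (a sketch only, not a full solution to the assignment): if the gates could be driven to the idealised values below (forget gate always $1$, input gate $1$ only at $t = 1$, output gate $1$ at the last step), the cell state would carry the first input to the end of the sequence, up to the $\tanh$ squashing, which a small input weight plus a rescaling readout can approximately undo. Producing these activations from actual weight matrices, and deciding whether the recovery can be made exact, is the part the assignment asks for.

```python
# Idealised-gate sketch (assumed gate values, not learned weights): the input
# gate writes only at t = 1, the forget gate never erases, and the output gate
# reads out at the last step. For a small input weight w_in, tanh is nearly
# linear, so the stored value can be approximately recovered by rescaling.
import numpy as np

def run_idealised_cell(x_seq, w_in=0.01):
    c = 0.0
    for t, x_t in enumerate(x_seq, start=1):
        f_t = 1.0                       # forget gate: never forget
        i_t = 1.0 if t == 1 else 0.0    # input gate: write only the first input
        c = f_t * c + i_t * np.tanh(w_in * x_t)
    h_n = 1.0 * np.tanh(c)              # output gate = 1 at the final step
    return h_n / w_in                   # linear readout rescales the stored value

x = np.random.uniform(-1, 1, size=20)
print(x[0], run_idealised_cell(x))      # approximately equal for small w_in * |x_1|
```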

One Answer

Welcome to the site! If you’re referring to a series of numbers like what you would get during tokenization/NLP, then yes, an LSTM can certainly handle that without many issues. If you are talking about a much larger range of values than that, then you might want to consider scaling your inputs instead.
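
For instance, the scaling suggestion might look like the following (a sketch of one common approach; the answer does not prescribe a particular method):

```python
# Sketch of input scaling (one common approach, assumed here): map the raw
# values to [-1, 1] before training and invert the transform on predictions.
import numpy as np

def scale(x, lo, hi):
    """Map values from [lo, hi] to [-1, 1]."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0

def unscale(x_scaled, lo, hi):
    """Invert scale()."""
    return (x_scaled + 1.0) / 2.0 * (hi - lo) + lo

X = np.random.uniform(-100.0, 100.0, size=(1000, 10, 1))
lo, hi = X.min(), X.max()
X_scaled = scale(X, lo, hi)   # feed X_scaled to the LSTM instead of X
# after training: predictions = unscale(predicted_scaled_values, lo, hi)
```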

Answered by I_Play_With_Data on August 12, 2021
