Adding Features To Time Series Model LSTM

Question

I have been reading up a bit on LSTMs and their use for time series, and it has been interesting but difficult at the same time. One thing I have had difficulty understanding is how to add additional features to what is already a list of time series features. Assume you have your dataset set up like this:

t-3,t-2,t-1,Output

Now let's say you know of a feature that does affect the output but is not necessarily a time series feature, for example the weather outside. Is this something you can just add, and will the LSTM be able to distinguish which part is the time series aspect and which isn't?

Adam Sypniewski · Answer

For RNNs (e.g., LSTMs and GRUs), the layer input is a list of timesteps, and each timestep is a feature tensor. That means that you could have an input tensor like this (in Pythonic notation):

# Input tensor to RNN
[
    # Timestep 1
    [ temperature_in_paris, value_of_nasdaq, unemployment_rate ],
    # Timestep 2
    [ temperature_in_paris, value_of_nasdaq, unemployment_rate ],
    # Timestep 3
    [ temperature_in_paris, value_of_nasdaq, unemployment_rate ],
    ...
]

So absolutely, you can have multiple features at each timestep. In my mind, weather is a time series feature: where I live, it happens to be a function of time. So it would be quite reasonable to encode weather information as one of your features in each timestep (with an appropriate encoding, like cloudy=0, sunny=1, etc.).
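
To tie this back to the t-3, t-2, t-1 layout in the question, here is a minimal NumPy sketch of how such windows could be built with more than one feature per timestep. The helper name make_windows, the toy numbers, and the choice of column 0 as the forecast target are all assumptions made for illustration; the resulting (samples, timesteps, features) shape is what recurrent layers in Keras expect by default.

```python
import numpy as np

def make_windows(series, n_lags):
    """Slice a (time, features) array into overlapping windows.

    Returns X with shape (samples, n_lags, features) and y with shape
    (samples,), where y is feature 0 one step after each window.
    """
    series = np.asarray(series, dtype=float)
    n_samples = series.shape[0] - n_lags
    X = np.stack([series[i:i + n_lags] for i in range(n_samples)])
    y = series[n_lags:, 0]
    return X, y

# Hypothetical data: 6 time steps, 2 features per step.
# Column 0 is the value being forecast; column 1 is weather (cloudy=0, sunny=1).
data = np.array([
    [10.0, 0],
    [11.0, 1],
    [12.0, 1],
    [13.0, 0],
    [14.0, 1],
    [15.0, 0],
])

X, y = make_windows(data, n_lags=3)
print(X.shape)  # (3, 3, 2): samples, timesteps (t-3, t-2, t-1), features
print(y)        # [13. 14. 15.]
```

Each window keeps the weather value next to the series value for the same timestep, so the weather is simply one more column of the per-timestep feature vector.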

If you have non-time-series data, though, it doesn't really make sense to pass it through the LSTM. The LSTM may work anyway, but even if it does, it will probably come at the cost of higher loss or lower accuracy for the same amount of training time.

Alternatively, you can introduce this sort of "extra" information into your model outside of the LSTM by means of additional layers. You might have a data flow like this:

TIME_SERIES_INPUT ------> LSTM -------
                                       *---> MERGE ---> [more processing]
AUXILIARY_INPUTS --> [do something] --/

So you would merge your auxiliary inputs into the LSTM outputs, and continue your network from there. Now your model is simply multi-input.

For example, let's say that in your particular application, you only keep the last output of the LSTM output sequence, and that it is a vector of length 10. Your auxiliary input might be your encoded weather (so a scalar). Your merge layer could simply append the auxiliary weather information onto the end of the LSTM output vector to produce a single vector of length 11. But you don't need to keep just the last LSTM output timestep: if the LSTM output 100 timesteps, each with a 10-vector of features, you could still tack on your auxiliary weather information, resulting in 100 timesteps, each consisting of a vector of 11 datapoints.
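
The shape arithmetic of that merge can be checked with a small NumPy sketch. The random arrays below are only stand-ins for LSTM outputs (no model is trained), and the weather value is a made-up encoding; in a real Keras model a concatenation layer would play the role of np.concatenate here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for LSTM outputs (random numbers, not a trained model).
last_output = rng.normal(size=10)         # final timestep only: 10 features
all_outputs = rng.normal(size=(100, 10))  # 100 timesteps, 10 features each

weather = 1.0  # hypothetical encoded auxiliary input (sunny=1)

# Case 1: append the scalar to the final LSTM output vector.
merged_last = np.concatenate([last_output, [weather]])

# Case 2: repeat the scalar for every timestep, then append it as a column.
weather_column = np.full((all_outputs.shape[0], 1), weather)
merged_all = np.concatenate([all_outputs, weather_column], axis=1)

print(merged_last.shape)  # (11,)
print(merged_all.shape)   # (100, 11)
```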

The Keras documentation on its functional API has a good overview of this.

In other cases, as @horaceT points out, you may want to condition the LSTM on non-temporal data. For example, predict the weather tomorrow, given location. In this case, here are three suggestions, each with pros and cons:

1. Have the first timestep contain your conditioning data, since it will effectively "set" the internal/hidden state of your RNN. Frankly, I would not do this, for a bunch of reasons: your conditioning data needs to be the same shape as the rest of your features; it makes it harder to create stateful RNNs (you have to be really careful to track how you feed data into the network); the network may "forget" the conditioning data given enough time (e.g., long training sequences or long prediction sequences); etc.
2. Include the data as part of the temporal data itself. Each feature vector at a particular timestep then consists "mostly" of time-series data, with the conditioning data appended to the end. Will the network learn to recognize this? Probably, but even then, you are creating a harder learning task by polluting the sequence data with non-sequential information. So I would also discourage this.
3. Probably the best approach would be to directly affect the hidden state of the RNN at time zero. This is the approach taken by Karpathy and Fei-Fei and by Vinyals et al. This is how it works:

For each training sample, take your condition variables $\vec{x}$.
Transform/reshape your condition variables with an affine transformation to give them the same shape as the internal state of the RNN: $\vec{v} = \mathbf{W} \vec{x} + \vec{b}$ (here $\mathbf{W}$ and $\vec{b}$ are trainable weights). You can obtain this with a Dense layer in Keras.
For the very first timestep, add $\vec{v}$ to the hidden state of the RNN when calculating its value.

This approach is the most "theoretically" correct, since it properly conditions the RNN on your non-temporal inputs, naturally solves the shape problem, and also avoids polluting your input timesteps with additional, non-temporal information. The downside is that this approach often requires graph-level control of your architecture, so if you are using a higher-level abstraction like Keras, you will find it hard to implement unless you add your own layer type.
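
As a rough illustration of those three steps, here is a NumPy sketch that uses a plain tanh RNN rather than an LSTM to keep it short. It reflects one reading of "add the vector at the first timestep" (the projected condition is added to the pre-activation of step 0 only). The function name conditioned_rnn, the layer sizes, and the random, untrained weights are all assumptions for the example.

```python
import numpy as np

def conditioned_rnn(inputs, condition, params):
    """Run a vanilla tanh RNN whose first step is shifted by a condition vector.

    inputs:    (timesteps, n_features) sequence
    condition: (n_condition,) non-temporal variables x
    """
    W_c, b_c, W_x, W_h, b_h = params
    v = W_c @ condition + b_c       # affine map to the hidden-state size
    h = np.zeros(W_h.shape[0])      # usual all-zero starting state
    states = []
    for t, x_t in enumerate(inputs):
        pre_activation = W_x @ x_t + W_h @ h + b_h
        if t == 0:
            pre_activation = pre_activation + v  # condition the first step only
        h = np.tanh(pre_activation)
        states.append(h)
    return np.stack(states)

# Hypothetical sizes and random (untrained) weights, for illustration only.
rng = np.random.default_rng(0)
n_features, n_condition, n_hidden, timesteps = 3, 2, 4, 5
params = (
    rng.normal(size=(n_hidden, n_condition)),  # W: condition projection
    np.zeros(n_hidden),                        # b: condition bias
    rng.normal(size=(n_hidden, n_features)),   # input weights
    rng.normal(size=(n_hidden, n_hidden)),     # recurrent weights
    np.zeros(n_hidden),                        # recurrent bias
)
sequence = rng.normal(size=(timesteps, n_features))

# Same input sequence, two different conditions.
states_a = conditioned_rnn(sequence, np.array([1.0, 0.0]), params)
states_b = conditioned_rnn(sequence, np.array([0.0, 1.0]), params)
print(states_a.shape)  # (5, 4)
print(np.abs(states_a - states_b).max(axis=1))  # per-timestep gap between runs
```

The condition never appears in the input sequence itself and can have any length, because the projection maps it to the hidden-state size before it is used.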

user2614596 · Answer

Keras LSTM layers have a method, reset_states(states).

However, the states parameter is a list of two states: the hidden state h and the cell state c.

states = [h, c]

It would be interesting to know whether you should initialize h or c according to the approaches in the above-mentioned papers.

double-d · Answer

This is probably not the most efficient way, but the static variables could be repeated to timeseries length using tf.tile().
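
A small sketch of that idea, using np.tile as a NumPy stand-in for tf.tile (both take the tensor and a per-axis repeat count). The batch size, sequence length, and the meaning of the static value are made up for the example.

```python
import numpy as np

# Hypothetical batch: 4 samples, 3 timesteps, 2 time series features each.
sequences = np.arange(24, dtype=float).reshape(4, 3, 2)
# One static value per sample (e.g. an encoded location), shape (4, 1).
static = np.array([[0.0], [1.0], [2.0], [3.0]])

# Add a time axis, then repeat along it: (4, 1) -> (4, 1, 1) -> (4, 3, 1).
tiled = np.tile(static[:, np.newaxis, :], (1, sequences.shape[1], 1))

# Append the repeated static feature to every timestep's feature vector.
combined = np.concatenate([sequences, tiled], axis=-1)
print(combined.shape)      # (4, 3, 3)
print(combined[2, :, -1])  # [2. 2. 2.]
```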

Philippe Remy · Answer

Based on all the good answers of this thread, I wrote a library to condition on auxiliary inputs. It abstracts all the complexity and has been designed to be as user-friendly as possible:
https://github.com/philipperemy/cond_rnn/ (tensorflow)
Hope it helps!

shivam13juna · Answer

Adam's answer does seem to make the most sense. However, I am not sure about the second statement, "polluting the sequence data with non-sequential information".

I recently trained a character-level LSTM model in which I just appended a non-sequential feature at the end of the sequential features. The model learned to differentiate between them pretty well.

Whether the model would have performed better had I done it Adam's way is still to be tested. But for people who don't want to go the extra mile, appending non-sequential features to sequential ones works just fine.
