
What is learnt by a model having multiple LSTM layers, each getting a slice of the same input data?

Data Science Asked by snowflakebladerunner on June 20, 2021

I came across the following architecture:

Generation of training and testing data

# Some pre-processing of ASCII data

# Essentially, given the 512 previous letters (X), predict the next letter (Y)
X = [...] # shape = (100000, 512)
Y = [...] # shape = (100000, 1)
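To make the data layout concrete, here is a minimal sketch of how such (X, Y) pairs could be built with a sliding window. The helper name `make_windows` and the toy string are my own assumptions for illustration; the actual pre-processing is not shown in the question and may differ.

```python
import numpy as np

def make_windows(text, context_len):
    """Hypothetical helper: split ASCII text into (context, next letter) pairs."""
    # Encode every character as its ASCII code (0..127).
    codes = np.frombuffer(text.encode("ascii"), dtype=np.uint8)
    n_samples = len(codes) - context_len
    # Row i of idx holds the positions i, i+1, ..., i+context_len.
    idx = np.arange(n_samples)[:, None] + np.arange(context_len + 1)
    windows = codes[idx]
    X = windows[:, :-1]   # the context_len previous letters
    Y = windows[:, -1:]   # the letter that follows them
    return X, Y

# Toy example with a 4-letter context instead of 512.
X_demo, Y_demo = make_windows("hello world", context_len=4)
print(X_demo.shape, Y_demo.shape)
```

With `context_len=512` and enough text, this would give the `(100000, 512)` and `(100000, 1)` shapes described above.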

Model

from tensorflow.keras.layers import Input, Embedding, LSTM, Bidirectional, Concatenate, Dense
from tensorflow.keras.models import Model

sequence_length0 = 512  # long context
sequence_length1 = 64   # short context (the last 64 of the 512 letters)

# first part of model: sees the full 512-letter window
input_0 = Input(batch_shape=(batch_size, sequence_length0))
embedding_output_0 = Embedding(..., batch_input_shape=(batch_size, sequence_length0))(input_0)
lstm_0 = Bidirectional(LSTM(units=32, return_sequences=True, return_state=True))(embedding_output_0)

# second part of model: sees only the last 64 letters
input_1 = Input(batch_shape=(batch_size, sequence_length1))
embedding_output_1 = Embedding(..., batch_input_shape=(batch_size, sequence_length1))(input_1)
lstm_1 = Bidirectional(LSTM(units=32, return_sequences=True, return_state=True))(embedding_output_1)

# merge the outputs of both parts
lstm_cat = Concatenate()([lstm_0, lstm_1])

# some fully connected network

output = Dense(..., activation='softmax')(...)

# The model is constructed as follows. Note that inputs is a list, i.e. inputs[0] goes to the
# first part of the model (lstm_0) and inputs[1] goes to the second part (lstm_1).
model = Model(inputs=[input_0, input_1], outputs=output)
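For anyone who wants to experiment, below is a small runnable sketch of a two-branch model in this spirit. It is not the original architecture. The values of `batch_size`, `vocab_size`, `embed_dim` and the hidden layer size are assumptions I made up, since the snippet elides them. It also deviates in one respect: each branch returns only its final summary vector (the default `return_sequences=False`, no `return_state`), because sequence outputs of length 512 and 64 cannot be concatenated along the feature axis, and `return_state=True` makes the layer return a list of tensors rather than a single one.

```python
import numpy as np
from tensorflow.keras.layers import Input, Embedding, LSTM, Bidirectional, Concatenate, Dense
from tensorflow.keras.models import Model

# All of these sizes are assumptions for illustration only.
batch_size = 4
vocab_size = 128   # 7-bit ASCII codes
embed_dim = 16
seq_len_long, seq_len_short = 512, 64

def branch(seq_len):
    """One embedding + bidirectional LSTM branch over seq_len letters."""
    inp = Input(batch_shape=(batch_size, seq_len))
    emb = Embedding(vocab_size, embed_dim)(inp)
    # Final states of both directions, concatenated: shape (batch_size, 64).
    enc = Bidirectional(LSTM(32))(emb)
    return inp, enc

in_long, enc_long = branch(seq_len_long)
in_short, enc_short = branch(seq_len_short)

merged = Concatenate()([enc_long, enc_short])             # (batch_size, 128)
hidden = Dense(64, activation="relu")(merged)             # assumed hidden layer
output = Dense(vocab_size, activation="softmax")(hidden)  # next-letter distribution
model = Model(inputs=[in_long, in_short], outputs=output)

# Random toy batch: the short input is the last 64 columns of the long one.
X1 = np.random.randint(0, vocab_size, size=(batch_size, seq_len_long))
X2 = X1[:, -seq_len_short:]
probs = model.predict([X1, X2], batch_size=batch_size, verbose=0)
print(probs.shape)
```

Each branch has its own embedding and LSTM weights here, so the two branches can in principle learn different representations of the overlapping 64 letters.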

Training loop

for batch_index in range(...):
  # Take batch_size consecutive rows of X as the long-context input
  X1 = X[start_position: start_position + batch_size] # shape = (batch_size, 512)

  # Take the last 64 columns of every row of X1 as the short-context input
  X2 = X1[:, -64:] # shape = (batch_size, 64)

  # Take the matching rows of Y so the targets line up with the inputs
  Y1 = Y[start_position: start_position + batch_size] # shape = (batch_size, 1)

  model.train_on_batch([X1, X2], Y1)
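The batching logic can be checked in isolation with NumPy. This is a rough sketch under my own assumptions: the generator name `iter_batches`, consecutive non-overlapping batches, and dropping a trailing partial batch are not stated in the original loop.

```python
import numpy as np

def iter_batches(X, Y, batch_size, short_len):
    """Hypothetical helper: yield (long input, short input, targets) per batch."""
    # Assumption: drop a trailing partial batch, since the model fixes batch_size.
    n_batches = len(X) // batch_size
    for batch_index in range(n_batches):
        start = batch_index * batch_size
        X1 = X[start:start + batch_size]   # (batch_size, 512)
        X2 = X1[:, -short_len:]            # (batch_size, short_len), a view into X1
        Y1 = Y[start:start + batch_size]   # (batch_size, 1)
        yield X1, X2, Y1

# Random toy data standing in for the real ASCII windows.
rng = np.random.default_rng(0)
X_toy = rng.integers(0, 128, size=(10, 512))
Y_toy = rng.integers(0, 128, size=(10, 1))

batches = list(iter_batches(X_toy, Y_toy, batch_size=4, short_len=64))
X1, X2, Y1 = batches[0]
print(len(batches), X1.shape, X2.shape, Y1.shape)
print(np.shares_memory(X1, X2))  # the short input is literally part of the long one
```

The last line makes the point behind the question explicit: the second input carries no data that the first input does not already contain.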

What kind of information does this model learn? For example, the first part of the model might specialize in long-term dependencies while the second specializes in short-term ones (this is just my intuition, and I may be wrong). Or is the second LSTM layer redundant, given that its input is only a slice of the input the first part already receives?
