
Can RNN be replaced with non-recurrent classifier for Sequence Classification problem?

Data Science Asked by Deil on July 16, 2021

Setup:
We have a sequence of events that are not evenly spaced in time (so not a classic time series). The length of the sequence is constant.

Goal:
Predict the class of the event that is most likely to follow this sequence.

Background:
I know that an RNN would probably be a good fit for this task, but at the same time I wonder whether the parameter sharing in the U, W, V matrices actually hurts accuracy (even though it makes training cheaper). Let's say we are OK with spending more time (and data) on training and don't want to compromise accuracy.

Question:
Is it true that with a regular MLP we can achieve better, or at least the same, performance if we just concatenate/flatten the features of all the events in the sequence and pass them together as a single input? I believe the model should still be able to learn interactions between features (representing different events in the sequence), but I am not sure how good it will be at that, and if not, why not?
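For concreteness, here is a minimal sketch of what I mean by flattening (the shapes, the random placeholder data, and the scikit-learn MLPClassifier are just illustrative assumptions, not my actual setup):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Illustrative shapes: 1000 sequences of constant length 10,
# 8 features per event, predicting one of 5 next-event classes.
n_samples, seq_len, n_features, n_classes = 1000, 10, 8, 5
X = np.random.rand(n_samples, seq_len, n_features)  # placeholder data
y = np.random.randint(n_classes, size=n_samples)    # placeholder labels

# Flatten each (seq_len, n_features) sequence into one long input vector.
X_flat = X.reshape(n_samples, seq_len * n_features)

mlp = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=200)
mlp.fit(X_flat, y)
```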

One Answer

Conceptually, the sound of dropping a metal chain on the floor is different from the sound of dropping the separated links of that chain.

Feedforward NN

In a feed-forward neural network, all of the sequential features are consumed independently:

$f(x) = Wx + b = w_1 x_1 + \dots + w_n x_n + b$

This is all good, as long as you are ready to sacrifice the step-wise dependency between $x_1$, $x_2$, etc.
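As a quick illustration of this point (a hypothetical NumPy sketch with arbitrary sizes), note that nothing in the affine map treats neighbouring inputs as "consecutive":

```python
import numpy as np

# A single feed-forward (affine) layer: f(x) = W x + b.
# Each input gets its own independent weight; the ordering of the x_i
# is just an arbitrary indexing of weights, so the layer has no built-in
# notion that x_2 follows x_1. Any order dependence must be learned
# entirely from data.
n = 4
W = np.random.rand(1, n)   # one weight per input feature
b = np.random.rand(1)
x = np.random.rand(n)

f_x = W @ x + b
```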

Recurrent NN

To be able to utilise the temporal or sequential signal in your dataset, you need a method that "chains" each feature to its past/future state, right?

We "connect" these sequential events with the alleged "hidden state":

$a_n = f(W, a_{n-1}, x_n)$

By unrolling the above equation you can see how past information accumulates in the subspace of the hidden state:

$a_n = f(W, a_{n-1}, x_n) = f(W, f(W, a_{n-2}, x_{n-1}), x_n)$, since $a_{n-1} = f(W, a_{n-2}, x_{n-1})$. Note that the same $W$ is shared across all steps, which is exactly the parameter sharing you ask about.
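To make the recurrence concrete, here is a minimal NumPy sketch of a plain tanh RNN cell (the sizes and the weight names W_a, W_x are my own illustrative assumptions):

```python
import numpy as np

# a_n = f(W, a_{n-1}, x_n): the same weights are reused at every step,
# and the hidden state a accumulates information from all earlier events.
hidden_dim, feat_dim, seq_len = 16, 8, 10
W_a = np.random.randn(hidden_dim, hidden_dim) * 0.1  # hidden-to-hidden
W_x = np.random.randn(hidden_dim, feat_dim) * 0.1    # input-to-hidden
bias = np.zeros(hidden_dim)

x_seq = np.random.randn(seq_len, feat_dim)  # one sequence of events
a = np.zeros(hidden_dim)                    # a_0: initial hidden state

for x_n in x_seq:
    a = np.tanh(W_a @ a + W_x @ x_n + bias)  # a_n = f(W, a_{n-1}, x_n)

# `a` now summarises the whole sequence and could feed a classifier head.
```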

You may already see some issues with the above, especially for longer sequences; these are usually tackled with LSTM and attention-based architectures, but that is a different discussion!

Hope it helps.

Answered by hH1sG0n3 on July 16, 2021
