Mapping sequences of different lengths to fixed vector - Python

Data Science Asked by Noah Chalifour on April 6, 2021

I am trying to make a chatbot using a deep neural network in Python with Keras. The problem I am having is that for the deep neural network to work, the input dimension has to be fixed. So my question is: how can I map a sequence of words (a sentence of arbitrary length) to a fixed-size vector so that I can feed it through a deep neural network?

2 Answers

A recurrent neural network (RNN) is probably what you want to look into. It has exactly that capability of dealing with variable-length input sequences. In fact, some of the most successful current applications of RNNs are in NLP, where the inputs are sentences of varying length. Just google "recurrent neural network LSTM" and you will find lots of tutorials.
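
As a minimal sketch (not part of the original answer, assuming TensorFlow 2.x / Keras; the vocabulary size, layer sizes and the two toy sentences are made up): sentences are zero-padded to a common length, and an Embedding layer with mask_zero=True lets the LSTM ignore the padding and produce one fixed-size vector per sentence.

```python
# Minimal sketch, assuming TensorFlow 2.x / Keras; vocabulary size,
# layer sizes and the toy sentences below are illustrative only.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.preprocessing.sequence import pad_sequences

vocab_size = 10000                                # assumed size of the word index
sentences = [[4, 27, 9], [13, 2, 88, 41, 7]]      # two tokenized sentences of different lengths
x = pad_sequences(sentences, padding="post")      # zero-pad to the longest sentence

model = Sequential([
    Embedding(vocab_size, 128, mask_zero=True),   # index 0 is treated as padding and masked
    LSTM(64),                                     # collapses each sequence to one fixed-size vector
    Dense(32, activation="relu"),
])

fixed_vectors = model.predict(x)
print(fixed_vectors.shape)                        # (2, 32): one 32-d vector per sentence
```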

Correct answer by horaceT on April 6, 2021

There is a lot of research going on in the field of generating fixed-length sentence vectors. Here are some of the approaches.

  1. Train or download word vectors and add up (or average) the word vectors in the sentence to get a fixed-length sentence vector. Surprisingly, this gives decent results.
  2. If you have a huge corpus, one popular algorithm used for various baselines is Doc2Vec from Quoc Le & Tomáš Mikolov, "Distributed Representations of Sentences and Documents". An implementation is available in gensim (see the sketch after this list).
  3. You may also download pretrained sentence vectors from various sources.
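
A minimal sketch of approaches 1 and 2 using gensim (not part of the original answer; assuming the gensim 4.x API, with a made-up toy corpus and illustrative hyperparameters):

```python
# Sketch of approaches 1 (averaged word vectors) and 2 (Doc2Vec) above,
# assuming gensim 4.x; the toy corpus and all hyperparameters are illustrative only.
import numpy as np
from gensim.models import Word2Vec
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

corpus = [["how", "are", "you"],
          ["i", "am", "building", "a", "chatbot"],
          ["what", "is", "your", "name"]]

# 1. Average the word vectors of a sentence to get a fixed-length vector.
w2v = Word2Vec(corpus, vector_size=100, min_count=1)

def sentence_vector(tokens, model):
    vecs = [model.wv[w] for w in tokens if w in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

print(sentence_vector(["how", "are", "you"], w2v).shape)        # (100,)

# 2. Doc2Vec (Le & Mikolov): train on tagged documents, then infer a
#    fixed-length vector for any new sentence.
tagged = [TaggedDocument(words=s, tags=[i]) for i, s in enumerate(corpus)]
d2v = Doc2Vec(tagged, vector_size=100, min_count=1, epochs=40)
print(d2v.infer_vector(["are", "you", "a", "chatbot"]).shape)   # (100,)
```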

Answered by Trideep Rath on April 6, 2021
