Predicting the missing word using fasttext pretrained word embedding models (CBOW vs skipgram)

Data Science: Asked by Kingstar on May 4, 2021

I am trying to implement a simple word prediction algorithm for filling a gap in a sentence by choosing from several options:

Driving a ---- is not fun in London streets.

  1. Apple
  2. Car
  3. Book
  4. King

With the right model in place:

Question 1. Which operation/function should be used to find the best-fitting choice? The similarity functions in the library are defined between one word and another word, not between one word and a list of words (e.g. the most_similar_to_given function). I cannot find such a primitive anywhere, even though it is the main operation promised by CBOW (see below). I have seen some suggestions here that are not intuitive. What am I missing?

I decided to take a head-first approach and start with fastText, which provides both the library and pre-trained models, but I soon got stuck in the documentation:

fastText provides two models for computing word representations: skipgram and cbow ('continuous-bag-of-words'). The skipgram model learns to predict a target word thanks to a nearby word. On the other hand, the cbow model predicts the target word according to its context. The context is represented as a bag of the words contained in a fixed size window around the target word.
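
For reference, this is how I understand the two objectives are selected when training with the official fasttext Python package (a rough sketch only; data.txt is just a placeholder corpus file):

    import fasttext

    # "data.txt" is a placeholder path to any plain-text training corpus.
    # skipgram: each word is used to predict the words that appear near it.
    model_sg = fasttext.train_unsupervised("data.txt", model="skipgram")

    # cbow: the bag of surrounding words is used to predict the word in the middle.
    model_cbow = fasttext.train_unsupervised("data.txt", model="cbow")

    # Both models expose word vectors in the same way.
    print(model_cbow.get_word_vector("car"))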

This explanation is not clear to me, since "nearby word" means much the same as "context". I googled a bit and ended up with this alternative definition:

In the CBOW model, the distributed representations of context (or surrounding words) are combined to predict the word in the middle. While in the Skip-gram model, the distributed representation of the input word is used to predict the context.

With this definition, CBOW looks like the right model for my task. Now I have the following questions:

Question 2. Which model was used to train the fastText pre-trained word vectors: CBOW or skipgram?

Question 3. Given that CBOW is the model this task calls for, can I still use pre-trained vectors that were trained with the skipgram model for my word-prediction use case?

One Answer

Question 1:

I would use fastText through Gensim, because Gensim has a predict_output_word method that does exactly what you want: given a list of context words, it returns the most fitting words.
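
A minimal sketch of that call, assuming you train a small CBOW model in Gensim yourself (predict_output_word relies on the model's output weights, which plain pre-trained vector files do not include; the toy corpus is only for illustration):

    from gensim.models import FastText

    # Tiny toy corpus; in practice you would train on a large text collection.
    sentences = [
        ["driving", "a", "car", "is", "not", "fun", "in", "london", "streets"],
        ["reading", "a", "book", "is", "fun"],
        ["he", "was", "crowned", "king"],
        ["she", "ate", "an", "apple"],
    ]

    # sg=0 selects the CBOW objective; negative sampling (the default) is needed
    # because predict_output_word uses the trained output weights.
    model = FastText(sentences, vector_size=50, window=3, min_count=1, sg=0, epochs=100)

    # Given the words around the gap, list the most probable centre words.
    context = ["driving", "a", "is", "not", "fun", "in", "london", "streets"]
    print(model.predict_output_word(context, topn=5))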

Question 2:

It is up to the user: fastText isn't inherently CBOW or skipgram; you pick the objective when you train the vectors. See this.

Question 3:

Yes, even though CBOW and SkipGram are different training procedures, they share a common goal. Both will generate word embeddings where (hopefully) words that are semantically close also have embeddings that are close. The main difference between SkipGram and CBOW is the inherent heuristic used for semantic closeness.
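
As a rough sketch of how pre-trained vectors (whichever objective they were trained with) can serve the multiple-choice use case, you can average the context word vectors and pick the candidate whose vector is closest. The file name cc.en.300.vec below is just a placeholder for whatever pre-trained file you downloaded:

    import numpy as np
    from gensim.models import KeyedVectors

    # Placeholder path: any pre-trained vectors in word2vec text format.
    wv = KeyedVectors.load_word2vec_format("cc.en.300.vec")

    context = ["driving", "a", "is", "not", "fun", "in", "london", "streets"]
    candidates = ["apple", "car", "book", "king"]

    # Average the vectors of the in-vocabulary context words.
    context_mean = np.mean([wv[w] for w in context if w in wv], axis=0)

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Rank the candidates by similarity to the averaged context vector.
    scores = {c: cosine(context_mean, wv[c]) for c in candidates if c in wv}
    print(sorted(scores.items(), key=lambda kv: -kv[1]))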

Correct answer by Valentin Calomme on May 4, 2021
