
Understanding Transfer Learning of Word Embeddings

Asked by AnonymousMe on July 31, 2021

I can't quite visualize how transfer learning of pre-trained word embeddings is useful in an NLP task (say, named entity recognition). I'm studying Andrew Ng's Sequence Models course, and he seems to say that if the training set for the target task is very small, transfer learning of word embeddings helps because unknown words in the training set can still be handled in the application.

Let's consider the task of named entity recognition.

My question is: what does the very small training set for the target task contain? Is it word embeddings, or sentences labeled with entities?

Does he mean that if the training set consists only of labeled sentences whose words have embeddings in the pre-trained model, then words that are not present in the training set, but are close to those already in it, also get handled effectively in the application?

E.g.: suppose 'orange' is in the training set, but 'apple' is not.

So, given the sentences 'I like orange juice' and 'I like apple juice', 'apple' gets recognized as a fruit even though it is not in the training set, since its embedding is close to that of 'orange'.

Am I right in my assumption? If not, can someone please correct me and explain?

One Answer

So named entity recognition is a task where you train a network to detect entities, given word vectors as the input.

The key property of word embeddings is that, because the two words appear in similar sentence contexts, the embeddings for 'orange' and 'apple' end up very similar, i.e. the cosine angle between them is small (their cosine similarity is close to 1).
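As a minimal sketch of that similarity check (the three-dimensional vectors below are made-up placeholders; real pre-trained vectors, e.g. from GloVe or word2vec, would have 100-300 dimensions):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two word vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Hypothetical pre-trained embeddings, for illustration only.
embeddings = {
    "orange": np.array([0.9, 0.1, 0.4]),
    "apple":  np.array([0.8, 0.2, 0.5]),
    "car":    np.array([-0.3, 0.9, -0.1]),
}

print(cosine_similarity(embeddings["orange"], embeddings["apple"]))  # ~0.98, very similar
print(cosine_similarity(embeddings["orange"], embeddings["car"]))    # ~-0.23, dissimilar
```

A similarity near 1 between 'orange' and 'apple' is exactly what lets the model generalize from one to the other.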

In named entity recognition you feed these word embeddings into a network, and the data you train on has a tag for each word: an entity type, or 'ordinary word'. So your network is actually learning the relationship between word embeddings and tags, not between the raw words and tags. That is why 'apple' gets detected even though it is not in the NER training set: the embeddings are usually trained on a large corpus that contains 'apple', 'orange', and many other tokens, so 'apple' arrives with a vector very close to that of 'orange'. This is where transfer learning helps: you take word embeddings trained in an unsupervised manner on that large corpus and reuse them to learn about entities from a small labeled set.
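To make the transfer-learning step concrete, here is a minimal sketch in TensorFlow/Keras; the vocabulary size, tag count, and the random pretrained_matrix are hypothetical placeholders standing in for a real embedding file such as GloVe:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Placeholder sizes; a real setup would load actual pre-trained vectors
# instead of this random matrix.
vocab_size, embed_dim, num_tags, max_len = 10_000, 100, 5, 50
pretrained_matrix = np.random.rand(vocab_size, embed_dim)

# Transfer learning: the embedding layer is frozen, so words unseen in the
# small NER training set keep the geometry learned from the large corpus.
emb = layers.Embedding(vocab_size, embed_dim, trainable=False)

model = models.Sequential([
    tf.keras.Input(shape=(max_len,)),
    emb,
    layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
    # One tag prediction per token (e.g. PERSON, LOCATION, O, ...).
    layers.TimeDistributed(layers.Dense(num_tags, activation="softmax")),
])
emb.set_weights([pretrained_matrix])  # load the pre-trained vectors

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

Only the LSTM and Dense weights are fitted on the small labeled set; the frozen embeddings carry the knowledge from the large unlabeled corpus over to the NER task.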

Hope that helps. I can elaborate if required.

Answered by Nischal Hp on July 31, 2021
