Skip-Gram Negative Sampling with Logistic Regression

Asked on Data Science by Linear Algebra fans on September 4, 2021

Given the following training sentence from a document:
… lemon, a tablespoon of apricot jam, a pinch …

The word apricot is chosen as the target word $t$, with window size 2.

The training samples, with both positive and negative examples, look as follows (a pair-generation sketch appears after the two lists):

Positive samples:
apricot tablespoon
apricot of
apricot preserves
apricot or

Negative samples (each positive sample gets 2 corresponding negative samples):
apricot aardvark  apricot twelve
apricot puddle  apricot hello
apricot where  apricot dear
apricot coaxial  apricot forever
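For concreteness, here is a minimal Python sketch of how such pairs could be generated from a tokenized sentence (my own illustration; `generate_pairs` and the uniform noise distribution are my choices, whereas real word2vec draws negatives from the unigram distribution raised to the 3/4 power):

```python
import random

def generate_pairs(tokens, target_index, window=2, k=2, vocab=None, seed=0):
    """Build (target, context) positive pairs plus k random negatives per positive."""
    rng = random.Random(seed)
    vocab = vocab if vocab is not None else sorted(set(tokens))
    target = tokens[target_index]
    positives, negatives = [], []
    lo = max(0, target_index - window)
    hi = min(len(tokens), target_index + window + 1)
    for j in range(lo, hi):
        if j == target_index:
            continue
        positives.append((target, tokens[j]))
        # Draw k noise words uniformly from the vocabulary; word2vec instead
        # samples from the unigram distribution raised to the 3/4 power.
        for _ in range(k):
            negatives.append((target, rng.choice(vocab)))
    return positives, negatives

tokens = "lemon a tablespoon of apricot jam a pinch".split()
pos, neg = generate_pairs(tokens, tokens.index("apricot"))
print(pos)  # [('apricot', 'tablespoon'), ('apricot', 'of'), ('apricot', 'jam'), ('apricot', 'a')]
print(neg)  # 2 random noise pairs per positive pair
```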

The log-likelihood function (for a single positive pair and its negatives):
$$\log\frac{1}{1+e^{-c\cdot t}}+\sum_{i=1}^{k}\log\frac{1}{1+e^{n_i\cdot t}}$$
1. $k$ is 2, since each positive sample has 2 negative samples.
2. $t$ is the vector for the target word apricot.
3. $c$ is the vector for a context word within the window, e.g. tablespoon in the positive sample apricot tablespoon.
4. $n_i$ is the vector for the $i$-th negative-sample word attached to that positive sample (a numeric sketch of this objective follows the list).
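To make the objective concrete, here is a minimal numpy sketch (my own illustration; `sgns_step`, the learning rate, and the dimension 5 are arbitrary choices). The vectors $t$, $c$, and $n_i$ are small dense vectors initialized at random, and each step applies the standard gradients of the negative log-likelihood above:

```python
import numpy as np

def sigma(x):
    """Logistic function: sigma(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + np.exp(-x))

def sgns_step(t, c, negs, lr=0.05):
    """One SGD step on the negative log-likelihood
    -log sigma(c.t) - sum_i log sigma(-n_i.t) for one positive pair."""
    g_pos = sigma(c @ t) - 1.0            # in (-1, 0); pushes c.t upward
    grad_t = g_pos * c
    grad_c = g_pos * t
    grad_negs = []
    for n in negs:
        g_neg = sigma(n @ t)              # in (0, 1); pushes n_i.t downward
        grad_t += g_neg * n
        grad_negs.append(g_neg * t)
    # Apply all updates only after every gradient has been computed.
    t = t - lr * grad_t
    c = c - lr * grad_c
    negs = [n - lr * g for n, g in zip(negs, grad_negs)]
    return t, c, negs

rng = np.random.default_rng(0)
dim = 5                                            # small dense vectors
t, c = rng.normal(size=dim), rng.normal(size=dim)  # random initialization
negs = [rng.normal(size=dim) for _ in range(2)]    # k = 2, as above
for _ in range(50):
    t, c, negs = sgns_step(t, c, negs)
print(sigma(c @ t))                         # should move toward 1
print([float(sigma(n @ t)) for n in negs])  # should move toward 0
```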

Here are my questions:
1. How do the negative and positive samples map to the vectors $c$, $n_i$, and $t$?
In the deep-learning version this is done with one-hot encoding,
but how does it work in this version?
2. Is there a workable example with a small dataset?
3. How do I know that my trained vector $t$ is correct? (A possible sanity check is sketched below.)
I would prefer to study this method on a very small dataset, since the full method needs many training samples and a long training time, possibly a week.
However, my aim is to learn the method itself, not to produce word embeddings.
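On question 3, one crude check (my own suggestion, not an established test): after training, the target vector should score every true context word above every noise word; a steadily decreasing loss is another signal.

```python
import numpy as np

def looks_trained(t, pos_vecs, neg_vecs):
    """Crude sanity check: every true context word should outscore
    every sampled noise word, i.e. sigma(c.t) > sigma(n.t)."""
    sigma = lambda x: 1.0 / (1.0 + np.exp(-x))
    return min(sigma(c @ t) for c in pos_vecs) > max(sigma(n @ t) for n in neg_vecs)
```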

It would be great if anyone could help with my questions. I am not only asking for help; I also want to share what I learn.
