Skip-Gram Negative Sampling with Logistic Regression

Asked on Data Science by Linear Algebra fans on September 4, 2021

Given the following training sentence from a document:
… lemon, a tablespoon of apricot jam, a pinch …

The word apricot is chosen as the target word $t$, with window size 2.

The training samples, with both positive and negative examples, look as follows (a pair-generation sketch appears after the two lists):

Positive samples:
apricot tablespoon
apricot of
apricot preserves
apricot or

Negative samples (each positive sample gets 2 corresponding negative samples):
apricot aardvark  apricot twelve
apricot puddle  apricot hello
apricot where  apricot dear
apricot coaxial  apricot forever
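For concreteness, here is a minimal Python sketch of how such pairs could be generated from a tokenized sentence (my own illustration; `generate_pairs` and the uniform noise distribution are my choices, whereas real word2vec draws negatives from the unigram distribution raised to the 3/4 power):

```python
import random

def generate_pairs(tokens, target_index, window=2, k=2, vocab=None, seed=0):
    """Build (target, context) positive pairs plus k random negatives per positive."""
    rng = random.Random(seed)
    vocab = vocab if vocab is not None else sorted(set(tokens))
    target = tokens[target_index]
    positives, negatives = [], []
    lo = max(0, target_index - window)
    hi = min(len(tokens), target_index + window + 1)
    for j in range(lo, hi):
        if j == target_index:
            continue
        positives.append((target, tokens[j]))
        # Draw k noise words uniformly from the vocabulary; word2vec instead
        # samples from the unigram distribution raised to the 3/4 power.
        for _ in range(k):
            negatives.append((target, rng.choice(vocab)))
    return positives, negatives

tokens = "lemon a tablespoon of apricot jam a pinch".split()
pos, neg = generate_pairs(tokens, tokens.index("apricot"))
print(pos)  # [('apricot', 'tablespoon'), ('apricot', 'of'), ('apricot', 'jam'), ('apricot', 'a')]
print(neg)  # 2 random noise pairs per positive pair
```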

The log-likelihood function (for a single positive pair and its negatives):
$$\log\frac{1}{1+e^{-c\cdot t}}+\sum_{i=1}^{k}\log\frac{1}{1+e^{n_i\cdot t}}$$
1. $k$ is 2, since each positive sample has 2 negative samples.
2. $t$ is the vector for the target word apricot.
3. $c$ is the vector for a context word within the window, e.g. tablespoon in the positive sample apricot tablespoon.
4. $n_i$ is the vector for the $i$-th negative-sample word attached to that positive sample (a numeric sketch of this objective follows the list).
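To make the objective concrete, here is a minimal numpy sketch (my own illustration; `sgns_step`, the learning rate, and the dimension 5 are arbitrary choices). The vectors $t$, $c$, and $n_i$ are small dense vectors initialized at random, and each step applies the standard gradients of the negative log-likelihood above:

```python
import numpy as np

def sigma(x):
    """Logistic function: sigma(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + np.exp(-x))

def sgns_step(t, c, negs, lr=0.05):
    """One SGD step on the negative log-likelihood
    -log sigma(c.t) - sum_i log sigma(-n_i.t) for one positive pair."""
    g_pos = sigma(c @ t) - 1.0            # in (-1, 0); pushes c.t upward
    grad_t = g_pos * c
    grad_c = g_pos * t
    grad_negs = []
    for n in negs:
        g_neg = sigma(n @ t)              # in (0, 1); pushes n_i.t downward
        grad_t += g_neg * n
        grad_negs.append(g_neg * t)
    # Apply all updates only after every gradient has been computed.
    t = t - lr * grad_t
    c = c - lr * grad_c
    negs = [n - lr * g for n, g in zip(negs, grad_negs)]
    return t, c, negs

rng = np.random.default_rng(0)
dim = 5                                            # small dense vectors
t, c = rng.normal(size=dim), rng.normal(size=dim)  # random initialization
negs = [rng.normal(size=dim) for _ in range(2)]    # k = 2, as above
for _ in range(50):
    t, c, negs = sgns_step(t, c, negs)
print(sigma(c @ t))                         # should move toward 1
print([float(sigma(n @ t)) for n in negs])  # should move toward 0
```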

Here are my questions:
1. How do the negative and positive samples map to the vectors $c$, $n_i$, and $t$?
In the deep-learning version this is done with one-hot encoding,
but how does it work in this version?
2. Is there a workable example with a small dataset?
3. How do I know that my trained vector $t$ is correct? (A possible sanity check is sketched below.)
I would prefer to study this method on a very small dataset, since the full method needs many training samples and a long training time, possibly a week.
However, my aim is to learn the method itself, not to produce word embeddings.
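On question 3, one crude check (my own suggestion, not an established test): after training, the target vector should score every true context word above every noise word; a steadily decreasing loss is another signal.

```python
import numpy as np

def looks_trained(t, pos_vecs, neg_vecs):
    """Crude sanity check: every true context word should outscore
    every sampled noise word, i.e. sigma(c.t) > sigma(n.t)."""
    sigma = lambda x: 1.0 / (1.0 + np.exp(-x))
    return min(sigma(c @ t) for c in pos_vecs) > max(sigma(n @ t) for n in neg_vecs)
```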

It would be great if anyone could help with my questions. I am not only asking for help; I also want to share what I learn.
