Error in using sklearn's GridSearchCV on Word2Vec

Question

I am using gensim's sklearn_api to wrap a Word2Vec model as an estimator so that I can pass it to sklearn's GridSearchCV. My code is as follows:
from gensim.sklearn_api import W2VTransformer
from sklearn.model_selection import GridSearchCV

s_obj = W2VTransformer(size=100, min_count=1, window=5)

parameters = {'size': (100, 150, 200), 'min_count': (1, 2, 4), 'alpha': (0.025, 0.015)}

s_model = GridSearchCV(s_obj, parameters, cv=2)
s_model.fit(sentences)  # sentences: a list of tokenized sentences, e.g. [["hello", "world"], ...]

print(s_model.best_params_)

Running the above code, I get the following error:
"If no scoring is specified, the estimator passed should have a 'score' method. The estimator W2VTransformer(alpha=0.025, batch_words=10000, cbow_mean=1,
               hashfxn=<built-in function hash>, hs=0, iter=5,
               max_vocab_size=None, min_alpha=0.0001, min_count=1, negative=5,
               null_word=0, sample=0.001, seed=1, sg=0, size=100,
               sorted_vocab=1, trim_rule=None, window=5, workers=3) does not."

I do not know how to resolve this. I tried using scoring='accuracy' or scoring='hamming' but they don't seem to work either.
Can someone please help me get rid of this error?
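For context, GridSearchCV needs one number per candidate to rank the parameter combinations. The sketch below is a simplified, hypothetical model of how that number is looked up (it is not sklearn's actual code): a callable `scoring` is used as-is, `scoring=None` falls back to the estimator's own `score` method, and a string such as `'accuracy'` maps to a metric that calls `estimator.predict(X)` and compares the result with the labels `y`. `ToyTransformer` is a made-up stand-in for `W2VTransformer`, which has `fit`/`transform` but neither `score` nor `predict`; since the search is also fitted on unlabeled sentences, both the default and `'accuracy'` fail.

```python
def resolve_scorer(estimator, scoring=None):
    """Simplified sketch of GridSearchCV's scorer lookup (not sklearn's code)."""
    if callable(scoring):
        # A callable is used as-is: scoring(estimator, X, y) -> float, higher is better.
        return scoring
    if scoring is None:
        # No scoring given: fall back to the estimator's own score() method.
        if hasattr(estimator, "score"):
            return lambda est, X, y=None: est.score(X, y)
        raise TypeError("If no scoring is specified, the estimator passed "
                        "should have a 'score' method.")
    if scoring == "accuracy":
        # Classification metrics compare est.predict(X) with the true labels y.
        def accuracy(est, X, y):
            predictions = est.predict(X)  # AttributeError if est has no predict()
            return sum(p == t for p, t in zip(predictions, y)) / len(y)
        return accuracy
    raise ValueError(f"{scoring!r} is not handled in this sketch")


class ToyTransformer:
    """Hypothetical stand-in for W2VTransformer: fit/transform only."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X
```

So a callable that only needs the estimator and the held-out data is the route that fits an unsupervised transformer.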

Noah Weber · Answer

Do:
from sklearn.metrics import accuracy_score, make_scorer

s_model = GridSearchCV(s_obj, parameters, cv=2, scoring=make_scorer(accuracy_score))
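`scoring` also accepts any callable with the signature `scorer(estimator, X, y)` that returns a float where higher is better, so an unsupervised model can be scored without `predict` or labels. Below is a minimal sketch of such a callable, assuming gensim 3.x, where a fitted `W2VTransformer` keeps the trained Word2Vec model in its `gensim_model` attribute. `vocab_coverage_scorer` is a hypothetical name and metric: it measures the share of held-out tokens that received a vector. The `SimpleNamespace` object at the bottom only imitates a fitted transformer so the function can be tried without gensim.

```python
from types import SimpleNamespace


def vocab_coverage_scorer(estimator, X, y=None):
    """Hypothetical unsupervised scorer for GridSearchCV.

    Returns the fraction of tokens in the held-out sentences X that have a
    vector in the fitted model (assumes gensim 3.x W2VTransformer, whose
    trained model is stored as estimator.gensim_model).
    y is unused because the search is fitted on unlabeled sentences.
    """
    tokens = [token for sentence in X for token in sentence]
    if not tokens:
        return 0.0
    wv = estimator.gensim_model.wv
    known = sum(1 for token in tokens if token in wv)
    return known / len(tokens)


# Imitation of a fitted transformer whose vocabulary is {"cat", "dog"}.
fake = SimpleNamespace(gensim_model=SimpleNamespace(wv={"cat", "dog"}))
print(vocab_coverage_scorer(fake, [["cat", "sat"], ["dog"]]))  # 0.6666666666666666
```

With a real transformer the call would become `GridSearchCV(s_obj, parameters, cv=2, scoring=vocab_coverage_scorer)`. This particular metric tends to favor the smallest `min_count`, so treat it as a demonstration of the callable interface rather than a good measure of embedding quality.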

10xAI · Answer

I think you don't need all the functionality of GridSearchCV, i.e. fitting and K-Fold cross-validation.
So you can simply write a custom function that tries all the different options and keeps the one with the best score.
First, you will need to define your score, i.e. what you are actually trying to optimize,
e.g. maybe the ratio of the vector's dimension to the word count.
from gensim.sklearn_api import W2VTransformer
import itertools

def score_func(word, vector):
    # Define what you want to measure, e.g. the ratio of the vector's dimension to the word count.
    # A constant is returned here for demonstration only.
    return 1.0
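The constant above is only a placeholder. One common way to score embeddings without labels (not something this answer prescribes) is an intrinsic check: pick word pairs you expect to be related and measure how close their vectors are. The sketch below assumes you have already turned the trained model into a plain `word -> vector` mapping; `related_pairs_score` and the pair list are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def related_pairs_score(vectors, related_pairs):
    """Hypothetical score: mean cosine similarity over pairs expected to be related.

    vectors: mapping word -> vector, e.g. built from a trained model.
    related_pairs: hand-picked (word, word) tuples; pairs with an unknown word are skipped.
    """
    sims = [cosine(vectors[a], vectors[b])
            for a, b in related_pairs if a in vectors and b in vectors]
    return sum(sims) / len(sims) if sims else 0.0


toy_vectors = {"king": [1.0, 0.0], "queen": [1.0, 0.0], "car": [0.0, 1.0]}
pairs = [("king", "queen"), ("king", "car"), ("king", "unknown")]
print(related_pairs_score(toy_vectors, pairs))  # 0.5
```

Average similarity alone can be inflated if all vectors drift together, so it is worth also checking that pairs you expect to be unrelated stay dissimilar.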

Then, we can simply loop over all the parameter combinations to find the best one:
parm_dict = {'size': (100, 150, 200), 'min_count': (1, 2, 4), 'alpha': (0.025, 0.015)}

def cust_param_search(parm_dict):
    score_best, parm_best = float('-inf'), ()
    # Individual parameter tuples, looked up by name so the dict's key order doesn't matter
    size, min_count, alpha = (parm_dict[k] for k in ('size', 'min_count', 'alpha'))

    parm_combo = list(itertools.product(size, min_count, alpha))  # Create all combinations

    for parms in parm_combo:
        s, m, a = parms

        s_obj = W2VTransformer(size=s, min_count=m, window=5, alpha=a)
        # Fit s_obj on your sentences here, e.g. s_obj.fit(sentences),
        # then get the inputs the score function needs
        word, vector = "Hello", ["H", "L", "O"]  # Dummy parameters
        score = score_func(word, vector)

        if score > score_best:
            score_best = score
            parm_best = parms
    print("Best score -", score_best, "Best parms -", parm_best)

cust_param_search(parm_dict)
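The loop above hard-codes three parameters. A sketch of the same exhaustive search that works for any grid keeps each value attached to its parameter name and delegates model building and scoring to a callback; `grid_search` and `evaluate` are hypothetical names, and in this setting `evaluate` would construct and fit a `W2VTransformer` and return your score. (sklearn's `ParameterGrid` yields the same combinations as dicts, if you prefer not to write the product yourself.)

```python
import itertools


def grid_search(param_grid, evaluate):
    """Try every combination in param_grid and return (best_score, best_params).

    param_grid: dict mapping parameter name -> iterable of candidate values.
    evaluate: callable taking one keyword argument per parameter and returning
              a score where higher is better (hypothetical callback).
    """
    names = list(param_grid)
    best_score, best_params = float("-inf"), None
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))  # e.g. {'size': 100, 'min_count': 1, 'alpha': 0.025}
        score = evaluate(**params)
        if score > best_score:  # strict '>' keeps the first of any tied combinations
            best_score, best_params = score, params
    return best_score, best_params


# Toy evaluate for illustration: prefers size 150, a small min_count and a larger alpha.
def toy_evaluate(size, min_count, alpha):
    return -abs(size - 150) - min_count + alpha


grid = {'size': (100, 150, 200), 'min_count': (1, 2, 4), 'alpha': (0.025, 0.015)}
best_score, best_params = grid_search(grid, toy_evaluate)
print(best_params)  # {'size': 150, 'min_count': 1, 'alpha': 0.025}
```

Returning the result instead of printing inside the function also makes it easy to refit the winning configuration afterwards.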
