TransWikia.com

Error in using sklearn's GridSearchCV on Word2Vec

Data Science Asked by Bharathi on March 7, 2021

I am using gensim's sklearn_api to wrap a Word2Vec model as an estimator so that I can pass it to sklearn's GridSearchCV. My code is as follows:

from gensim.sklearn_api import W2VTransformer
from sklearn.model_selection import GridSearchCV

s_obj = W2VTransformer(size=100, min_count=1, window=5)

parameters = {'size': (100, 150, 200), 'min_count': (1, 2, 4), 'alpha': (0.025, 0.015)}

s_model = GridSearchCV(s_obj, parameters, cv=2)
s_model.fit(sentences)

print(s_model.best_params_)

Running the above code, I get the following error:

"If no scoring is specified, the estimator passed should have a 'score' method. The estimator W2VTransformer(alpha=0.025, batch_words=10000, cbow_mean=1,
               hashfxn=<built-in function hash>, hs=0, iter=5,
               max_vocab_size=None, min_alpha=0.0001, min_count=1, negative=5,
               null_word=0, sample=0.001, seed=1, sg=0, size=100,
               sorted_vocab=1, trim_rule=None, window=5, workers=3) does not."

I do not know how to resolve this. I tried scoring='accuracy' and scoring='hamming', but neither works.

Can someone please help me get rid of this error?

2 Answers

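Pass an explicit scorer through the scoring argument. Note that accuracy_score compares predictions with true labels, so this assumes the estimator implements predict and that labels are passed to fit: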

from sklearn.metrics import accuracy_score, make_scorer

s_model = GridSearchCV(s_obj,parameters,cv=2, scoring=make_scorer(accuracy_score))

Answered by Noah Weber on March 7, 2021

I don't think you need all the functionality of GridSearchCV (i.e. fit, K-fold cross-validation), so you can simply write a custom function that tries every parameter combination and reports which one gives the best score.

First, define your score. This is the quantity you actually want to optimize, e.g. the ratio of the vector's dimensions to the word count.

from gensim.sklearn_api import W2VTransformer
import itertools

def score_func(word, vector):
    # Define what you want to measure, e.g. the ratio of the vector's dimensions to the word count.
    # A constant is returned here for demonstration only.
    return 1.0
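
The constant above is only a placeholder. As one possible illustration of a real score (my assumption, not something the question specifies), you could reward embeddings that place hand-picked related words closer together than unrelated ones. In this sketch, pair_score, the vectors dict, and the word pairs are all hypothetical names and toy data; with a trained model you would build the dict from its word vectors.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def pair_score(vectors, related, unrelated):
    """Mean similarity of related pairs minus mean similarity of unrelated pairs.

    vectors: dict mapping word -> vector (stand-in for trained embeddings).
    Pairs that contain an out-of-vocabulary word are skipped.
    """
    def mean_sim(pairs):
        sims = [cosine(vectors[a], vectors[b]) for a, b in pairs
                if a in vectors and b in vectors]
        return sum(sims) / len(sims) if sims else 0.0
    return mean_sim(related) - mean_sim(unrelated)

# Toy vectors, made up for demonstration only
toy_vectors = {"cat": [1.0, 0.0], "dog": [1.0, 0.0], "car": [0.0, 1.0]}
toy_score = pair_score(toy_vectors,
                       related=[("cat", "dog")],
                       unrelated=[("cat", "car")])
```

A higher value means the related pairs are better separated from the unrelated ones; whether that is the right objective depends on what you use the vectors for.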

Then, loop over all the parameter combinations to find the best one:

parm_dict = {'size':(100,150,200),'min_count':(1,2,4),'alpha':(0.025,0.015)}

def cust_param_search(parm_dict):
    score_best, parm_best = float("-inf"), ()
    # Look the tuples up by key instead of relying on dict ordering
    size, min_count, alpha = parm_dict['size'], parm_dict['min_count'], parm_dict['alpha']

    # itertools.product yields every (size, min_count, alpha) combination
    for parms in itertools.product(size, min_count, alpha):
        s, m, a = parms

        s_obj = W2VTransformer(size=s, min_count=m, window=5, alpha=a)
        # Fit s_obj and get whatever else the score function needs
        word, vector = "Hello", ["H", "L", "O"]  # Dummy parameters
        score = score_func(word, vector)

        if score > score_best:
            score_best = score
            parm_best = parms
    print("Best score -", score_best, "Best parms - ", parm_best)

cust_param_search(parm_dict)
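
If you want to reuse the loop for other parameters, a more general variant can take the parameter names from the dict itself. This is a minimal sketch under my own assumptions: grid_search and toy_evaluate are hypothetical names, and toy_evaluate is a stand-in objective that trains nothing. In practice it would fit Word2Vec with the given parameters and return your score.

```python
import itertools

def grid_search(param_grid, evaluate):
    """Try every combination in param_grid and return (best_score, best_params).

    evaluate is a callable that receives a dict of parameters and returns
    a number, where higher is better.
    """
    names = sorted(param_grid)  # fixed order, independent of dict ordering
    best_score, best_params = float("-inf"), None
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_score, best_params = score, params
    return best_score, best_params

def toy_evaluate(params):
    # Placeholder objective for demonstration only: prefers size 150,
    # a small min_count and a small alpha.
    return -abs(params["size"] - 150) - params["min_count"] - params["alpha"]

parm_dict = {'size': (100, 150, 200), 'min_count': (1, 2, 4), 'alpha': (0.025, 0.015)}
best_score, best_params = grid_search(parm_dict, toy_evaluate)
print("Best score -", best_score, "Best parms -", best_params)
```

Returning the result instead of only printing it lets you refit a final model with best_params afterwards.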

Answered by 10xAI on March 7, 2021
