TransWikia.com

Passing TFIDF Feature Vector to a SGDClassifier from sklearn

Data Science Asked by Pranay Mathur on March 28, 2021

import numpy as np
from sklearn import linear_model

X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
Y = np.array(['C++', 'C#', 'java','python'])
clf = linear_model.SGDClassifier()
clf.fit(X, Y)
print(clf.predict([[1.7, 0.7]]))
# python

I am training a classifier on the training data X and using it to predict one of the labels in Y for a test case. Now I want to replace the training set X with TF-IDF feature vectors. How can I do that?
Roughly, I want to do something like this:

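import numpy as np
from sklearn import linear_model

# Raw tokens instead of numbers (np.array, not np.array_str, which returns a string)
X = np.array([['abcd', 'efgh'], ['qwert', 'yuiop'], ['xyz', 'abc'], ['opi', 'iop']])
Y = np.array(['C++', 'C#', 'java', 'python'])
clf = linear_model.SGDClassifier()
clf.fit(X, Y)  # fails: SGDClassifier needs numeric features, hence the TF-IDF question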

2 Answers

It's useful to do this with a Pipeline:

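import numpy as np
from sklearn import linear_model, pipeline, feature_extraction

X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
Y = np.array(['C++', 'C#', 'java', 'python'])
# The pipeline first re-weights X with TF-IDF, then trains the classifier on the result.
# TfidfTransformer expects a (document x term) count matrix; this toy X only illustrates the API.
clf = pipeline.make_pipeline(
        feature_extraction.text.TfidfTransformer(use_idf=True),
        linear_model.SGDClassifier())
clf.fit(X, Y)
# predict() passes the new sample through the same fitted TF-IDF step first
print(clf.predict([[1.7, 0.7]]))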
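When the input is raw text rather than a count matrix, the same Pipeline idea works with TfidfVectorizer, which scikit-learn documents as equivalent to CountVectorizer followed by TfidfTransformer. A minimal sketch, assuming one string per training sample; the toy corpus below is hypothetical, and random_state is fixed only so repeated runs agree:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus: one string per training sample, one label per sample.
train_docs = [
    "pointers templates std vector",
    "linq dotnet async await",
    "jvm spring maven classes",
    "pandas numpy list comprehension",
]
train_labels = ["C++", "C#", "java", "python"]

# The vectorizer turns text into TF-IDF vectors; the classifier is trained on those.
clf = make_pipeline(
    TfidfVectorizer(),
    SGDClassifier(random_state=0),
)
clf.fit(train_docs, train_labels)

# New text goes through the same fitted vectorizer before classification.
pred = clf.predict(["numpy arrays and pandas dataframes"])
print(pred)  # a one-element array holding the predicted label
```

Keeping the vectorizer inside the pipeline means the vocabulary and IDF weights are learned only from the training data and reused unchanged at prediction time.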

Correct answer by Andreas on March 28, 2021

You can find a good tutorial on how to achieve this on this blog.

This is actually part II. In part I, the author explains what term frequency is and how it is computed.
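As a small worked example of term frequency and TF-IDF, the sketch below computes TF-IDF by hand with NumPy and checks it against scikit-learn's TfidfTransformer under its defaults (smooth_idf=True, sublinear_tf=False, norm='l2'), where idf(t) = ln((1 + n) / (1 + df(t))) + 1 for n documents. The count matrix is a hypothetical example, not data from the blog:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfTransformer

# Hypothetical term-count matrix: rows are documents, columns are terms.
counts = np.array([
    [3, 0, 1],
    [2, 0, 0],
    [3, 0, 0],
    [4, 0, 0],
    [3, 2, 0],
    [3, 0, 2],
])

n_docs = counts.shape[0]
df = (counts > 0).sum(axis=0)                 # document frequency of each term
idf = np.log((1 + n_docs) / (1 + df)) + 1     # smoothed idf (scikit-learn default)
tfidf = counts * idf                          # raw term frequency times idf
tfidf /= np.linalg.norm(tfidf, axis=1, keepdims=True)  # L2-normalise each row

reference = TfidfTransformer().fit_transform(counts).toarray()
print(np.allclose(tfidf, reference))  # -> True
```

A term that appears in every document (the first column) gets the smallest idf, so it contributes least to each row after weighting.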

Link to part I.

Answered by MaticDiba on March 28, 2021
