TransWikia.com

Naive Bayes and Support Vector Machine (NBSVM) Classification

Data Science Asked by OldTimeRambler on February 9, 2021

I am relatively new to data science and have a question about NBSVM. I have a two-class problem with text data (newspaper headlines), and I want to use NBSVM to predict whether a headline has the label 0 or 1.

As I understand it, I have to proceed as follows:

  1. Convert the headlines to a document-term matrix.
  2. Calculate the log-count ratio. As I understand it, these are the probabilities of the individual documents for a class (i.e. the probability that a document is in class 0 or class 1). Please correct me if I’m wrong here.
  3. The log-count ratios then serve as input for the SVM, which uses them to set the boundary between the two classes. When new data arrives, the SVM tells you which class it belongs to.

Is this right? Please note that this is only a theoretical procedure, not an implementation.

One Answer

You can use sklearn's "CountVectorizer" and "TfidfVectorizer" to convert the text data into vectors:

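    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer, TfidfVectorizer
    from sklearn.model_selection import train_test_split
    from sklearn.svm import LinearSVC

    # One-step alternative to CountVectorizer + TfidfTransformer (not used below)
    tfidf = TfidfVectorizer(sublinear_tf=True, min_df=5, norm='l2', encoding='latin-1', ngram_range=(1, 2), stop_words='english')

    # df is assumed to be a pandas DataFrame with 'text' and 'class' columns
    X_train, X_test, y_train, y_test = train_test_split(df['text'], df['class'], random_state=0)

    # Vector representations of the text: raw counts first, then TF-IDF weights
    count_vect = CountVectorizer()
    X_train_counts = count_vect.fit_transform(X_train)
    tfidf_transformer = TfidfTransformer()
    X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)

    # Building an SVM model
    svmmodel = LinearSVC().fit(X_train_tfidf, y_train)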
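
Note that the snippet above trains a plain linear SVM on TF-IDF features; it does not compute the log-count ratio from the question. As a rough sketch of that step, assuming the usual formulation by Wang and Manning (2012): the log-count ratio is a vector with one entry per term (not a probability per document), and the SVM is trained on the document-term matrix after each column has been scaled by it. The toy headlines, the `log_count_ratio` helper and the smoothing value `alpha=1.0` are made up for illustration, and the weight interpolation step from the paper is left out.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Hypothetical toy data, for illustration only
headlines = [
    "stocks rally as markets surge",
    "markets surge on strong earnings",
    "team wins the championship game",
    "star player injured in game",
]
labels = np.array([1, 1, 0, 0])

# Step 1: binary document-term matrix (1 if the term occurs in the headline)
vectorizer = CountVectorizer(binary=True)
X = vectorizer.fit_transform(headlines)


def log_count_ratio(X, y, alpha=1.0):
    """Per-term log-count ratio r = log((p / |p|_1) / (q / |q|_1)).

    p and q are the smoothed term counts in class 1 and class 0.
    """
    p = alpha + np.asarray(X[y == 1].sum(axis=0)).ravel()
    q = alpha + np.asarray(X[y == 0].sum(axis=0)).ravel()
    return np.log((p / p.sum()) / (q / q.sum()))


# Step 2: one ratio per term; positive values point towards class 1
r = log_count_ratio(X, labels)

# Step 3: scale every column of X by its ratio and fit a linear SVM on the result
X_nb = X @ sp.diags(r)
clf = LinearSVC().fit(X_nb, labels)

# New headlines need the same vectorizer and the same scaling before prediction
X_new = vectorizer.transform(["markets rally on earnings"]) @ sp.diags(r)
pred = clf.predict(X_new)
print(pred)
```

So the SVM does not receive the ratios on their own as input; it receives the headline features reweighted by them, which is what lets the Naive Bayes evidence influence where the boundary is placed.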

Answered by Harish Kumar on February 9, 2021
