What classifier could predict spam/ham labels for SMS messages better than Naive Bayes?

Cross Validated · Asked on December 15, 2021

I have 7000 SMS messages, 6000 ham, 1000 spam. Typical messages are:

Ham: Yo, any way we could pick something up tonight?
Spam: Great News! Call FREEFONE 08006344447 to claim your guaranteed £1000 CASH or £2000 gift.

I want to implement a supervised classifier that would predict the ham/spam label given a new SMS.

The two classifiers I have tried are as follows:

  • Simple-predictor, where I count how many of the following keywords

     [
        "!", "click", "visit", "reply", "subscribe", "free", "price", "offer",
        "claim code", "charge", "stop", "unlimited", "expires", "£",
        "new voicemail", "cash prize", "special-call"
     ]
    

    are substrings of the (lowercased) SMS message, and predict spam if the count is greater than 1, ham otherwise (a sketch of both predictors follows this list). The method achieves

    accuracy (correct guesses ratio): 0.9742822966507177
    sensitivity (correct spam guesses ratio): 0.8452380952380952
    
  • Bayes (unigram) predictor, where I split the SMS into a token list $L = [t_1, t_2, \ldots, t_n]$ (e.g. for the ham message above, $L$ would be ['yo', 'any', 'way', ..., 'tonight']) and compare the quantities:

    • $s = P(\text{spam}) \cdot P(t_1 \mid \text{spam}) \cdot \ldots \cdot P(t_n \mid \text{spam})$,

    • $h = P(\text{ham}) \cdot P(t_1 \mid \text{ham}) \cdot \ldots \cdot P(t_n \mid \text{ham})$,

    and predict spam if $s > h$, ham otherwise.

    $P(\text{spam})$, $P(\text{ham})$, $P(\text{token} \mid \text{spam})$ and $P(\text{token} \mid \text{ham})$ are estimated from the training data.

    This method achieves

      accuracy: 0.9881889763779528
      sensitivity: 0.9312977099236641
    

    when trained on 4000 messages and tested on the remaining 3000.
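For reference, here is a minimal Python sketch of both predictors (simplified, with placeholder names such as train_msgs/train_labels; the Laplace smoothing and log-space scoring in the Bayes step are standard additions to avoid zero probabilities and underflow, not part of the plain product above):

    from collections import Counter
    import math

    KEYWORDS = ["!", "click", "visit", "reply", "subscribe", "free", "price",
                "offer", "claim code", "charge", "stop", "unlimited",
                "expires", "£", "new voicemail", "cash prize", "special-call"]

    def simple_predict(sms):
        # Spam if more than one keyword occurs as a substring.
        text = sms.lower()
        return "spam" if sum(kw in text for kw in KEYWORDS) > 1 else "ham"

    def train_naive_bayes(msgs, labels, alpha=1.0):
        # Per-class token counts and class priors from the training data.
        counts = {"spam": Counter(), "ham": Counter()}
        priors = Counter(labels)
        for sms, y in zip(msgs, labels):
            counts[y].update(sms.lower().split())
        vocab = set(counts["spam"]) | set(counts["ham"])

        def predict(sms):
            scores = {}
            for y in ("spam", "ham"):
                total = sum(counts[y].values())
                # Log space avoids underflow; alpha avoids zero probabilities.
                score = math.log(priors[y] / len(msgs))
                for t in sms.lower().split():
                    score += math.log((counts[y][t] + alpha)
                                      / (total + alpha * len(vocab)))
                scores[y] = score
            # Predict the class with the higher (log) score.
            return max(scores, key=scores.get)

        return predict

After predict = train_naive_bayes(train_msgs, train_labels), calling predict(sms) returns 'spam' or 'ham'.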

What new idea could I try to obtain a classifier with better prediction scores?

Note that I have already tried tuning both the Simple-predictor (e.g., trying different keyword lists, changing the count threshold) and the Bayes predictor (e.g., a bigram predictor performs worse due to the limited training set size) to achieve these scores. Now I am looking for a new idea.

One Answer

Basically any text classification method can be applied here. If you want to stick with classical ML methods, you can try:

  • A different model (logistic regression, SVM),

  • Feature engineering (e.g., replacing all phone numbers with a special token, or removing stop words); for discriminative models, you can also weight the input with TF-IDF scores and include n-gram features (see the sketch after this list),

  • Word embeddings (such as GloVe or FastText) as input to a discriminative model.
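A minimal scikit-learn sketch of the first two suggestions, assuming the data already sits in train_msgs/train_labels and test_msgs/test_labels (placeholder names; the phone-number regex and hyperparameters are illustrative):

    import re
    from sklearn.pipeline import Pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import classification_report

    def normalize(sms):
        # Map phone numbers to one special token so they share statistics.
        return re.sub(r"\d{5,}", "<phone>", sms.lower())

    clf = Pipeline([
        ("tfidf", TfidfVectorizer(preprocessor=normalize,
                                  ngram_range=(1, 2),  # word uni- and bigrams
                                  min_df=2)),          # drop one-off features
        ("lr", LogisticRegression(max_iter=1000,
                                  class_weight="balanced")),  # 6:1 imbalance
    ])
    clf.fit(train_msgs, train_labels)
    print(classification_report(test_labels, clf.predict(test_msgs)))

class_weight="balanced" is worth trying because the dataset is six times more ham than spam, which otherwise biases the decision threshold toward ham.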

If you do not care about the inference time, you can try some neural models. 7k messages should be enough to train a small LSTM classifier and definitely enough to fine-tune BERT or RoBERTa.
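And a rough sketch of the fine-tuning route with the Hugging Face transformers library (the model choice, hyperparameters, and placeholder names like train_msgs/train_y are illustrative; labels encoded as 0 = ham, 1 = spam):

    import torch
    from torch.utils.data import Dataset
    from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                              Trainer, TrainingArguments)

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2)

    class SMSDataset(Dataset):
        def __init__(self, msgs, labels):  # labels: 0 = ham, 1 = spam
            self.enc = tok(msgs, truncation=True, padding=True)
            self.labels = labels

        def __len__(self):
            return len(self.labels)

        def __getitem__(self, i):
            item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
            item["labels"] = torch.tensor(self.labels[i])
            return item

    args = TrainingArguments(output_dir="sms-bert", num_train_epochs=3,
                             per_device_train_batch_size=16)
    Trainer(model=model, args=args,
            train_dataset=SMSDataset(train_msgs, train_y)).train()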

Answered by Jindřich on December 15, 2021
