Is smoothing in NLP ngrams done on test data or train data?

Question

Is smoothing in NLP ngram done on test data or train data?
Smoothing exists to keep the language model from assigning a probability of 0 to n-grams it has never seen, which typically show up in the test corpus. So is smoothing done on the test data only, on the train data only, or on both? I haven't been able to find an answer to this yet.

Erwan · Answer

In short: both.
Smoothing consists of slightly modifying the estimated probability of each n-gram, so the calculation (for instance add-one smoothing) must be done at the training stage, since that is when the model's probabilities are estimated.
But smoothing usually also changes what happens at the testing stage, in particular by assigning a small non-zero probability to unknown n-grams instead of 0.
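
To make the two stages concrete, here is a minimal sketch of add-one (Laplace) smoothing for a bigram model in Python. It is an illustration under my own assumptions rather than a reference implementation: the names `train_bigram_model` and `bigram_prob` are hypothetical, and mapping out-of-vocabulary words to a single `<unk>` token is just one common convention. Counts and vocabulary size come from the train data only; the test stage just applies the smoothed formula, which also covers n-grams that were never counted.

```python
from collections import Counter

UNK = "<unk>"  # assumed placeholder for words never seen in training

def train_bigram_model(train_tokens):
    """Training stage: collect the counts that the smoothed estimate needs."""
    context_counts = Counter(train_tokens[:-1])            # c(prev)
    bigram_counts = Counter(zip(train_tokens, train_tokens[1:]))  # c(prev, word)
    vocab = set(train_tokens) | {UNK}                      # reserve one slot for unknowns
    return context_counts, bigram_counts, vocab

def bigram_prob(model, prev, word):
    """Testing stage: add-one estimate P(word | prev), never exactly 0."""
    context_counts, bigram_counts, vocab = model
    # Words outside the training vocabulary are mapped to <unk>.
    prev = prev if prev in vocab else UNK
    word = word if word in vocab else UNK
    # Counter returns 0 for missing keys, so unseen bigrams get 1 / (c(prev) + V).
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocab))

model = train_bigram_model("the cat sat on the mat".split())
print(bigram_prob(model, "the", "cat"))  # seen bigram: (1 + 1) / (2 + 6) = 0.25
print(bigram_prob(model, "the", "sat"))  # unseen bigram: (0 + 1) / (2 + 6) = 0.125
print(bigram_prob(model, "the", "dog"))  # unknown word, treated as <unk>: 0.125
```

Note that the test data is never used to update the counts: the smoothing parameters are fixed at training time, and at test time they only determine what probability an unseen n-gram receives.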
