Is smoothing in NLP n-grams done on test data or train data?

Data Science Asked by Hing Wong on November 30, 2020

Smoothing exists to keep a language model from predicting a probability of 0 for n-grams it never saw during training (i.e., n-grams that appear only in the test corpus). So is smoothing done on the test data only? On the train data only? Or both? I can't seem to find an answer to this.

One Answer

Is smoothing in NLP n-grams done on test data or train data?

In short: both.

Smoothing consists of slightly adjusting the estimated probability of each n-gram, so the calculation (for instance, add-one smoothing) must be done at the training stage, since that is when the model's probabilities are estimated.
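
As a minimal sketch of the training stage for a bigram model (the function name train_bigram_addone is illustrative, not from the answer):

    from collections import Counter

    def train_bigram_addone(tokens):
        """Add-one (Laplace) smoothed bigram probabilities:

        P(w_i | w_{i-1}) = (count(w_{i-1}, w_i) + 1) / (count(w_{i-1}) + V)

        where V is the vocabulary size seen in training.
        """
        unigrams = Counter(tokens)                  # count(w_{i-1})
        bigrams = Counter(zip(tokens, tokens[1:]))  # count(w_{i-1}, w_i)
        V = len(set(tokens))                        # vocabulary size

        def prob(prev, word):
            # Counter returns 0 for unseen pairs, so the "+1" guarantees
            # every bigram over the vocabulary gets nonzero probability.
            return (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)

        return prob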

But smoothing usually also changes what happens at the testing stage, in particular by assigning a nonzero probability to unseen n-grams instead of 0.
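
Continuing the sketch above, at test time an unseen bigram of known words now receives a small nonzero probability (truly out-of-vocabulary words are typically mapped to an <UNK> token during preprocessing so the same formula applies):

    tokens = "the cat sat on the mat".split()
    prob = train_bigram_addone(tokens)
    print(prob("the", "cat"))  # seen bigram:   (1 + 1) / (2 + 5) ≈ 0.286
    print(prob("cat", "on"))   # unseen bigram: (0 + 1) / (1 + 5) ≈ 0.167, not 0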

Answered by Erwan on November 30, 2020
