AnswerBun.com

Is smoothing in NLP ngrams done on test data or train data?

Data Science Asked by Hing Wong on November 30, 2020


Smoothing is meant to prevent the language model from assigning zero probability to n-grams that are unseen in training but appear in the test corpus. So is smoothing done on the test data only, the training data only, or both? I can't seem to find an answer to this.

One Answer

Is smoothing in NLP ngram done on test data or train data?

In short: both.

Smoothing consists of slightly adjusting the estimated probability of each n-gram, so the calculation (for instance, add-one smoothing) must be done at the training stage, since that is when the model's probabilities are estimated.

But smoothing usually also changes what happens at the testing stage, in particular by assigning a nonzero probability to n-grams never seen in training.
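To make this concrete, here is a minimal sketch of add-one (Laplace) smoothing for a bigram model on a toy corpus. The function and corpus are hypothetical, not from the original answer: counts are collected at training time, and the smoothed formula then gives a nonzero probability to any bigram queried at test time, seen or not.

```python
from collections import Counter

def train_bigram_addone(tokens, vocab):
    """Estimate add-one-smoothed bigram probabilities from training tokens."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    unigrams = Counter(tokens[:-1])  # counts of each context word
    V = len(vocab)

    def prob(w1, w2):
        # Add-one smoothing: every bigram count is incremented by 1, and the
        # denominator grows by V so the distribution still sums to 1.
        return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)

    return prob

# Toy training corpus (hypothetical)
tokens = "the cat sat on the mat".split()
vocab = set(tokens)
prob = train_bigram_addone(tokens, vocab)

# A bigram seen in training gets a discounted probability...
p_seen = prob("the", "cat")      # (1 + 1) / (2 + 5)
# ...while an unseen bigram still gets a nonzero probability at test time:
p_unseen = prob("cat", "on")     # (0 + 1) / (1 + 5)
```

So the smoothed counts are fixed during training, and the benefit (no zero probabilities) shows up when the model is evaluated on test data.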

Answered by Erwan on November 30, 2020

