
What's the motivation behind BERT masking 2 words in a sentence?

Data Science Asked by ihadanny on March 14, 2021

BERT and the more recent T5 ablation study agree that

using a denoising objective always results in better downstream task performance compared to a language model

where denoising == masked-lm == cloze.

I understand why learning to represent a word according to its bidirectional surroundings makes sense. However, I fail to understand why it is beneficial to mask 2 words in the same sentence, e.g. The animal crossed the road => The [mask] crossed the [mask]. Why does it make sense to learn to represent animal without the context of road?

Note: I understand that the masking probability is 15%, which corresponds to roughly 1 in 7 words, so it is fairly rare for 2 words in the same sentence to be masked. But why would it ever be beneficial, even with low probability?
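To make the setup concrete, here is a minimal sketch of independent per-token masking (my own illustration, not the actual BERT preprocessing code; special tokens and the 80/10/10 replacement rule are omitted). With 5 tokens and p = 0.15, exactly two of them end up masked together in roughly C(5,2) * 0.15^2 * 0.85^3 ≈ 14% of sentences, so the situation in the question is uncommon but far from negligible:

```python
import random

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Illustrative BERT-style masking: each token is selected
    independently with probability mask_prob."""
    rng = random.Random(seed)
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append("[MASK]")
            targets.append(tok)    # the model must predict this token
        else:
            masked.append(tok)
            targets.append(None)   # no loss at unmasked positions
    return masked, targets

sentence = "the animal crossed the road".split()
print(mask_tokens(sentence))
```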

Note 2: please ignore the masking procedure sometimes replacing the mask with a random/same word instead of [mask]; T5 investigates this choice at considerable length, and I suspect it's just an empirical finding 🙂

One Answer

Because BERT accepts the artificial assumption of independence between the masked tokens, presumably because it makes the problem simpler while still giving excellent results. To my knowledge, this choice is not discussed by the authors in the article or anywhere else.
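As a rough way to write down what that independence assumption means (my notation, not the paper's): let M be the set of masked positions and x_corrupted the sentence with [mask] tokens inserted. The exact chain-rule factorization would condition each masked token on the ones already predicted, whereas BERT's loss treats every masked position as conditionally independent given the corrupted input:

```latex
% Exact joint over the masked tokens (chain rule):
\log p\bigl(x_M \mid x_{\setminus M}\bigr)
  = \sum_{i \in M} \log p\bigl(x_i \mid x_{\setminus M},\, x_{M_{<i}}\bigr)

% BERT's masked-LM objective drops the dependence on the other
% masked tokens and conditions only on the corrupted input:
\mathcal{L}_{\mathrm{MLM}}
  = -\sum_{i \in M} \log p_\theta\bigl(x_i \mid x_{\mathrm{corrupted}}\bigr)
```

In the example above, the exact objective would let the prediction of road depend on animal (or vice versa), while BERT predicts each from The [mask] crossed the [mask] alone; XLNet's permutation objective, mentioned below, is one way to recover that dependence.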

Later works like XLNet have worked towards eliminating such an independence assumption, as well as other potential problems identified in BERT. However, despite improving on BERT's results on downstream tasks, XLNet has not gained the same level of attention or the same number of derived works. In my opinion, this is because the improvement did not justify the complexity introduced by the permutation language modeling objective.

The same assumption is made by other pre-training approaches, like Electra's adversarial training. The authors argue that this assumption isn’t too bad because few tokens are actually masked, and it simplifies the approach.

Correct answer by noe on March 14, 2021
