TransWikia.com

What is the difference between "greedy selection" and "sampling according to a distribution"?

Cross Validated Asked on November 2, 2021

I’m currently studying language generation and had a question regarding some concepts. The paper I’m reading states that they formulate the task of next-token generation as conditionally generating tokens one-by-one "either by greedily selecting the most probable one, or by sampling from the next word distribution."

What’s the difference between those two concepts? "Greedy selection" isn’t hard to understand, as I’m assuming that it means simply selecting the most probable token with an argmax, but how is this different from sampling according to a distribution?

If we have a distribution, then I’m also assuming that we have the distribution function and that we’re sampling according to that function. Wouldn’t this essentially be the same thing as "selecting the most probable one"?

Thanks.

One Answer

If you have a biased coin with probabilities $P(H)=0.6$ and $P(T)=0.4$, selecting the most probable outcome means returning $H$ every time, so the result is deterministic. Sampling from this distribution is random: it returns $H$ about 60% of the time and $T$ about 40% of the time.
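To make the coin example concrete, here is a minimal Python sketch using only the standard library. The helper names `greedy` and `sample` are made up for this illustration, and the fixed seed is only there so that repeated runs give the same counts.

```python
import random
from collections import Counter

# Biased coin from the example above: P(H) = 0.6, P(T) = 0.4
probs = {"H": 0.6, "T": 0.4}

def greedy(probs):
    """Return the single most probable outcome (argmax)."""
    return max(probs, key=probs.get)

def sample(probs, rng):
    """Draw one outcome at random, weighted by its probability."""
    outcomes = list(probs)
    weights = [probs[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed, only for repeatability
n = 10_000
greedy_counts = Counter(greedy(probs) for _ in range(n))
sample_counts = Counter(sample(probs, rng) for _ in range(n))

print(greedy_counts)  # every draw is "H"
print(sample_counts)  # roughly 60% "H" and 40% "T"
```

The greedy counter never contains $T$, no matter how many draws you make, while the sampled counts approach the 60/40 split as the number of draws grows.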

When doing next-word prediction, always selecting the most probable word can produce repetitive, uninteresting sequences, for example "Which came first, chicken or egg? I don't know. Why don't you know? I don't know. Why don't you know? I don't know." and so on. Sampling will create more diverse (albeit potentially lower-quality) sequences.
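The repetition effect can be reproduced with a toy model. The sketch below is not a real language model: the `NEXT` table and its probabilities are invented purely for illustration, and `generate`, `greedy`, and `sample` are hypothetical helper names.

```python
import random

# Hypothetical next-word probabilities, invented for this illustration.
NEXT = {
    "I":     {"don't": 0.6, "think": 0.4},
    "don't": {"know.": 0.7, "mind.": 0.3},
    "think": {"so.": 1.0},
    "know.": {"I": 1.0},
    "mind.": {"I": 1.0},
    "so.":   {"I": 1.0},
}

def greedy(dist):
    """Pick the most probable next word."""
    return max(dist, key=dist.get)

def sample(dist, rng):
    """Pick the next word at random, weighted by probability."""
    words = list(dist)
    return rng.choices(words, weights=[dist[w] for w in words], k=1)[0]

def generate(start, steps, pick):
    """Extend the sequence one word at a time using the given picker."""
    tokens = [start]
    for _ in range(steps):
        tokens.append(pick(NEXT[tokens[-1]]))
    return tokens

rng = random.Random(0)  # fixed seed, only for repeatability
greedy_text = " ".join(generate("I", 8, greedy))
sampled_text = " ".join(generate("I", 8, lambda d: sample(d, rng)))

print(greedy_text)   # the same three words, repeated
print(sampled_text)  # depends on the seed; other continuations can appear
```

Greedy decoding follows the same highest-probability path on every run, so it loops. Sampling can occasionally take the lower-probability branches, which is where the extra diversity comes from.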

Answered by David on November 2, 2021
