What is the optimal algorithm for playing the hangman word game?

Asked in Computer Science on December 26, 2021

Suppose we are playing the game hangman. My opponent and I both have access to the dictionary during the game. My opponent picks a word from the dictionary with knowledge of the algorithm which I will use to guess his secret word. Once my opponent has picked a word, I am given only its length. I can guess a letter which I think will be in the word; if it is in the word, my opponent identifies all positions of that letter in the word. If I guess a letter which isn't in the word, that counts as one mistake. If I can guess the word before making too many mistakes, I win.

My opponent's goal is to pick a word which maximizes the number of mistakes (incorrect guesses) I make before I can guess the word. My goal is to minimize them. (Traditionally, if the number of mistakes exceeds some threshold then I lose the game, but we can relax that constraint.)

Consider three algorithms for letter guessing. These are the main ones I've thought of, but there are probably more, and I welcome alternate ideas. In all three algorithms, the first step is to gather the list of words which are the same length as the secret word. Then, for each letter in the alphabet, count the number of words in the dictionary which contain that letter. For example, maybe 5000 contain "a", 300 contain "b", and so on. Here it is in Python:

    alphabet = list('abcdefghijklmnopqrstuvwxyz')

    while not found:
        # tally how many remaining candidate words contain each letter
        counts = dict.fromkeys(alphabet, 0)

        for word in dictionary:
            # iterate over set(word) so a repeated letter is counted
            # once per word, not once per occurrence
            for letter in set(word):
                if letter in alphabet:
                    counts[letter] += 1

        # now we have the letter frequencies
After that is where the three algorithms diverge.

  1. In the first algorithm, we guess the letter contained in the greatest number of remaining words. So if 5000 remaining words contain "a" and no other letter appears in that many words, we will pick "a" every time. If "a" is correct, this gives us positional information with which we can filter the list further. For example, we might keep only the words that match ".a..". (Dots are unknown letters.) If "a" is incorrect, we filter out all words which contain "a". In the case of a tie, where two letters are found in an equal number of words, we choose the alphabetically first one. So if we know the word matches ".ays", we'll effectively guess the remaining candidate words in alphabetical order.

  2. This is only slightly different from the first algorithm. Instead of always breaking ties alphabetically, in the case of a tie we choose among the tied letters at random. This has the benefit that our opponent doesn't know what we will pick. Under the first strategy, "rays" is always a better choice for the opponent than "days"; this avoids that issue.

  3. In this case, we pick letters with probability proportional to the number of words which contain each letter. At the beginning, when we tallied the number of words containing "a", the number containing "b", and so on, "a" happened to be found in the most words, so in strategies 1 and 2 we picked "a" 100% of the time. Here we will still choose "a" a plurality of the time, but occasionally we'll pick "z", even though "a" might be found in 10x more words. I have my doubts about this strategy being optimal, but it was used in research in 2010. (A sketch of all three selection rules appears after this list.)
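
To make the three selection rules concrete, here is a minimal sketch, assuming `counts` is the letter-count dictionary built above (the helper name `pick_letter` and the numeric `strategy` flag are mine, for illustration):

    import random

    def pick_letter(counts, strategy):
        """Choose the next letter to guess from a {letter: word_count} dict."""
        best = max(counts.values())
        tied = sorted(l for l, c in counts.items() if c == best)
        if strategy == 1:
            return tied[0]                 # algorithm 1: ties broken alphabetically
        if strategy == 2:
            return random.choice(tied)     # algorithm 2: ties broken at random
        # algorithm 3: sample a letter with probability proportional to its count
        letters = list(counts)
        weights = [counts[l] for l in letters]
        return random.choices(letters, weights=weights)[0]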

So I have two questions:

  1. What is the optimal letter-guessing strategy which I should use, assuming that my opponent knows I will use this strategy? Here we want to minimize the average number of mistakes it takes to guess the secret word.
  2. For a given word, say "pays", what is the average number of mistakes M I should expect to make? Is there a closed-form way to calculate M, as opposed to running the simulation many times and averaging the results?

Clarifications

  • Any English dictionary can be used, for example this English dictionary with 84k words. Subsets of dictionaries carefully chosen for ambiguity could also be interesting, but they are outside the scope of this question.
  • There is no constraint on the word size as long as the word is in the dictionary. The guesser will know the word size before he begins guessing.
  • Only the total number of mistakes matters, which is independent of the word size. You don’t get more chances for longer words, or fewer chances for shorter ones.
  • I’m only interested in minimizing the average number of mistakes. If there are two optimal algorithms which have an equivalently small average number of mistakes, both are acceptable.
  • In principle it is possible for there to be a situation where choosing one letter (say "b") yields more information than another (say "a"), even though "a" occurs in more possible words than "b" does. For example, if the remaining candidates are "bat", "bet", "but", and "rat", guessing "b" (in three words) splits the candidates 3/1, while guessing "a" (in only two words) splits them evenly 2/2, so the less frequent letter leaves a smaller worst case. I am open to this possibility in practice as well. The three algorithms illustrated above assume that it does not happen, on the grounds that a correct guess tends to yield more information than an incorrect one (i.e., positional information about a correct letter is usually worth more than "this letter is not in the word"). But an optimal algorithm need not make this assumption.

One Answer

It is possible to compute the optimal strategy, but the computation might be fairly intensive, and I'm not sure whether it will give you much of a gain over simple heuristics. I'll explain how below. The main idea is to use dynamic programming.

Deterministic strategies

Let me start first with the special case of deterministic strategies for choosing which letter to guess next, as they are easiest to analyze. This will give us a warmup before we move on to randomized strategies (which will probably be superior).

The state of the game at any point can be summarized by the pair $(g,r)$, where $g$ is the set of letters that have been guessed so far, and $r$ is the responses (i.e., the sequence of blanks and letters from $g$ that is visible to the player). The order of past guesses doesn't matter (which is why it suffices to have a set $g$ of past guesses). We'll say that a word $w$ is consistent with $(g,r)$ if it remains possible, i.e., if the opponent's word is $w$ and you make the guesses in $g$, then you'd get the response $r$.
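
As a concrete illustration, here is a minimal sketch of the consistency test in Python; the helper names `response_for` and `consistent` are mine and are reused in the sketches below:

    def response_for(word, guessed):
        """The response r you would see if the secret word were `word` and
        you had guessed exactly the letters in `guessed` ('-' marks a blank)."""
        return ''.join(c if c in guessed else '-' for c in word)

    def consistent(word, guessed, response):
        """True if `word` remains possible in the state (guessed, response)."""
        return len(word) == len(response) and response_for(word, guessed) == response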

Let $p(g,r)=1$ if it is possible to win starting from the state $(g,r)$. That means there exists a winning strategy: no matter which word the opponent is thinking of (as long as it is consistent with $(g,r)$), the number of wrong guesses you've made so far, plus the number you make in the future with this strategy, won't exceed the upper limit. Otherwise, define $p(g,r)=0$.

Now you can compute $p(g,r)$ with dynamic programming, using the recurrence relation

$$p(g,r) = \bigvee_a \bigwedge_{(g',r')} p(g',r'),$$

where $a$ ranges over all letters not in $g$ (i.e., all possibilities for which letter to guess next), and $(g',r')$ ranges over all possible responses if you guess $a$ next (i.e., $g'=g\cup \{a\}$, and we range over all words $w$ that are consistent with $(g,r)$ and compute the response $r'$ to guesses $g'$ if the word is $w$).

In particular, I suggest you compute $p(g,r)$ using depth-first search and memoization. Then, $p(\emptyset, {-}{-}\cdots{-})$ will tell you whether it is possible to win within the upper limit, no matter what word the opponent chooses. By tracing the computation backward, you can reconstruct the optimal strategy.
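
Here is a rough sketch of that depth-first search with memoization, reusing `response_for` and `consistent` from above; `WORDS` (the word list) and `MAX_MISTAKES` (the allowed number of wrong guesses) are assumed globals:

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def wins(guessed, response):
        """p(g, r): True if the guesser can force a win from state (g, r).
        `guessed` must be a frozenset so the state is hashable."""
        mistakes = sum(1 for a in guessed if a not in response)
        if mistakes > MAX_MISTAKES:
            return False
        words = [w for w in WORDS if consistent(w, guessed, response)]
        if len(words) <= 1:
            return True                    # one candidate left: guess it and win
        # only letters that appear in some consistent word are worth guessing
        for a in {c for w in words for c in w} - guessed:
            g2 = guessed | {a}
            # a guess works if every consistent word still leads to a win
            if all(wins(g2, response_for(w, g2)) for w in words):
                return True
        return False

    # usage: wins(frozenset(), '-' * word_length)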

Is this feasible? I expect it is. I think the number of possible states $(g,r)$ is not too large. (In particular, you can ignore states $(g,r)$ where you've already made too many incorrect guesses, and any state where only one word is consistent with it is automatically a win, so it needs no further analysis.) As an optimization, given a state $(g,r)$, you can try enumerating all words $w$ consistent with it, simulate some fixed heuristic, and check whether the heuristic wins for each word; if so, you don't need any further depth-first traversal and can immediately mark $p(g,r)=1$. Also, you only need to consider guesses $a$ that appear in at least one word consistent with $(g,r)$.

So it should be pretty straightforward to compute the optimal deterministic strategy.

Randomized strategies

We can follow a similar method to handle randomized strategies, but it gets a little more involved. Now let $p(g,r)$ denote the probability of winning from the state $(g,r)$ if both players play optimally from there on. We can again compute this using dynamic programming.

The key difference is in the recurrence relation where we compute $p(g,r)$ from the quantities $p(g',r')$, where $(g',r')$ ranges over all states that could occur next after guessing one more letter. Here we have a simple two-player game. First we choose a probability distribution over the letters not in $g$. Then the opponent sees our distribution and chooses a distribution on words $w$ that are consistent with $(g,r)$. Our payoff (and the amount the opponent loses) is the probability that we win from here on given those choices. Notice that this can be computed from the $p(g',r')$ values. It turns out that since we go first and the opponent sees our choice, the opponent doesn't need a randomized strategy; they can do equally well with a deterministic strategy, i.e., by picking a single word $w$ based on our distribution.

So form a matrix $M$ where $M_{w,i}$ holds the probability of winning if we choose letter $i$ next and the opponent chooses word $w$; we can fill in $M_{w,i}$ as $p(g',r')$ where $g'=g \cup \{i\}$ and $r'$ is the response to guesses $g'$ if the word is $w$. Then we seek to solve the optimization problem $$\max_v \min_w (Mv)_w$$ over probability vectors $v$. This is equivalent to maximizing $t$ subject to $(Mv)_w \ge t$ for all $w$, $\sum_i v_i = 1$, and $v \ge 0$, so it can be solved using linear programming. In this way, we can compute $p(g,r)$ from the values $p(g',r')$ where $g'$ is one letter bigger than $g$.
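
For illustration, here is a sketch of that linear program using `scipy.optimize.linprog`; the function name and the matrix layout are my own conventions:

    import numpy as np
    from scipy.optimize import linprog

    def best_distribution(M):
        """Solve max_v min_w (M v)_w over probability vectors v, where
        M[w, i] is the win probability p(g', r') if we guess letter i
        and the opponent holds word w.  Returns (v, game_value)."""
        n_words, n_letters = M.shape
        # variables are [v, t]; we minimize -t, i.e. maximize the value t
        c = np.zeros(n_letters + 1)
        c[-1] = -1.0
        # t <= (M v)_w for every word w, rewritten as -M v + t <= 0
        A_ub = np.hstack([-M, np.ones((n_words, 1))])
        b_ub = np.zeros(n_words)
        # v must be a probability distribution: sum(v) == 1
        A_eq = np.zeros((1, n_letters + 1))
        A_eq[0, :n_letters] = 1.0
        b_eq = np.array([1.0])
        bounds = [(0, None)] * n_letters + [(None, None)]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
        return res.x[:n_letters], res.x[-1]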

Putting this all together, we'll use depth-first traversal with memoization (or dynamic programming), using linear programming at each step to solve the 2-player game. This gives us the optimal randomized strategy to play hangman.

The resulting computation might be very expensive, as it could require billions or trillions of steps, with a linear program solved at each one. So I don't know whether it's realistic to use in practice.

Some optimizations are possible, as suggested by @j_random_hacker. You can first compute $p(g,r)$ for deterministic strategies; then you only need to consider randomized strategies for states $(g,r)$ where $p(g,r)=0$.

Heuristic strategies

You listed some reasonable heuristics for choosing a strategy. Here is one more. Given the state $(g,r)$, enumerate all possibilities for the next guess $a \notin g$. For a single letter $a$, consider each state $(g',r')$ that could occur next after guessing $a$ (so $g'=g\cup \{a\}$), and count the number of words $w$ consistent with $(g',r')$; take the maximum of these counts, and use that as the score associated with $a$. The heuristic strategy is to choose the letter $a$ whose score is smallest.
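
A minimal sketch of this heuristic, reusing `response_for` from above; grouping the candidates by the response they would produce enumerates the possible next states:

    def heuristic_guess(words, guessed):
        """Pick the letter whose worst-case number of surviving
        candidate words is smallest."""
        best_letter, best_score = None, float('inf')
        for a in {c for w in words for c in w} - guessed:
            g2 = guessed | {a}
            buckets = {}                   # response -> surviving candidates
            for w in words:
                buckets.setdefault(response_for(w, g2), []).append(w)
            score = max(len(b) for b in buckets.values())
            if score < best_score:
                best_letter, best_score = a, score
        return best_letter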

Computing the expected number of mistakes

You can compute the expected number of mistakes of a particular randomized strategy using dynamic programming. The details are similar to the above, but the recurrence relation becomes simpler:

$$p(g,r) = \min_w \mathbb{E}[p(g',r')],$$

where $w$ ranges over all words consistent with $(g,r)$, and $(g',r')$ is the distribution on the next state if the word is $w$ and the next letter guessed is chosen by your strategy. Given your strategy and the word $w$, it is easy to compute the distribution on guesses and thereby obtain the distribution on next states, so it's easy to compute $\mathbb{E}[p(g',r')]$.
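
Reading $p(g,r)$ as the value of the fixed strategy against the worst consistent word, as in the recurrence above, a direct recursive sketch might look like this; `strategy` is assumed to return a `{letter: probability}` distribution, and memoization is omitted for brevity:

    def strategy_value(guessed, response, strategy):
        """p(g, r) for a fixed randomized strategy against the worst
        consistent word; `guessed` should be a frozenset."""
        mistakes = sum(1 for a in guessed if a not in response)
        if mistakes > MAX_MISTAKES:
            return 0.0
        words = [w for w in WORDS if consistent(w, guessed, response)]
        if len(words) <= 1:
            return 1.0                     # guess the last candidate and win
        dist = strategy(words, guessed)    # your strategy's letter distribution
        return min(                        # the opponent picks the worst word
            sum(prob * strategy_value(guessed | {a},
                                      response_for(w, guessed | {a}),
                                      strategy)
                for a, prob in dist.items())
            for w in words
        )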

Answered by D.W. on December 26, 2021
