TransWikia.com

Optimal Strategy in Dice Game

Mathematics Asked on February 23, 2021

In a recent interview I received the following question (an optimisation/strategy game)…which left me a bit stumped. The rules of play, you start with 0 points, then:

Roll three fair six-sided dice;

Now you have the option:

Hold: i.e. accept the values shown on your dice as the score for your turn. There is a caveat, if two or more dice show the same values, then all of them are flipped upside down – e.g. 1 becomes 6

OR

reroll the dice: You may choose to hold any combination of the dice on the current value shown (so you can choose to keep 1 dice the same and then reroll the other two). Rerolling costs you 1 point – so during the game and perhaps even at the end your score may be negative.
You can roll an infinite number of times…

My thoughts:

So clearly the best possible score is 18 and is achieved by rolling three 1s on the first roll
The reroll penalty prevents rolling forever to get 18.
If the value of the dice is greater than the expected value of rerolling them (accounting for the penalty), then you should stick…
I guess what I am asking is how do I work out the expected value of rerolling them (accounting for the penalty) and how does this fit into the optimal strategy…

3 Answers

Comments only:

A good programmer (which I'm not) can implement Hugo Manet's idea and get the optimal strategy. But I believe there is no nice pattern for one to solve mathematically based on my calculation with only one step projection (thus it's not optimal). Still one can get a feeling as to why it's not so simple (I could be wrong). Here's a summary:

  • If you get $3$ identical numbers, stick if it's $111,222,333$; reroll all three if you have $444,555,666$;

  • If you get $3$ distinct numbers:

  1. If total points $ge 11$ then stick;

  2. If total points $le 9$ then keep the smallest, reroll the two bigger ones;

  3. If total points $=10$: keep the bigger two and reroll $1$ if it's ${1,3,6}$ or ${1,4,5}$; stick if it's ${2,3,5}$.

  • You get a pair and a third one which is different. This is very complicated.
  1. When the pair is $1, 2$ or $3$: if the third one is $4/5/6$, stick; otherwise reroll the third one. Notice that if we have ${2,2,3}$ we get a tie: either stick, or reroll $3$;

  2. When the pair is $4$: if the third one is $5/6$, stick; otherwise keep the third one but re-roll both $4$'s;

  3. When the pair is $5$: if the third one is $1/2$, keep it and reroll both $5$'s; otherwise reroll only one $5$ and keep the other two;

  4. When the pair is $6$: if the third one is $1$, keep it and reroll both $6$'s; otherwise reroll only one $6$ and keep the other two.

Again, the above is not the optimal because it only looks at one step. The optimal strategy should be decided by going backward from step 18, for example, where you know you should stick because otherwise you'd end up with a negative score.


R code below:

It was done in a rush so there's plenty of space for improvement.

pts = function(a)
{
  u = unique(a);
  if(length(u)==3) {
    p = sum(a);
  } else if(length(u)==1) {
    p = 3*(7-u);
  } else
  {
    M = sum(a)-sum(u)
    p = sum(u)-M + 2*(7-M);
  }
  p
}

onestep <- function(a)
{
  p = rep(-1,8);
  
  p[8]=pts(a);

  for(i in 1:6)
  {
    # 001
    p[1] = p[1] + pts(c(a[1], a[2], i))/6
    
    # 010
    p[2] = p[2] + pts(c(a[1], i, a[3]))/6

    # 100
    p[4] = p[4] + pts(c(i, a[2], a[3]))/6

    for(j in 1:6)
    {
        # 011
        p[3] = p[3] + pts(c(a[1], i, j))/36
    
        # 101
        p[5] = p[5] + pts(c(i, a[2], j))/36

        # 110
        p[6] = p[6] + pts(c(i, j, a[3]))/36

        for(k in 1:6)
        {
        # 111
        p[7] = p[7] + pts(c(i,j,k))/216;
        }
    }   


  }
  indx = which(p==max(p));
  c(p, p[indx], indx)
}

sink("mseoutput.txt")

for(i in 1:6)
{
    for(j in i:6)
    {
        for(k in j:6)
        {
            out <- onestep(c(i,j,k));
            cat(i,j,k,out);
            cat('n');
        }
    }
}

sink()

Answered by Neat Math on February 23, 2021

The fact that you flip dice if they show the same value doesn't actually affect any expected values. You are just as likely to roll say $(1,1)$ (which gets flipped to $(6,6)$) as you are to roll $(6,6)$ in the game with no flipping. So I will essentially completely ignore this rule.

Now suppose our strategy is "re-roll a die if it comes up at most $x$, else hold the value". Then by the law of total expectation, your expected payoff $E$ for that particular die is $$E=frac{x}{6}(E-1)+left(1-frac{x}{6}right)frac{x+7}{2}implies E=frac{42-3x-x^2}{2(6-x)}.$$ Since $6$ is a small number, at this point you may as well just plug in the values of $x$, and you will find

$x$ $E(x)$
$0$ $3.5$
$1$ $3.8$
$2$ $4$
$3$ $4$
$4$ $3.5$
$5$ $1$

So our expected value is $12$ with the best strategy. We have two equally good strategies: either "hold values of three or more", or "hold values of four or more".

Answered by jlammy on February 23, 2021

One important point is that, anytime in the game, your best action depends only on the dice on the table, and not on the number of penalties you've accumulated so far. Even if you have -20 in penalties, if your dice are bad, maybe you should reroll (and worry that the dice aren't fair after all).

Apart from that, I don't see much good properties on the problem, so to me it's a modeling problem and you have to compute it.

You start with an benefit function $f_0({x,y,z})$ which is the value of your dice combination, and you compute successively the benefit if you also add the choice to reroll at most $i$ times : $f_{i+1}({x,y,z}) = max f_i({x,y,z}), (-1 + mathbb E [f_i({X,y,z})]), dots$.

Since the $f_i$ are increasing and bounded, they converge (exponentially fast) to a $f_infty$, and you can make your choice by comparing $f_infty({x,y,z})$ and $(-1 + mathbb E [f_infty({X,y,z})])$, etc.

If you want the exact values of the expectation, you can see the transformation of $f_i$ to $f_{i+1}$ as a matrix multiplication (once you reached the point where the choice of the $max$ will be stable), so you just have to find its stable point.

Answered by Hugo Manet on February 23, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP