TransWikia.com

Calculating the variance of dice rolls?

Cross Validated, asked on December 15, 2021

I am having trouble understanding how to find the variance of the proportion of times we see a 6 when we roll a die. The question is below:

Suppose we are interested in the proportion of times we see a 6 when
rolling n=100 dice. This is a random variable which we can simulate with

x=sample(1:6, n, replace=TRUE) 

and the proportion we are interested in can be expressed as an average:

mean(x==6)

Because the die rolls are independent, the CLT applies. We want to roll n dice 10,000 times and keep these proportions. This random variable (proportion of 6s) has mean p=1/6 and variance p*(1-p)/n. So according to the CLT, z = (mean(x==6) - p) / sqrt(p*(1-p)/n) should be approximately normal with mean 0 and SD 1.
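A minimal R sketch of the simulation the problem describes, assuming an arbitrary seed and the hypothetical names `B` (number of replications, 10,000), `props` (the stored proportions) and `z` (their standardized values). It checks that the simulated proportions have mean near $p = 1/6$, variance near $p(1-p)/n$, and that `z` has mean near 0 and SD near 1.

```r
set.seed(1)   # arbitrary seed, for reproducibility only
n <- 100      # dice per experiment
B <- 10000    # number of repeated experiments (hypothetical name)
p <- 1/6

# Each replication rolls n dice and records the proportion of 6s
props <- replicate(B, mean(sample(1:6, n, replace = TRUE) == 6))

mean(props)        # should be close to p = 1/6
var(props)         # should be close to p * (1 - p) / n
p * (1 - p) / n    # theoretical variance, 5/3600

# Standardize each proportion; by the CLT these are approximately N(0, 1)
z <- (props - p) / sqrt(p * (1 - p) / n)
c(mean_z = mean(z), sd_z = sd(z))
```

Note that `var(props)` is itself a sample variance with divisor `B - 1`; what it estimates is $p(1-p)/n$, where $n$ is the number of dice per experiment.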

So according to the problem, the mean proportion should be p = 1/6. I understand why the proportion of 6’s should average out to 1/6.

But the variance confuses me. The question says the variance is p*(1-p)/n. But the formula for the variance of a sample is the sum of the squared differences between each value and the mean, divided by the sample size minus one. Why is it different here?

3 Answers

You are correct to say that your experiment to roll a fair die $n=100$ times can be simulated in R using:

set.seed(2020)
n = 100; x=sample(1:6, n, replace=TRUE)
sum(x);  mean(x);  var(x)
[1] 347
[1] 3.47
[1] 2.635455

For one roll of a fair die, the mean number rolled is $$\mu = E(X) = \sum_{i=1}^6 i\,P(X=i) = \sum_{i=1}^6 i(1/6) = 3.5,$$

x = 1:6;  pr=rep(1/6,6)
sum(x*pr)
[1] 3.5

The variance of the result is $Var(X) = E[(X - \mu)^2] = E(X^2) - \mu^2.$

$$E(X^2) = \sum_{i=1}^6 i^2 P(X = i) = \sum_{i=1}^6 i^2(1/6) = 91/6 = 15.16667.$$

sum(x^2*pr)
[1] 15.16667

$$Var(X) = 91/6 - (7/2)^2 = 35/12 = 2.916667.$$

sum(x^2*pr) - 3.5^2
[1] 2.916667
sum((x-3.5)^2*pr)
[1] 2.916667

Then, for 100 rolls of the die, the total is $T = \sum_{j=1}^{100} X_j$ with $$E(T) = E(X_1 + X_2 + \cdots + X_{100}) = 100(3.5) = 350,$$ and (by independence) $$Var(T) = Var(X_1 + X_2 + \cdots + X_{100}) = 100(35/12) = 291.6667.$$ Writing $A = \bar X = T/100$ for the average of the 100 rolls, we have $E(A) = E(\bar X) = E(T/100) = E(T)/100 = 3.50$ and $Var(A) = Var(\bar X) = Var(T/100) = \frac{1}{100^2}Var(T) = 0.02916667.$ Also, $Var(A) = Var(\bar X) = Var(X_j)/100 = 2.916667/100 = Var(T)/100^2 = 0.02916667.$
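As an exact check of these moments (our own addition, not part of the original answer), the distribution of $T$ can be computed by convolving the pmf of a single die with itself 100 times. The sketch uses base R only; the names `die`, `pmf`, `ET` and `VT` are hypothetical.

```r
# pmf of one fair die, indexed by face value 0..6 (P(X = 0) = 0)
die <- c(0, rep(1/6, 6))
pmf <- 1  # a sum of zero dice equals 0 with probability 1

for (j in seq_len(100)) {
  # Convolve: shift the current pmf by each face value, weight by its probability
  new <- numeric(length(pmf) + length(die) - 1)
  for (k in seq_along(die)) {
    idx <- seq_along(pmf) + k - 1
    new[idx] <- new[idx] + pmf * die[k]
  }
  pmf <- new
}

vals <- 0:(length(pmf) - 1)     # possible totals 0, 1, ..., 600
ET <- sum(vals * pmf)           # E(T), should equal 350
VT <- sum((vals - ET)^2 * pmf)  # Var(T), should equal 100 * 35/12
c(ET = ET, VT = VT, VarA = VT / 100^2)
```

Unlike the simulation, this gives the theoretical values up to floating-point rounding.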

If we simulate a million 100-roll experiments, we get a close approximation of these theoretical results:

set.seed(723)
m = 10^6;  n = 100
t = replicate(m, sum(sample(1:6, n, replace=TRUE)))
mean(t)
[1] 349.995       # approx E(T) = 350
var(t)
[1] 291.7679      # approx Var(T) = 291.67
a = t/n
mean(a)
[1] 3.49995       # approx E(A) = 3.5
var(a)
[1] 0.02917679    # approx Var(A) = 0.02917
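The same moment calculation, applied to the indicator $I$ (1 if a roll is a 6, 0 otherwise) instead of the face value, gives the variance quoted in the question. A short R sketch; `ind` and `v1` are our own names:

```r
face <- 1:6
pr <- rep(1/6, 6)
ind <- as.numeric(face == 6)   # indicator I: 1 for a six, 0 otherwise

p  <- sum(ind * pr)            # E(I) = P(six) = 1/6
v1 <- sum(ind^2 * pr) - p^2    # Var(I) = E(I^2) - E(I)^2 = p(1 - p) = 5/36
n  <- 100
c(p = p, var_I = v1, var_prop = v1 / n)  # Var(proportion) = p(1 - p)/n
```

Here the proportion of 6s plays the role of $A$: it is the average of 100 independent indicators, so its variance is $Var(I)/100$.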

Answered by BruceET on December 15, 2021

But the variance confuses me. The question says the variance is p*(1-p)/n. But the formula for the variance of a sample is the sum of the squared differences between each value and the mean, divided by the sample size minus one. Why is it different here?

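That is the sample variance, i.e. $$\hat\sigma^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar x)^2,$$

computed from a random sample $x_1, \dots, x_n$. By contrast, $p(1-p)/n$ is the theoretical variance of the proportion of 6s as a random variable; it is not computed from data.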
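To make the distinction concrete, here is a small R sketch (our own, with an arbitrary seed) that computes the $n-1$ formula by hand on the 0/1 indicators of one 100-roll experiment, checks it against R's `var()`, and divides by $n$ to get a data-based estimate of $p(1-p)/n$ from that single sample.

```r
set.seed(2020)
n <- 100
x <- sample(1:6, n, replace = TRUE)
y <- as.numeric(x == 6)                 # 0/1 indicators of a six

s2 <- sum((y - mean(y))^2) / (n - 1)    # sample variance, divisor n - 1
all.equal(s2, var(y))                   # var() uses the same n - 1 divisor

phat <- mean(y)
# For 0/1 data the sum of squares equals n * phat * (1 - phat)
all.equal(s2, phat * (1 - phat) * n / (n - 1))

s2 / n            # estimate of Var(proportion) from this one sample
1/6 * 5/6 / n     # theoretical value p(1 - p)/n
```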

Answered by gunes on December 15, 2021

Let's call $x$ the number of 6's in $n$ die rolls, so that $x \sim \mathrm{Binomial}(n, p)$. The theoretical variance of the number of 6's in $N = n$ die rolls is then $var(x \mid N=n) = np(1-p)$.

Now let's call $\pi = x/n$ the proportion of die rolls which are 6's. Then $E(\pi \mid N=n) = E(x \mid N=n)/n = p$. The variance of the proportion of 6's is $var(\pi \mid N=n) = var(\frac{x}{n} \mid N=n) = \frac{1}{n^2}var(x \mid N=n) = \frac{p(1-p)}{n}$.
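These binomial moments can be checked exactly in R with `dbinom()` (a sketch; `k`, `pk`, `ex` and `vx` are our own names):

```r
n <- 100
p <- 1/6
k <- 0:n
pk <- dbinom(k, size = n, prob = p)   # P(x = k) for x ~ Binomial(n, p)

ex <- sum(k * pk)                     # E(x) = np
vx <- sum((k - ex)^2 * pk)            # var(x) = np(1 - p)
c(var_x = vx, var_prop = vx / n^2)    # var(x/n) = p(1 - p)/n
```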

That is fine for theoretical values; however, suppose you now want to gather some data (or simulate) and estimate $var(\frac{x}{n} \mid N=n)$ from your data. In that case, you also need to account for estimating the mean. While you could assume the mean is 1/6, perhaps the die is biased, so that $P(6) \neq 1/6$.

Since you have to estimate the mean, you effectively use up one of your data points: if you gave me $n-1$ observations and the mean, I would know the $n$-th observation. (Thus the $n$-th observation is no longer free to vary once the estimated mean is fixed.) We say there are $n-1$ degrees of freedom. For this reason, when you compute the sample variance you divide the sum of squared differences from the mean by $n-1$, which makes it an unbiased estimator of the variance.
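A Monte Carlo sketch of this point in R (our own illustration; the sample size `n = 5`, the number of replications and the seed are arbitrary choices): over many small samples of 0/1 indicators, dividing the sum of squares by $n$ underestimates $p(1-p)$ on average by the factor $(n-1)/n$, while dividing by $n-1$ does not.

```r
set.seed(42)
p <- 1/6
sigma2 <- p * (1 - p)   # true variance of one indicator, 5/36
n <- 5                  # small n so the bias is easy to see
reps <- 100000

est <- replicate(reps, {
  y <- as.numeric(sample(1:6, n, replace = TRUE) == 6)
  ss <- sum((y - mean(y))^2)
  c(div_n = ss / n, div_n1 = ss / (n - 1))
})

rowMeans(est)                                      # averages of the two estimators
c(div_n = (n - 1) / n * sigma2, div_n1 = sigma2)   # their theoretical expectations
```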

Answered by kurtosis on December 15, 2021
