Computing the variance of hypergeometric distribution using indicator functions

Mathematics Asked by xxx on October 5, 2020

I want to compute the variance of a random variable $X$ which has a hypergeometric distribution $\mathrm{Hyp}(n,r,b)$, where $n$ is the total number of balls in the urn and $r$ and $b$ are the numbers of red and black balls, by using the representation

$$X= I_{A_1} + \cdots + I_{A_n}$$

($I_A$ is the indicator function of $A$ and $A_i$ means that we have a red ball in the $i$-th draw).

So for the expected value we have

$$E[X] = E[I_{A_1} + \cdots + I_{A_n}] = E[I_{A_1}] + \cdots +E[I_{A_n}] = P(A_1) + \cdots + P(A_n)$$

But I don’t know how to calculate these $P(A_i)$. And what about $E[X^2]$? Can anybody help?

Thanks in advance!

3 Answers


The variance of $I_{A_1}+\cdots+I_{A_n}$ is trivially $0$ since the sum is $r$ with probability $1$.

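But suppose there had been more than $n$ balls in the urn, so that it would not be certain that every red ball had been drawn after $n$ trials. Then, since by symmetry all the variances are equal and all the covariances are equal, we would have
\begin{align}
\operatorname{var}(I_{A_1}+\cdots+I_{A_n}) & = \operatorname{var}(I_{A_1})+\cdots+\operatorname{var}(I_{A_n}) + \underbrace{2\operatorname{cov}(I_{A_1},I_{A_2})+\cdots\quad{}}_{n(n-1)/2\text{ terms, one per pair } i<j} \\[10pt]
& = n\operatorname{var}(I_{A_1}) + n(n-1) \operatorname{cov}(I_{A_1},I_{A_2}).
\end{align}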

Next we have $$ \operatorname{var}(I_{A_1}) = \operatorname{E}(I_{A_1}^2)-(\operatorname{E}I_{A_1})^2 $$ and then use the fact that $I_{A_1}^2=I_{A_1}$ since $0^2=0$ and $1^2=1$.

For the covariance, you have $$ \operatorname{cov}(I_{A_1},I_{A_2}) = \operatorname{E}(I_{A_1}I_{A_2}) - (\operatorname{E}I_{A_1})(\operatorname{E}I_{A_2}) $$ and $\operatorname{E}(I_{A_1}I_{A_2})=\Pr(I_{A_1}=I_{A_2}=1)=\dfrac{\binom r 2}{\binom{r+b}2}$.
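As a sanity check on the pieces above, here is a minimal Python sketch (not part of the original answer). The function names are made up for illustration, and it assumes $n$ draws without replacement from an urn with $r+b \ge 2$ balls and $n \le r+b$. It combines the single-indicator variance with $n(n-1)$ covariance terms (each of the $\binom n2$ pairs counted twice) and compares the result with the variance computed directly from the hypergeometric probabilities, using exact fractions.

```python
from fractions import Fraction
from math import comb


def variance_via_indicators(n, r, b):
    """Variance of the number of red balls in n draws, built from indicators."""
    total = r + b
    p = Fraction(r, total)                 # E(I_{A_1}) = P(A_1)
    var_single = p * (1 - p)               # uses I^2 = I
    p_both = Fraction(comb(r, 2), comb(total, 2))  # P(first two draws both red)
    cov_pair = p_both - p * p
    # n variance terms plus n(n-1) covariance terms (ordered pairs i != j)
    return n * var_single + n * (n - 1) * cov_pair


def variance_via_pmf(n, r, b):
    """Same variance, computed directly from the hypergeometric pmf."""
    total = r + b
    ways = comb(total, n)
    # comb returns 0 when k > r or n - k > b, so impossible counts drop out
    pmf = {k: Fraction(comb(r, k) * comb(b, n - k), ways) for k in range(n + 1)}
    mean = sum(k * q for k, q in pmf.items())
    second_moment = sum(k * k * q for k, q in pmf.items())
    return second_moment - mean ** 2


if __name__ == "__main__":
    print(variance_via_indicators(5, 6, 4), variance_via_pmf(5, 6, 4))
```

With $n=5$, $r=6$, $b=4$ both functions return $2/3$, and with $n=r+b$ the indicator formula returns $0$, matching the remark that the sum is then constant.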

Correct answer by Michael Hardy on October 5, 2020

Just a small note to Michael's answer. The number of terms of the form $2 \operatorname{cov}(I_{A_{i}}, I_{A_{j}})$ is $\binom{n}{2}$, one for each pair $i<j$. Thus, the variance becomes:

\begin{align}
\operatorname{var}(I_{A_1}+\cdots+I_{A_n}) & = \operatorname{var}(I_{A_1})+\cdots+\operatorname{var}(I_{A_n}) + \underbrace{2\operatorname{cov}(I_{A_1},I_{A_2})+\cdots\quad{}}_{\binom{n}{2}\text{ terms}} \\[10pt]
& = n\operatorname{var}(I_{A_1}) + 2\binom{n}{2} \operatorname{cov}(I_{A_1},I_{A_2}).
\end{align}

(I wrote it as a separate answer, because it was rejected as an edit, and don't have enough reputation to comment.)

Answered by abblaa on October 5, 2020

Outline: I will change notation, to have fewer subscripts. Let $Y_i=1$ if the $i$-th ball is red, and let $Y_i=0$ otherwise.

We are picking $n$ balls. I will assume that (unlike in the problem as stated) $n$ is not necessarily the total number of balls, since that would make the problem trivial.

Then $E(X)=E(Y_1)+\cdots+E(Y_n)$. Note that $\Pr(Y_i=1)=\frac{r}{r+b}$ for every $i$: if the balls have ID numbers (if you like, in invisible ink), then all sequences of balls are equally likely, so the $i$-th ball drawn is equally likely to be any of the $r+b$ balls.

For the variance, as you know, it is enough to compute $E(X^2)$. Expand $(Y_1+\cdots+Y_n)^2$ and take the expectation, using the linearity of expectation.

We have terms $Y_i^2$ whose expectation is easy, since $Y_i^2=Y_i$. So we need the expectations of the "mixed" products $Y_iY_j$. We need to find the probability that the $i$-th ball and the $j$-th ball are red. This is the probability that the $i$-th is red times the probability that the $j$-th is red given that the $i$-th is.

Thus $E(Y_iY_j)=\frac{r}{r+b}\cdot\frac{r-1}{r+b-1}$ for $i\ne j$.
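The symmetry claim can be checked by brute force on a small urn. The sketch below is an illustration only (the function name and the 0-based positions are assumptions made here, not part of the original answer): it gives every ball an ID, lists all orderings of the IDs, and counts how often position $i$ is red and how often positions $i$ and $j$ are both red.

```python
from fractions import Fraction
from itertools import permutations


def red_probabilities(r, b, i, j):
    """Return (P(ball i red), P(balls i and j both red)) by full enumeration.

    Positions i and j are 0-based and must be distinct; keep r + b small,
    since all (r + b)! orderings are listed.
    """
    colours = ["R"] * r + ["B"] * b            # ball IDs 0..r+b-1 with colours
    orders = list(permutations(range(r + b)))  # all equally likely sequences
    red_i = sum(1 for o in orders if colours[o[i]] == "R")
    red_both = sum(
        1 for o in orders if colours[o[i]] == "R" and colours[o[j]] == "R"
    )
    return Fraction(red_i, len(orders)), Fraction(red_both, len(orders))


if __name__ == "__main__":
    # 3 red and 2 black balls; compare positions 0 and 3
    print(red_probabilities(3, 2, 0, 3))
```

For $r=3$, $b=2$ this gives $3/5$ for a single position and $3/10 = \frac35\cdot\frac24$ for any two distinct positions, whichever positions are chosen.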

Now it is a matter of putting the pieces together.
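One way the pieces might be assembled is sketched below; it is not part of the original answer, and $N=r+b$ and $p=r/N$ are shorthand introduced here. The expansion of $(Y_1+\cdots+Y_n)^2$ has $n$ squared terms and $n(n-1)$ mixed terms $Y_iY_j$ with $i\ne j$, so

\begin{align}
E(X^2) &= np + n(n-1)\,\frac{r}{N}\cdot\frac{r-1}{N-1}, \\[6pt]
\operatorname{var}(X) &= E(X^2)-(np)^2 = np\left[1+\frac{(n-1)(r-1)}{N-1}-\frac{nr}{N}\right] \\[6pt]
&= np\cdot\frac{(N-n)(N-r)}{N(N-1)} = n\cdot\frac{r}{N}\cdot\frac{b}{N}\cdot\frac{N-n}{N-1}.
\end{align}

When $n=N$ the factor $N-n$ vanishes, in line with the observation that drawing every ball makes $X$ constant.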

Answered by André Nicolas on October 5, 2020
