# Computing the variance of hypergeometric distribution using indicator functions

Mathematics Asked by xxx on October 5, 2020

I want to compute the variance of a random variable $X$ which has hypergeometric distribution $\mathrm{Hyp}(n,r,b)$, where $n$ is the total number of balls in the urn and $r$ and $b$ are the numbers of red/black balls, by using the representation

$$X= I_{A_1} + \cdots + I_{A_n}$$

($I_A$ is the indicator function of $A$ and $A_i$ means that we have a red ball in the $i$-th draw).

So for the expected value we have

$$E[X] = E[I_{A_1} + \cdots + I_{A_n}] = E[I_{A_1}] + \cdots +E[I_{A_n}] = P(A_1) + \cdots + P(A_n)$$

But I don’t know how to calculate these $P(A_i)$. And what about $E[X^2]$? Can anybody help?

$\newcommand{\var}{\operatorname{var}}\newcommand{\cov}{\operatorname{cov}}$

As the problem is stated, with all $n$ balls in the urn being drawn, the variance of $I_{A_1}+\cdots+I_{A_n}$ is trivially $0$ since the sum is $r$ with probability $1$.

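But suppose there had been more than $n$ balls in the urn, so that it would not be certain that every red ball had been drawn after $n$ trials. Then, since by symmetry every indicator has the same variance and every pair of indicators has the same covariance, we would have
\begin{align}
\operatorname{var}(I_{A_1}+\cdots+I_{A_n}) & = \operatorname{var}(I_{A_1})+\cdots+\operatorname{var}(I_{A_n}) + \underbrace{2\operatorname{cov}(I_{A_1},I_{A_2})+\cdots\quad{}}_{n(n-1)/2\text{ terms}} \\[10pt]
& = n\operatorname{var}(I_{A_1}) + n(n-1) \operatorname{cov}(I_{A_1},I_{A_2}).
\end{align}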

Next we have $$\operatorname{var}(I_{A_1}) = \operatorname{E}(I_{A_1}^2)-(\operatorname{E}I_{A_1})^2$$ and then use the fact that $I_{A_1}^2=I_{A_1}$ since $0^2=0$ and $1^2=1$.

For the covariance, you have $$\operatorname{cov}(I_{A_1},I_{A_2}) = \operatorname{E}(I_{A_1}I_{A_2}) - (\operatorname{E}I_{A_1})(\operatorname{E}I_{A_2})$$ And $\operatorname{E}(I_{A_1}I_{A_2})=\Pr(I_{A_1}=I_{A_2}=1)=\dfrac{\binom r 2}{\binom{r+b}2}$.
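As a quick sanity check on the covariance, here is a minimal Python sketch. The function name `indicator_cov` and the sample values of `r` and `b` are illustrative assumptions, not part of the answer. It evaluates $\operatorname{cov}(I_{A_1},I_{A_2})$ exactly with rational arithmetic from the two formulas above and compares it with the simplified form $-\dfrac{rb}{(r+b)^2(r+b-1)}$, which is what one gets by putting the two terms over a common denominator.

```python
from fractions import Fraction
from math import comb


def indicator_cov(r: int, b: int) -> Fraction:
    """Exact cov(I_{A_1}, I_{A_2}) for an urn with r red and b black balls."""
    p = Fraction(r, r + b)                          # E[I_{A_1}] = P(A_1)
    p_both = Fraction(comb(r, 2), comb(r + b, 2))   # P(A_1 and A_2)
    return p_both - p * p


r, b = 5, 7  # illustrative values only (assumption, not from the post)
closed_form = Fraction(-r * b, (r + b) ** 2 * (r + b - 1))
print(indicator_cov(r, b), closed_form)
```

The covariance is negative, as one would expect: drawing a red ball first leaves fewer red balls for the second draw.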

Correct answer by Michael Hardy on October 5, 2020

Just a small note on Michael's answer. The number of $2 \operatorname{cov}(I_{A_{1}}, I_{A_{2}})$ terms is $\binom{n}{2}$, one for each pair $i<j$. Thus, the variance becomes:

\begin{align}
\operatorname{var}(I_{A_1}+\cdots+I_{A_n}) & = \operatorname{var}(I_{A_1})+\cdots+\operatorname{var}(I_{A_n}) + \underbrace{2\operatorname{cov}(I_{A_1},I_{A_2})+\cdots\quad{}}_{\binom{n}{2}\text{ terms}} \\[10pt]
& = n\operatorname{var}(I_{A_1}) + \binom{n}{2}\, 2 \operatorname{cov}(I_{A_1},I_{A_2}).
\end{align}
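One way to convince yourself that the count of covariance terms is right is to compare this formula with the variance computed directly from the hypergeometric probabilities. The sketch below does that with exact fractions. The function names and the test values are my own choices, and it assumes $n$ draws without replacement from an urn with $r$ red and $b$ black balls, where $n$ may be smaller than $r+b$.

```python
from fractions import Fraction
from math import comb


def var_from_indicators(n: int, r: int, b: int) -> Fraction:
    """n*var(I_1) + 2*C(n,2)*cov(I_1, I_2) for n draws without replacement."""
    p = Fraction(r, r + b)
    var_one = p - p * p                                    # uses I^2 = I
    cov_pair = Fraction(comb(r, 2), comb(r + b, 2)) - p * p
    return n * var_one + 2 * comb(n, 2) * cov_pair


def var_from_pmf(n: int, r: int, b: int) -> Fraction:
    """Variance from P(X = k) = C(r,k) * C(b,n-k) / C(r+b,n) directly."""
    total = comb(r + b, n)
    pmf = {k: Fraction(comb(r, k) * comb(b, n - k), total) for k in range(n + 1)}
    mean = sum(k * q for k, q in pmf.items())
    return sum((k - mean) ** 2 * q for k, q in pmf.items())


# Illustrative values only: 4 draws from 5 red and 7 black balls.
print(var_from_indicators(4, 5, 7), var_from_pmf(4, 5, 7))
```

With $n = r + b$ (every ball drawn) both functions return $0$, in line with the remark that the variance is trivially zero in that case.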

(I wrote it as a separate answer, because it was rejected as an edit, and don't have enough reputation to comment.)

Answered by abblaa on October 5, 2020

Outline: I will change notation, to have fewer subscripts. Let $Y_i=1$ if the $i$-th ball is red, and let $Y_i=0$ otherwise.

We are picking $n$ balls. I will assume that (unlike in the problem as stated) $n$ is not necessarily the total number of balls, since that would make the problem trivial.

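Then $E(X)=E(Y_1)+\cdots+E(Y_n)$. Note that $\Pr(Y_i=1)=\frac{r}{r+b}$ for every $i$. To see this, imagine that the balls carry ID numbers (in invisible ink, if you like): then all sequences of balls are equally likely, so the $i$-th ball drawn is equally likely to be any one of the $r+b$ balls.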
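The "invisible ID numbers" argument can be checked by brute force on a small urn. The following sketch is only an illustration under my own choice of urn size and function name. It lists every ordering of the balls, treating balls of the same colour as distinct, and counts how often position $i$ holds a red ball.

```python
from fractions import Fraction
from itertools import permutations


def prob_red_at(i: int, r: int, b: int) -> Fraction:
    """P(the i-th ball drawn is red), i counted from 1, by full enumeration."""
    balls = ["R"] * r + ["B"] * b
    # permutations() distinguishes elements by position, not by value,
    # so same-coloured balls are treated as distinct (the "ID numbers").
    orders = list(permutations(balls))
    hits = sum(1 for order in orders if order[i - 1] == "R")
    return Fraction(hits, len(orders))


r, b = 2, 3  # small illustrative urn (assumption)
print([prob_red_at(i, r, b) for i in range(1, r + b + 1)])
```

Every position gives the same probability $\frac{r}{r+b}$, which is the symmetry the argument relies on.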

For the variance, as you know, it is enough to compute $E(X^2)$. Expand $(Y_1+\cdots+Y_n)^2$ and take the expectation, using the linearity of expectation.

We have terms $Y_i^2$ whose expectation is easy, since $Y_i^2=Y_i$. So we need the expectations of the "mixed" products $Y_iY_j$ with $i\ne j$. We need to find the probability that the $i$-th ball and the $j$-th ball are both red. This is the probability that the $i$-th is red times the probability that the $j$-th is red given that the $i$-th is.

Thus $E(Y_iY_j)=\frac{r}{r+b}\cdot\frac{r-1}{r+b-1}$.

Now it's a matter of putting the pieces together.
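For reference, here is one way the pieces might be assembled. The shorthand $N=r+b$ is introduced here only for brevity and is not part of the outline. The expansion of $(Y_1+\cdots+Y_n)^2$ has $n$ squared terms and $n(n-1)$ ordered mixed products, so
\begin{align}
E(X^2) &= n\cdot\frac{r}{N} + n(n-1)\cdot\frac{r}{N}\cdot\frac{r-1}{N-1}, \\[10pt]
\operatorname{var}(X) &= E(X^2)-\bigl(E(X)\bigr)^2 = \frac{nr}{N} + \frac{n(n-1)\,r(r-1)}{N(N-1)} - \frac{n^2r^2}{N^2} \\[10pt]
&= \frac{n\,r\,b\,(N-n)}{N^2(N-1)}.
\end{align}
As a check, taking $n=N$ gives variance $0$, which matches the trivial case in which every ball is drawn.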

Answered by André Nicolas on October 5, 2020
