# Computing the variance of hypergeometric distribution using indicator functions

Mathematics Asked by xxx on October 5, 2020

I want to compute the variance of a random variable $X$ which has hypergeometric distribution $\mathrm{Hyp}(n,r,b)$, where $n$ is the total number of balls in the urn and $r$ and $b$ are the numbers of red/black balls, by using the representation

$$X = I_{A_1} + \cdots + I_{A_n}$$

($I_A$ is the indicator function of $A$ and $A_i$ means that we have a red ball in the $i$-th draw).

So for the expected value we have

$$E[X] = E[I_{A_1} + \cdots + I_{A_n}] = E[I_{A_1}] + \cdots + E[I_{A_n}] = P(A_1) + \cdots + P(A_n)$$

But I don’t know how to calculate these $P(A_i)$. And what about $E[X^2]$? Can anybody help?

$\newcommand{\var}{\operatorname{var}}\newcommand{\cov}{\operatorname{cov}}$

The variance of $I_{A_1}+\cdots+I_{A_n}$ is trivially $0$ since the sum is $r$ with probability $1$.

But suppose there had been more than $n$ balls in the urn, so that it would not be certain that every red ball had been drawn after $n$ trials. Then we would have
\begin{align}
\var(I_{A_1}+\cdots+I_{A_n}) & = \var(I_{A_1})+\cdots+\var(I_{A_n}) + \underbrace{2\cov(I_{A_1},I_{A_2})+\cdots\quad{}}_{\binom{n}{2}\text{ terms}} \\[10pt]
& = n\var(I_{A_1}) + n(n-1)\cov(I_{A_1},I_{A_2}).
\end{align}

Next we have $$\var(I_{A_1}) = \operatorname{E}(I_{A_1}^2)-(\operatorname{E}I_{A_1})^2$$ and then use the fact that $I_{A_1}^2=I_{A_1}$ since $0^2=0$ and $1^2=1$.

For the covariance, you have $$\cov(I_{A_1},I_{A_2}) = \operatorname{E}(I_{A_1}I_{A_2}) - (\operatorname{E}I_{A_1})(\operatorname{E}I_{A_2})$$ and $\operatorname{E}(I_{A_1}I_{A_2})=\Pr(I_{A_1}=I_{A_2}=1)=\dfrac{\binom r 2}{\binom{r+b}2}$.
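As a sanity check on these pieces, here is a minimal Python sketch, not part of the original answer. It assumes $n$ balls are drawn without replacement from an urn of $r+b$ balls (so $n$ is the number of draws, not the urn size), and it counts one $2\operatorname{cov}$ term for each of the $\binom{n}{2}$ unordered pairs of draws. The function names are hypothetical. It compares the indicator-based variance with the variance computed directly from the hypergeometric probabilities, using exact fractions.

```python
from fractions import Fraction
from math import comb


def variance_via_indicators(n, r, b):
    """Variance of the number of red balls in n draws, built from indicators."""
    total = r + b
    p = Fraction(r, total)           # P(A_i) = E[I_{A_i}]
    var_single = p - p * p           # E[I^2] - (E[I])^2, using I^2 = I
    # E[I_{A_1} I_{A_2}] = C(r, 2) / C(r + b, 2); guard the one-ball urn
    e_pair = Fraction(comb(r, 2), comb(total, 2)) if total >= 2 else Fraction(0)
    cov_pair = e_pair - p * p
    # n variance terms plus 2 * cov for each of the C(n, 2) unordered pairs
    return n * var_single + 2 * comb(n, 2) * cov_pair


def variance_via_pmf(n, r, b):
    """Same variance, computed directly from P(X = k)."""
    ways = comb(r + b, n)
    pmf = {k: Fraction(comb(r, k) * comb(b, n - k), ways) for k in range(n + 1)}
    mean = sum(k * q for k, q in pmf.items())
    second_moment = sum(k * k * q for k, q in pmf.items())
    return second_moment - mean * mean


print(variance_via_indicators(5, 7, 13))   # 273/304
print(variance_via_pmf(5, 7, 13))          # 273/304
print(variance_via_indicators(20, 7, 13))  # 0, the trivial case where every ball is drawn
```

The last line reproduces the remark that the variance vanishes when the number of draws equals the number of balls in the urn.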

Correct answer by Michael Hardy on October 5, 2020

$\newcommand{\var}{\operatorname{var}}\newcommand{\cov}{\operatorname{cov}}$Just a small note on Michael's answer. The number of $2 \cov(I_{A_{1}}, I_{A_{2}})$ terms is $\binom{n}{2}$. Thus, the variance becomes:

\begin{align}
\var(I_{A_1}+\cdots+I_{A_n}) & = \var(I_{A_1})+\cdots+\var(I_{A_n}) + \underbrace{2\cov(I_{A_1},I_{A_2})+\cdots\quad{}}_{\binom{n}{2}\text{ terms}} \\[10pt]
& = n\var(I_{A_1}) + 2\binom{n}{2}\cov(I_{A_1},I_{A_2}).
\end{align}

(I wrote it as a separate answer because it was rejected as an edit, and I don't have enough reputation to comment.)

Answered by abblaa on October 5, 2020

Outline: I will change notation, to have fewer subscripts. Let $Y_i=1$ if the $i$-th ball is red, and let $Y_i=0$ otherwise.

We are picking $n$ balls. I will assume that (unlike in the problem as stated) $n$ is not necessarily the total number of balls, since that would make the problem trivial.

Then $E(X)=E(Y_1)+\cdots+E(Y_n)$. Note that $\Pr(Y_i=1)=\frac{r}{r+b}$. For if the balls have ID numbers (if you like, in invisible ink) then all sequences of balls are equally likely.

For the variance, as you know, it is enough to compute $E(X^2)$. Expand $(Y_1+\cdots+Y_n)^2$ and take the expectation, using the linearity of expectation.

We have terms $Y_i^2$ whose expectation is easy, since $Y_i^2=Y_i$. So we need the expectations of the "mixed" products $Y_iY_j$ with $i\ne j$. We need to find the probability that the $i$-th ball and the $j$-th ball are both red. This is the probability that the $i$-th is red times the probability that the $j$-th is red given that the $i$-th is.

Thus $E(Y_iY_j)=\frac{r}{r+b}\cdot\frac{r-1}{r+b-1}$.
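The "ID numbers" argument can be checked by brute force on a small urn. The following Python sketch is an illustration only, with a hypothetical function name. It labels the balls $0,\dots,r+b-1$ (the first $r$ labels being red, an arbitrary choice), lists every ordering, and counts how often a given position, or a given pair of positions, holds red balls.

```python
from fractions import Fraction
from itertools import permutations


def red_probabilities(r, b):
    """Enumerate every ordering of r red and b black labelled balls."""
    total_balls = r + b
    # Ball IDs 0..r-1 are red, the rest black; each ordering of IDs is equally likely.
    orders = list(permutations(range(total_balls)))
    count = len(orders)

    def red(ball_id):
        return ball_id < r

    # P(i-th ball is red) for every position i
    single = [
        Fraction(sum(red(o[i]) for o in orders), count)
        for i in range(total_balls)
    ]
    # P(i-th and j-th balls are both red) for every pair i < j
    pair = {
        (i, j): Fraction(sum(red(o[i]) and red(o[j]) for o in orders), count)
        for i in range(total_balls)
        for j in range(i + 1, total_balls)
    }
    return single, pair


single, pair = red_probabilities(2, 3)
print(set(single))         # {Fraction(2, 5)}
print(set(pair.values()))  # {Fraction(1, 10)}
```

With $r=2$ and $b=3$, every position gives $\frac{r}{r+b}=\frac25$ and every pair of positions gives $\frac{r}{r+b}\cdot\frac{r-1}{r+b-1}=\frac1{10}$, regardless of which positions are chosen. Enumeration grows factorially, so this is only practical for a handful of balls.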

Now it's a matter of putting the pieces together.
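For reference, one way the assembly could go is sketched below. The shorthand $N=r+b$ for the urn size is introduced here and is not in the original answer. There are $n$ squared terms and $n(n-1)$ ordered mixed terms $Y_iY_j$ with $i\ne j$, so

\begin{align}
E(X) & = \frac{nr}{N}, \\[6pt]
E(X^2) & = \frac{nr}{N} + n(n-1)\,\frac{r}{N}\cdot\frac{r-1}{N-1}, \\[6pt]
\operatorname{var}(X) & = E(X^2) - \bigl(E(X)\bigr)^2
 = \frac{nr}{N}\left(1 + \frac{(n-1)(r-1)}{N-1} - \frac{nr}{N}\right) \\[6pt]
& = \frac{nr}{N}\cdot\frac{(N-r)(N-n)}{N(N-1)}
 = n\cdot\frac{r}{r+b}\cdot\frac{b}{r+b}\cdot\frac{r+b-n}{r+b-1}.
\end{align}

The simplification in the last line uses $N(N-1) + N(n-1)(r-1) - nr(N-1) = (N-r)(N-n)$. When $n=r+b$ the final factor is $0$, which matches the trivial case of drawing every ball.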

Answered by André Nicolas on October 5, 2020
