# Estimating number of infected people and getting bounds of its probability based on a few samples.

Mathematics Asked by user_9 on December 5, 2020

This question came up on my exam a few weeks ago, and I’ve been stuck on it ever since.

Say a hospital has received 500 blood samples for COVID testing. We want to estimate how many of the samples are from infected people before testing them all. Also, we can assume that there are no false positives/negatives.

Let m be the total no. of infected samples out of these 500. Now, 20 samples are randomly chosen, and it is found that n of these are of infected people. These are the following questions we have to answer:

1. What is the estimate of m, given the value of n? That is, $$E[M|N=n]$$ where N denotes the random variable of infected samples from the 20 randomly chosen ones, and M denotes the number of infected samples out of 500.

2. Say, 4 out of these 20 are of those infected. What is $$P[M>E[M] + 1| N = 4]$$? This is to find out how reliable the expected value is.

Any other assumptions required can be made.

My approach:

$$P(M=m, N=n) = \frac{\binom{m}{n}\binom{500-m}{20-n}}{\binom{500}{20}}$$, so we can find $$E[M|N=n]$$ as $$\sum_{m=n}^{500} m\frac{P(M=m, N=n)}{\Pr(N=n)}$$, where $$\Pr(N=n) = \frac{1}{21}$$ (I don’t think this part is correct, but since $$n \in \{0,1,2,\dots,20\}$$, I wrote this down).

Thanks in advance, this question has really been bugging me!

Your expression $${500\choose 20}$$ is the number of subsets of size $$20$$ that can be drawn from your set of $$500$$. The expression $${m\choose n}{500-m\choose 20-n}$$ is, for a given subset of size $$m$$, the number of subsets of size $$20$$ which include exactly $$n$$ members of that given subset and exactly $$20-n$$ members of its complement. So, if each subset of size $$20$$ is equally likely, the fraction $$\frac{{m\choose n}{500-m\choose 20-n}}{{500\choose 20}}$$ is the probability that your randomly chosen subset of size $$20$$ contains exactly $$n$$ members of the given subset of size $$m$$.
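As a minimal sketch of the counting argument above, the fraction can be evaluated directly with Python's `math.comb`. The function name `hypergeom_pmf` and its parameters `total` and `drawn` are illustrative names of my own, not anything taken from the question.

```python
from math import comb

def hypergeom_pmf(n, m, total=500, drawn=20):
    """P(N = n | M = m): probability that a uniformly random subset of
    `drawn` samples contains exactly n of the m infected samples."""
    if n < 0 or n > drawn or m < 0 or m > total:
        return 0.0
    # math.comb(a, b) returns 0 when b > a, so impossible (n, m) pairs give 0.
    return comb(m, n) * comb(total - m, drawn - n) / comb(total, drawn)

# For a fixed m, the probabilities over n = 0..20 should add up to 1.
total_prob = sum(hypergeom_pmf(n, 100) for n in range(21))
print(total_prob)
```

Note that this is the conditional probability of $$N=n$$ for a fixed $$m$$; it says nothing yet about how likely each value of $$m$$ is.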

The problem here is that the description of the problem makes $$m$$ a non-random number whose value is already determinable, although unknown. Its exact value could be obtained by testing all $$500$$ blood samples. Non-Bayesians would consider it inappropriate to treat it as a random variable and if they had to estimate its value from a random sample of size $$20$$ they would probably use some sort of significance test.

That your exam question does treat it as a random variable implies, I presume, that you're required to adopt a Bayesian approach, which would entail your assigning a prior distribution to that random variable. For the moment, let's treat this prior, $$\pi$$, say, as arbitrary: $$\pi_m=P(M=m)\ .$$ You can then obtain the posterior distribution of $$M$$, given $$N=n$$, from Bayes's theorem:

\begin{align}
P(M=m\,|\,N=n)&=\frac{P(N=n\,|\,M=m)\,P(M=m)}{P(N=n)}\\
&=\begin{cases}
\dfrac{{m\choose n}{500-m\choose 20-n}\pi_m}{\sum_{i=n}^{480+n} {i\choose n}{500-i\choose 20-n}\pi_i}& \text{if } n\le m\le n+480\\
0&\text{otherwise}\ .
\end{cases}
\end{align}

Your formula for $$E[M|N=n]$$ is correct, except that

\begin{align}
P(M=m, N=n)&=\frac{{m\choose n}{500-m\choose 20-n}\pi_m}{{500\choose 20}}\ \text{ and}\\
P(N=n)&=\sum_{i=n}^{480+n} \frac{{i\choose n}{500-i\choose 20-n}\pi_i}{{500\choose 20}}\ .
\end{align}

The value of $$P\big(M>E[M]+1\,|\,N=4\big)$$ is given by

\begin{align}
P\big(M>E[M]+1\,|\,N=4\big)&=\sum_{m=\max(\lfloor E[M]\rfloor+2,\,4)}^{484}P(M=m\,|\,N=4)\\
&=\frac{\sum_{m=\max\left(\left\lfloor \sum_{j=0}^{500}j\pi_j\right\rfloor+2,\,4\right)}^{484} {m\choose 4}{500-m\choose 16}\pi_m}{\sum_{i=4}^{484} {i\choose 4}{500-i\choose 16}\pi_i}\ .
\end{align}

Coming to the vexed question of what to choose for $$\pi$$, the choice of priors in Bayesian statistics is nearly always going to be somewhat subjective. Since I have no idea how this topic was treated in your course, I also have little idea how your examiners would have expected you to handle it in the exam question you've quoted. Also, while I have used statistics professionally, I certainly wouldn't claim to have ever been a professional statistician, let alone one with a good working knowledge of Bayesian statistics. Please bear that in mind while reading the following suggestion.

If your $$500$$ blood samples were taken somewhat randomly from a large population, it seems to me that a reasonable choice for $$\pi$$ would be $$\text{Binomial}(500,p)$$: $$P(M=m)={500\choose m}p^m(1-p)^{500-m}$$ for some value of $$p$$, which would be the proportion of the large population that are infected. For COVID-19, $$p$$ will not be known exactly, but if you know the population from which the sample was drawn you may have a reasonable estimate that you could use for the value of $$p$$. Otherwise, the best you're likely to be able to do is to get expert epidemiologists to suggest a range $$[a,b]$$ in which they think $$p$$ is $$90\%$$ (say) likely to lie, and choose a suitable prior distribution $$\Pi$$ for $$p$$ such that $$\Pi\big([a,b]\big)=0.9$$. Your prior distribution for $$M$$ will then be $$P(M=m)={500\choose m}\int_0^1p^m(1-p)^{500-m}\,d\Pi(p)\ ,$$ and $$E[M]=500E[p]=500\displaystyle\int_0^1p\,d\Pi(p)$$.
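To make the formulas concrete, here is a rough numerical sketch of the posterior for two priors. Both are assumptions chosen only for illustration: a uniform prior on $$\{0,\dots,500\}$$ (which corresponds to taking $$\Pi$$ uniform on $$[0,1]$$) and a $$\text{Binomial}(500,p)$$ prior with a made-up value $$p=0.1$$. The helper names (`posterior`, `mean`, `tail`, `binomial_prior`) are my own.

```python
from math import comb

TOTAL, DRAWN = 500, 20

def posterior(n, prior):
    """P(M = m | N = n) for m = 0..TOTAL, where prior[m] = P(M = m)."""
    # math.comb(a, b) is 0 when b > a, so values of m outside
    # n <= m <= n + 480 automatically receive zero weight.
    weights = [comb(m, n) * comb(TOTAL - m, DRAWN - n) * prior[m]
               for m in range(TOTAL + 1)]
    norm = sum(weights)
    return [w / norm for w in weights]

def mean(dist):
    """Expected value of a distribution given as a list indexed by m."""
    return sum(m * q for m, q in enumerate(dist))

def tail(dist, threshold):
    """P(M > threshold) under the distribution dist."""
    return sum(q for m, q in enumerate(dist) if m > threshold)

def binomial_prior(p):
    """Binomial(TOTAL, p) prior; p is an assumed infection rate."""
    return [comb(TOTAL, m) * p**m * (1 - p)**(TOTAL - m)
            for m in range(TOTAL + 1)]

n = 4

# Assumption 1: uniform prior on {0, ..., 500}.
uniform = [1 / (TOTAL + 1)] * (TOTAL + 1)
post_u = posterior(n, uniform)
print("uniform prior:  E[M | N=4] =", mean(post_u))
print("uniform prior:  P(M > E[M] + 1 | N=4) =", tail(post_u, mean(uniform) + 1))

# Assumption 2: Binomial(500, 0.1) prior (p = 0.1 is hypothetical).
p = 0.1
prior_b = binomial_prior(p)
post_b = posterior(n, prior_b)
print("binomial prior: E[M | N=4] =", mean(post_b))
print("binomial prior: P(M > E[M] + 1 | N=4) =", tail(post_b, mean(prior_b) + 1))
```

Two closed forms are available as a check. With the uniform prior the posterior mean is $$n+480\frac{n+1}{22}$$, about $$113.09$$ for $$n=4$$. With a $$\text{Binomial}(500,p)$$ prior for a fixed $$p$$, the identity $${m\choose n}{500-m\choose 20-n}{500\choose m}={500\choose 20}{20\choose n}{480\choose m-n}$$ shows that $$M-n$$ given $$N=n$$ is $$\text{Binomial}(480,p)$$, so $$E[M|N=n]=n+480p$$: the tested samples tell you nothing about the untested ones unless $$p$$ itself is treated as uncertain.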

Answered by lonza leggiera on December 5, 2020
