# Estimating number of infected people and getting bounds of its probability based on a few samples.

Mathematics Asked by user_9 on December 5, 2020

This question came up on my exam a few weeks ago, and I’ve been stuck on it ever since.

Say a hospital has received 500 blood samples for COVID testing. We want to estimate how many of the samples are from infected people before testing them all. Also, we can assume that there are no false positives/negatives.

Let m be the total no. of infected samples out of these 500. Now, 20 samples are randomly chosen, and it is found that n of these are of infected people. These are the following questions we have to answer:

1. What is the estimate of m, given the value of n? That is, $$E[M|N=n]$$ where N denotes the random variable of infected samples from the 20 randomly chosen ones, and M denotes the number of infected samples out of 500.

2. Say, 4 out of these 20 are of those infected. What is $$P[M>E[M] + 1| N = 4]$$? This is to find out how reliable the expected value is.

Any other assumptions required can be made.

My approach:

$$P(M=m, N=n) = \frac{\binom{m}{n}\binom{500-m}{20-n}}{\binom{500}{20}}$$, so we can find $$E[M|N=n]$$ as $$\sum_{m=n}^{500} m\frac{P(M=m, N=n)}{\Pr(N=n)}$$, where $$\Pr(N=n) = \frac{1}{21}$$ (I don’t think this part is correct, but since $$n \in \{0,1,2,\dots,20\}$$, I wrote this down).

Thanks in advance, this question has really been bugging me!

Your expression $${500\choose20}$$ is the number of subsets of size $$20$$ that can be drawn from your set of $$500$$. The expression $${m\choose n}{500-m\choose 20-n}$$ is, for a given subset of size $$m$$, the number of subsets of size $$20$$ which include exactly $$n$$ members of that given subset and exactly $$20-n$$ members of its complement. So, if each subset of size $$20$$ is equally likely, the fraction $$\frac{{m\choose n}{500-m\choose 20-n}}{{500\choose20}}$$ is the probability that your randomly chosen subset of size $$20$$ contains exactly $$n$$ members of the given subset of size $$m$$.
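As a quick numerical check of the counting argument above, here is a minimal Python sketch that evaluates that fraction exactly. The function name `sample_likelihood` and the example value $$m=100$$ are only illustrative choices, not part of the original question.

```python
from fractions import Fraction
from math import comb

TOTAL = 500   # blood samples received
DRAWN = 20    # samples chosen at random for testing

def sample_likelihood(n: int, m: int) -> Fraction:
    """P(N = n | M = m): probability that a random 20-subset of the 500
    samples contains exactly n of the m infected ones."""
    if not (0 <= m <= TOTAL and 0 <= n <= DRAWN):
        return Fraction(0)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible combinations (for example n > m) get probability 0.
    return Fraction(comb(m, n) * comb(TOTAL - m, DRAWN - n), comb(TOTAL, DRAWN))

# Illustrative value only: suppose 100 of the 500 samples were infected.
m_example = 100
for n in range(9):
    print(n, float(sample_likelihood(n, m_example)))
```

For any fixed $$m$$ these probabilities sum to $$1$$ over $$n=0,\dots,20$$, which is a convenient sanity check on the formula.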

The problem here is that the description of the problem makes $$m$$ a non-random number whose value is already determinable, although unknown. Its exact value could be obtained by testing all $$500$$ blood samples. Non-Bayesians would consider it inappropriate to treat it as a random variable and if they had to estimate its value from a random sample of size $$20$$ they would probably use some sort of significance test.

That your exam question does treat it as a random variable implies, I presume, that you're required to adopt a Bayesian approach, which would entail your assigning a prior distribution to that random variable. For the moment, let's treat this prior, $$\pi$$, say, as arbitrary: $$\pi_m=P(M=m)\ .$$ You can then obtain the posterior distribution of $$M$$, given $$N=n$$, from Bayes's theorem:

\begin{align} P(M=m\,|\,N=n\,)&=\frac{P(N=n\,|\,M=m\,)P(M=m)}{P(N=n)}\\ &=\begin{cases} \frac{{m\choose n}{500-m\choose 20-n}\pi_m}{\sum_{i=n}^{480+n} {i\choose n}{500-i\choose 20-n}\pi_i}& \text{if } n\le m\le n+480\\ 0&\text{otherwise .} \end{cases} \end{align}

Your formula for $$E[M|N=n]$$ is correct, except that

\begin{align} P(M=m, N=n)&=\frac{{m\choose n}{500-m\choose 20-n}\pi_m}{{500\choose20}} \text{ and}\\ P(N=n)&= \displaystyle\sum_{i=n}^{480+n} \frac{{i\choose n}{500-i\choose 20-n}\pi_i}{{500\choose20}}\ . \end{align}

The value of $$P\big(M>E[M]+1\,|\,N=4\big)$$ is given by

\begin{align} P\big(M>E[M]+1\,|\,N=4\big)&=\sum_{m= \max(\lfloor E[M] \rfloor +2,\, 4)}^{484}P(M=m\,|\,N=4\,)\\ &=\frac{\sum_{m= \max(\lfloor \sum_{m=0}^{500}m\pi_m \rfloor +2,\, 4)}^{484} {m\choose 4}{500-m\choose 16}\pi_m}{\sum_{i=4}^{484} {i\choose 4}{500-i\choose 16}\pi_i}\ . \end{align}

Coming to the vexed question of what to choose for $$\pi$$, the choice of priors in Bayesian statistics is nearly always going to be somewhat subjective. Since I have no idea how this topic was treated in your course, I also have little idea how your examiners would have expected you to handle it in the exam question you've quoted. Also, while I have used statistics professionally, I certainly wouldn't claim to have ever been a professional statistician, let alone one with a good working knowledge of Bayesian statistics. Please bear that in mind while reading the following suggestion.

If your $$500$$ blood samples were taken somewhat randomly from a large population, it seems to me that a reasonable choice for $$\pi$$ would be $$\text{Binomial}(500,p)$$: $$P(M=m)={500\choose m}p^m(1-p)^{500-m}$$ for some value of $$p$$, which would be the proportion of the large population that are infected. For COVID-$$19$$, $$p$$ will not be known exactly, but if you know the population from which the sample was drawn you may have a reasonable estimate that you could use for the value of $$p$$. Otherwise, the best you're likely to be able to do is to get expert epidemiologists to suggest a range $$[a,b]$$ in which they think $$p$$ is $$90\%$$ (say) likely to lie, and choose a suitable prior distribution $$\Pi$$ for $$p$$ such that $$\Pi\big([a,b]\big)=0.9$$. Your prior distribution for $$M$$ will then be $$P(M=m)={500\choose m}\int_0^1p^m(1-p)^{500-m}\,d\Pi(p)\ ,$$ and $$E[M]=500E[p]=500\displaystyle\int_0^1p\,d\Pi(p)$$.
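To make the formulas above concrete, here is a minimal Python sketch that computes the posterior of $$M$$ for a $$\text{Binomial}(500,p)$$ prior, then the posterior mean and the tail probability for $$N=4$$. The prevalence $$p=1/10$$ is purely an assumed, illustrative value, and the helper names (`posterior`, `binomial_prior`, `mean`, `tail_above`) are hypothetical. The threshold uses the prior mean $$E[M]$$, following the reading of the question taken in this answer.

```python
from fractions import Fraction
from math import comb

TOTAL, DRAWN = 500, 20

def posterior(n, prior):
    """Posterior P(M = m | N = n) for m = 0..TOTAL, where prior[m] = P(M = m)."""
    # The common factor 1 / C(500, 20) cancels in Bayes's theorem, so it is omitted.
    weights = [comb(m, n) * comb(TOTAL - m, DRAWN - n) * prior[m]
               for m in range(TOTAL + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def binomial_prior(p):
    """Binomial(500, p) prior: P(M = m) = C(500, m) p^m (1 - p)^(500 - m)."""
    return [comb(TOTAL, m) * p**m * (1 - p)**(TOTAL - m) for m in range(TOTAL + 1)]

def mean(dist):
    """Expected value of a distribution given as a list indexed by m."""
    return sum(m * q for m, q in enumerate(dist))

def tail_above(dist, threshold):
    """P(M > threshold) under the distribution dist."""
    return sum(q for m, q in enumerate(dist) if m > threshold)

p = Fraction(1, 10)              # assumed prevalence, for illustration only
prior = binomial_prior(p)
post = posterior(4, prior)       # observed n = 4 infected among the 20 tested

prior_mean = mean(prior)         # equals 500 * p
post_mean = mean(post)           # E[M | N = 4]
tail = tail_above(post, prior_mean + 1)   # P(M > E[M] + 1 | N = 4)

print("prior mean E[M]        :", float(prior_mean))
print("posterior mean E[M|N=4]:", float(post_mean))
print("P(M > E[M]+1 | N=4)    :", float(tail))
```

With a binomial prior the untested $$480$$ samples are independent of the tested $$20$$, so the posterior mean reduces to $$n+480p$$ (here $$4+48=52$$). Exact fractions are used to avoid floating-point underflow in the binomial terms; passing any other list of $$501$$ prior weights to `posterior` covers the arbitrary-prior case.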

Answered by lonza leggiera on December 5, 2020
