TransWikia.com

Detailed analysis of the secretary problem

Mathematics Asked by saulspatz on November 2, 2021

The secretary problem is well-known. $N>2$ candidates present themselves for a job. You must either hire a candidate immediately after interviewing him, or let him go. The strategy you follow is to reject the first $R$ candidates, and then hire the first candidate who is better than all others you have interviewed so far. The problem is to find the value of $R$ that maximizes the probability of hiring the best candidate, and the usual answer given is to reject $frac Ne$ candidates.

In fact, one can show that the optimum number of candidates to reject is the greatest integer $R$ such that $$sum_{k=R}^{N-1}frac1k>1,$$ and that this number is always one of the two integers closest to $frac Ne$. I’ll defer the proofs of these assertions in order to get right to my question.

I guessed that the optimum value of $R$, which I’ll call $R_N$ would be equal to $leftlfloorfrac Nerightrfloor$ about half the time, but it seems that it’s actually best about two-thirds of the time. The graph below plots $$y_K=frac1{K-2}sum_{N=3}^Kleft[R_N=leftlfloorfrac Nerightrfloorright], 100leq Kleq1000$$ where $[cdot]$ is the Iverson bracket: $$[P]=cases{1,& if $P$ is true\0,& if $P$ is false}$$
enter image description here
While there’s a lot of oscillation, it does seem as though $lim_{Ktoinfty}y_K$ may well exist, and it certainly seems that $liminf_{Ktoinfty}y_K>frac12$.

I would like to investigate this, but I have no idea how to approach it. It seems like numerical analysis of the harmonic numbers must enter into it, but how to pick out the fraction of times that the floor is better than the ceiling?

Can you give me any suggestions of what kind of techniques might apply and where to learn about them? I put the analytic number theory tag on because I couldn’t think of anything better.

Here are the easy proof of the facts stated earlier.

Suppose the best candidate is the $k$th interviewed. Clearly, there is no chance of success unless $k>R$. Then we succeed if and only if the best candidate among the first $k-1$ interviewed was one of the first $R$. The probability of success is $$w_R := frac1Nsum_{k=R+1}^Nfrac R{k-1}=frac RNsum_{k=R}^{N-1}frac1{k}$$

Now $$begin{align}
w_{R+1}-w_R&=frac {R+1}Nsum_{k=R+1}^{N-1}frac1{k}-frac RNsum_{k=R}^{N-1}frac1{k}\
&=left(frac{R+1}N-frac RNright)sum_{k=R+1}^{N-1}frac1k-frac R{NR}\
&=frac1Nleft(sum_{R+1}^{N-1}frac1k-1right)
end{align}$$

so that $w_{R+1}>w_R$ if and only if $sum_{R+1}^{N-1}frac1k>1$. That is, the optimum number is of candidates to reject is the greatest integer $R$ such that $$sum_R^{N-1}frac1k>1.$$

(It is well-known that there are no positive integers $n>m$ such that $sum_{k=m}^nfrac1k=1,$ so there is no ambiguity.)

Given $N$, let $R$ be determined as above. I claim that $R$ is one of the two integers closest to $frac Ne$. We have $$1<sum_{k=R}^{N-1}frac1k<sum_{k=R}^{N-1}int_{k-1}^kfrac{mathrm{d}x}{x}=int_{R-1}^{N-1}frac{mathrm{d}x}{x}=ln{frac{N-1}{R-1}}$$ so that $$R<1+frac{N-1}{e}<1+frac Ne.$$

Also, we have $$
1>sum_{k=R+1}^{N-1}frac1k>sum_{k=R+1}^{N-1}int_k^{k+1}frac{mathrm{d}x}x=int_{R+1}^Nfrac{mathrm{d}x}x= ln{frac N{R+1}}$$
so that
$$R>frac Ne -1.$$

One Answer

Since the fractional part of $N/e$, calculated over the integers $N$, is uniformly distributed $bmod 1$ (this can be checked by the Weyl's criterion), the result that $$frac{N}{e}-1 <R<frac{N}{e}+1$$ might suggest that $R=lfloor N/erfloor$ and $R=lceil N/erceil$ occur with equal probability.

However, as shown in the calculations of the OP, the correct result for the range of $R$ is

$$frac{N}{e}-1 <R<frac{N}{e}+1-frac{1}{e}$$

The central value of this interval is $N/e-1/(2e)$, so the expected value of its fractional part is not $1/2$, but $1/2-1/(2e)$.

Assuming that the probabilities that $R=lfloor N/erfloor$ and $R=lceil N/erceil$ are linearly related to the closeness of this central value with $lfloor N/erfloor$ and $lceil N/erceil$, we get that the expected proportion of cases where $R=lfloor N/erfloor$ is

$$1-left(frac{1}{2}-frac{1}{2e}right) approx 0.6839$$

in accordance with the observed experimental value shown in the plot of the OP.


After some constructive comments on the possibility to improve the bounds for $R$, the idea to calculate the integral between $k-1/2$ and $k+1/2$ gives interesting results and confirms the value above.

Since we have $$1<sum_{k=R}^{N-1}frac1k<sum_{k=R}^{N-1}int_{k-1/2}^{k+1/2}frac{mathrm{d}x}{x}=int_{R-1/2}^{N-1/2}frac{mathrm{d}x}{x}=ln{frac{N-1/2}{R-1/2}}$$ then $$R<frac Ne +frac 12-frac 1{2e}$$

Also, since as $krightarrow infty$ we have $$ 1>sum_{k=R+1}^{N-1}frac1k>sum_{k=R+1}^{N-1}int_{k-1/2+epsilon}^{k+1/2+epsilon}frac{mathrm{d}x}x=int_{R+1/2+epsilon}^{N-1/2+epsilon}frac{mathrm{d}x}x= ln{frac {N-1/2+epsilon}{R+1/2+epsilon}}$$ we get the asymptotic lower bound $$R>frac Ne -frac 12-frac 1{2e}$$

Collecting both results we finally have the optimized bounds

$$frac Ne -frac 12-frac 1{2e}<R<frac Ne +frac 12-frac 1{2e}$$

which numerically corresponds to the approximate interval

$$frac Ne -0.6839<R<frac Ne +0.3161 $$

Note that the interval identified by the bounds asymptotically converges to $1$. Therefore, for any value of $N/e$, only one among $lfloor N/e rfloor$ and $lceil N/e rceil$ is a possible value of $R$, being included in the range. Reminding the equidistribution $bmod 1$ of $N/e$, we get that the probability that the floor function is selected is $$1/2+1/(2e)approx 0.6839$$

Lastly, it should be noted that these bounds represent the best possible ones (otherwise there would be some $N$ for which no $R$ is allowed).

Answered by Anatoly on November 2, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP