In classical state discrimination, why does the trace distance quantify the probability of success?

Quantum Computing Asked on June 24, 2021

Consider the following task: we are given a probability distribution $p_y:x\mapsto p_y(x)$ with $y\in\{0,1\}$ (e.g. we are given some black box that we can use to draw samples from either $p_0$ or $p_1$), and we want to recover the value of $y$.
Assume $x\in X$ for some finite register $X$.
We can assume we know what the possible distributions $p_y$ are, and we also know the probability $\lambda_y$ with which each $y$ can occur: $\mathrm{Pr}(Y=y)=\lambda_y$.

Now, suppose we got our box, used it, and obtained the result $x$. Knowing $\lambda,p_0,p_1$, what's our best guess for $y$? Well, we know that $\mathrm{Pr}(X=x|Y=y)=p_y(x)$, and thus from Bayes' theorem that
$$\mathrm{Pr}(Y=y|X=x) = \frac{\lambda_y p_y(x)}{p(x)}, \quad \text{where}\quad p(x)\equiv \sum_y \lambda_y p_y(x).$$
It then stands to reason that, knowing $x$, our best guess for $y$ is the one that maximises this posterior probability:
$$y_{\rm opt}(x)=\mathrm{argmax}_y\,\mathrm{Pr}(Y=y|X=x)=\mathrm{argmax}_y\, \lambda_y p_y(x).$$
Using such a guess, the probability of guessing right (using the knowledge of $x$) is then
$$p_{\rm correct}(x) = \max_y \mathrm{Pr}(Y=y|X=x) = \frac{\max_y \lambda_y p_y(x)}{p(x)}.$$
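This MAP (maximum a posteriori) guessing rule is easy to sketch numerically. The distributions and priors below are made-up illustrative numbers, not anything from the question:

```python
# Hypothetical example: two distributions over a finite register X = {0, 1, 2},
# with priors lambda_y. All numbers are illustrative only.
p = {0: [0.7, 0.2, 0.1],   # p_0(x)
     1: [0.1, 0.3, 0.6]}   # p_1(x)
lam = {0: 0.5, 1: 0.5}     # priors lambda_y

def y_opt(x):
    """MAP guess: argmax_y lambda_y * p_y(x)."""
    return max(p, key=lambda y: lam[y] * p[y][x])

def p_correct_given_x(x):
    """Success probability given outcome x: max_y lambda_y p_y(x) / p(x)."""
    px = sum(lam[y] * p[y][x] for y in p)          # p(x) = sum_y lambda_y p_y(x)
    return max(lam[y] * p[y][x] for y in p) / px
```

For instance, with these numbers `y_opt(0)` returns `0` (since $\lambda_0 p_0(0)=0.35 > \lambda_1 p_1(0)=0.05$) and `p_correct_given_x(0)` returns $0.35/0.40 = 0.875$.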

We would now like to know the probability of correctly identifying $y$ prior to sampling from $p_y$, i.e. averaged over the possible outcomes. Naively, I would compute this as the average of $p_{\rm correct}(x)$ weighted by the probabilities $p(x)$, which gives
$$p_{\rm correct} = \sum_x p(x)\,p_{\rm correct}(x) = \sum_x \max_y \lambda_y p_y(x).\tag A$$
However, e.g. in Watrous' book (around Eq. (3.2), p. 126, though I'm using a slightly different notation here), the author mentions that the probability of correctly identifying $y$ ought to be calculated by considering the probability that the guess is right minus the probability that it is not, and that this results in the quantity
$$\sum_{x\in X}\left\lvert \lambda_0 p_0(x)- \lambda_1 p_1(x)\right\rvert
= \|\lambda_0 p_0 - \lambda_1 p_1\|_1,\tag B$$

from which we would recover the probability of being correct as $p_{\rm correct}=\frac12\left(1+\|\lambda_0 p_0-\lambda_1 p_1\|_1\right)$.
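Before working out why the two expressions agree, one can at least check numerically that (A) and the $\frac12(1+\|\cdot\|_1)$ formula coincide. The example below uses randomly generated distributions over a hypothetical five-element alphabet; none of the numbers come from the question:

```python
import random

# Hypothetical example: random distributions over a 5-element register X.
random.seed(0)
X = range(5)

def normalize(w):
    s = sum(w)
    return [v / s for v in w]

p0 = normalize([random.random() for _ in X])
p1 = normalize([random.random() for _ in X])
lam0, lam1 = 0.3, 0.7   # arbitrary priors summing to 1

# Expression (A): averaged MAP success probability.
A = sum(max(lam0 * p0[x], lam1 * p1[x]) for x in X)

# Expression derived from (B): 1/2 * (1 + ||lam0*p0 - lam1*p1||_1).
B = 0.5 * (1 + sum(abs(lam0 * p0[x] - lam1 * p1[x]) for x in X))

assert abs(A - B) < 1e-12   # the two expressions agree numerically
```

Running this with other seeds and priors gives the same agreement, which suggests the equality holds identically.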

Assuming the reasoning that led me to (A) is correct, how is my expression for $p_{\rm correct}$ compatible with the one resulting from (B)?

One Answer

Upon some additional reflection, I found a way to see that the expressions are equivalent. I'm not sure whether this is the way intended in the text, though; there might be better/more direct ways to arrive at the trace-norm expression.

The basic idea is to observe that for any $a,b\in\mathbb R$ we have
$$(a+b)+|a-b| = 2\max(a,b).$$
Using this result, we can write
$$2\max_y \lambda_y p_y(x) = \sum_y \lambda_y p_y(x) + \lvert\lambda_0 p_0(x)-\lambda_1 p_1(x)\rvert.$$
Summing over $x$ we thus get
$$2p_{\rm correct} = 2\sum_x \max_y \lambda_y p_y(x) = \underbrace{\sum_x \sum_y \lambda_y p_y(x)}_{=1} + \sum_x \lvert\lambda_0 p_0(x)-\lambda_1 p_1(x)\rvert,$$
and therefore
$$p_{\rm correct} = \frac12\left(1 + \|\lambda_0 p_0 - \lambda_1 p_1\|_1\right).$$
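The identity $(a+b)+|a-b|=2\max(a,b)$ that drives the whole argument can be spot-checked numerically (it also follows immediately by treating the cases $a\ge b$ and $a<b$ separately). A quick random check:

```python
import random

# Spot-check the identity (a + b) + |a - b| = 2*max(a, b) on random reals.
random.seed(1)
max_err = 0.0
for _ in range(1000):
    a = random.uniform(-10, 10)
    b = random.uniform(-10, 10)
    max_err = max(max_err, abs((a + b) + abs(a - b) - 2 * max(a, b)))
# max_err stays at floating-point rounding level
```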

Answered by glS on June 24, 2021
