In classical state discrimination, why does the trace distance quantify the probability of success?

Quantum Computing Asked on June 24, 2021

Consider the following task: we are given a probability distribution $p_y:x\mapsto p_y(x)$ with $y\in\{0,1\}$ (e.g. we are given some black box that we can use to draw samples from either $p_0$ or $p_1$), and we want to recover the value of $y$.
Assume $x\in X$ for some finite register $X$.
We can assume we know what the possible distributions $p_y$ are, and we also know the probability $\lambda_y$ with which each $y$ can occur: $\mathrm{Pr}(Y=y)=\lambda_y$.

Now, suppose we got our box, used it, and obtained the result $x$. Knowing $\lambda,p_0,p_1$, what's our best guess for $y$? Well, we know that $\mathrm{Pr}(X=x|Y=y)=p_y(x)$, and thus from Bayes' theorem that
$$\mathrm{Pr}(Y=y|X=x) = \frac{\lambda_y p_y(x)}{p(x)}, \quad \text{where}\quad p(x)\equiv \sum_y \lambda_y p_y(x).$$
It then stands to reason that, knowing $x$, our best guess for $y$ is the one that maximises this posterior probability:
$$y_{\rm opt}(x)=\mathrm{argmax}_y\,\mathrm{Pr}(Y=y|X=x)=\mathrm{argmax}_y\, \lambda_y p_y(x).$$
Using such a guess, the probability of guessing right (using the knowledge of $x$) is then
$$p_{\rm correct}(x) = \max_y \mathrm{Pr}(Y=y|X=x) = \frac{\max_y \lambda_y p_y(x)}{p(x)}.$$
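This MAP (maximum a posteriori) guessing rule is easy to sketch numerically. The distributions and priors below are made-up illustrative numbers, not anything from the question:

```python
# Hypothetical example: two distributions over a finite register X = {0, 1, 2},
# with priors lambda_y. All numbers are illustrative only.
p = {0: [0.7, 0.2, 0.1],   # p_0(x)
     1: [0.1, 0.3, 0.6]}   # p_1(x)
lam = {0: 0.5, 1: 0.5}     # priors lambda_y

def y_opt(x):
    """MAP guess: argmax_y lambda_y * p_y(x)."""
    return max(p, key=lambda y: lam[y] * p[y][x])

def p_correct_given_x(x):
    """Success probability given outcome x: max_y lambda_y p_y(x) / p(x)."""
    px = sum(lam[y] * p[y][x] for y in p)          # p(x) = sum_y lambda_y p_y(x)
    return max(lam[y] * p[y][x] for y in p) / px
```

For instance, with these numbers `y_opt(0)` returns `0` (since $\lambda_0 p_0(0)=0.35 > \lambda_1 p_1(0)=0.05$) and `p_correct_given_x(0)` returns $0.35/0.40 = 0.875$.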

We would now like to know the probability of correctly identifying $y$ prior to sampling from $p_y$, i.e. averaged over the possible outcomes. Naively, I would compute this as the average of $p_{\rm correct}(x)$ weighted by the probabilities $p(x)$, which gives
$$p_{\rm correct} = \sum_x p(x)\,p_{\rm correct}(x) = \sum_x \max_y \lambda_y p_y(x).\tag A$$
However, e.g. in Watrous' book (around Eq. (3.2), p. 126, though I'm using a slightly different notation here), the author mentions that the probability of correctly identifying $y$ ought to be calculated by considering the probability that the guess is right minus the probability that it is not, and that this results in the quantity
$$\sum_{x\in X}\left\lvert \lambda_0 p_0(x)- \lambda_1 p_1(x)\right\rvert
= \|\lambda_0 p_0 - \lambda_1 p_1\|_1,\tag B$$

from which we would recover the probability of being correct as $p_{\rm correct}=\frac12\left(1+\|\lambda_0 p_0-\lambda_1 p_1\|_1\right)$.
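Before working out why the two expressions agree, one can at least check numerically that (A) and the $\frac12(1+\|\cdot\|_1)$ formula coincide. The example below uses randomly generated distributions over a hypothetical five-element alphabet; none of the numbers come from the question:

```python
import random

# Hypothetical example: random distributions over a 5-element register X.
random.seed(0)
X = range(5)

def normalize(w):
    s = sum(w)
    return [v / s for v in w]

p0 = normalize([random.random() for _ in X])
p1 = normalize([random.random() for _ in X])
lam0, lam1 = 0.3, 0.7   # arbitrary priors summing to 1

# Expression (A): averaged MAP success probability.
A = sum(max(lam0 * p0[x], lam1 * p1[x]) for x in X)

# Expression derived from (B): 1/2 * (1 + ||lam0*p0 - lam1*p1||_1).
B = 0.5 * (1 + sum(abs(lam0 * p0[x] - lam1 * p1[x]) for x in X))

assert abs(A - B) < 1e-12   # the two expressions agree numerically
```

Running this with other seeds and priors gives the same agreement, which suggests the equality holds identically.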

Assuming the reasoning that led me to (A) is correct, how is my expression for $p_{\rm correct}$ compatible with the one resulting from (B)?

One Answer

Upon some additional reflection, I found a way to see that the expressions are equivalent. I'm not sure whether this is the way intended in the text, though; there might be better/more direct ways to arrive at the trace-norm expression.

The basic idea is to observe that for any $a,b\in\mathbb R$ we have
$$(a+b)+|a-b| = 2\max(a,b).$$
Using this result, we can write
$$2\max_y \lambda_y p_y(x) = \sum_y \lambda_y p_y(x) + \lvert\lambda_0 p_0(x)-\lambda_1 p_1(x)\rvert.$$
Summing over $x$ we thus get
$$2p_{\rm correct} = 2\sum_x \max_y \lambda_y p_y(x) = \underbrace{\sum_x \sum_y \lambda_y p_y(x)}_{=1} + \sum_x \lvert\lambda_0 p_0(x)-\lambda_1 p_1(x)\rvert,$$
and therefore
$$p_{\rm correct} = \frac12\left(1 + \|\lambda_0 p_0 - \lambda_1 p_1\|_1\right).$$
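The identity $(a+b)+|a-b|=2\max(a,b)$ that drives the whole argument can be spot-checked numerically (it also follows immediately by treating the cases $a\ge b$ and $a<b$ separately). A quick random check:

```python
import random

# Spot-check the identity (a + b) + |a - b| = 2*max(a, b) on random reals.
random.seed(1)
max_err = 0.0
for _ in range(1000):
    a = random.uniform(-10, 10)
    b = random.uniform(-10, 10)
    max_err = max(max_err, abs((a + b) + abs(a - b) - 2 * max(a, b)))
# max_err stays at floating-point rounding level
```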

Answered by glS on June 24, 2021
