TransWikia.com

Correlation between 2 random variables and $\cos(\theta)$ on vector space

Mathematics Asked on January 16, 2021

I am studying from a book that asserts that the correlation between random variables $X$ and $Y$ is $0.8$. And apparently this means that if $X$ and $Y$ are represented on a vector space, with angle $\theta$ between them, then $\cos(\theta) = 0.8$.

I don’t understand why this is true if $X$ and $Y$ are not centered random variables. For example, what if we consider random variables $A$ and $B$ such that $A$ can be represented by the vector $[0 \; 1]^T$ and $B$ by $[1 \; 0]^T$. They are clearly orthogonal, with $\theta = \frac{\pi}{2}$, so according to the logic in the first paragraph, the correlation should be zero.

But the correlation isn’t zero because the covariance isn’t zero. The covariance is
$$
\begin{aligned}
\text{cov}(A,B) &= E[(A - \mu_A)(B - \mu_B)] \\
&= \frac{1}{2}\sum_{i=1}^{2} (a_i - \mu_A)(b_i - \mu_B) \\
&= \frac{1}{2}\left[ (-0.5)(0.5) + (0.5)(-0.5) \right] \\
&= -0.25
\end{aligned}
$$

So based on this simple example, it seems $\rho_{AB} \neq \cos(\theta_{AB})$.
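The arithmetic above can be checked numerically. A minimal sketch in plain Python, using the same population covariance as in the derivation (variable names are illustrative, not from the book):

```python
import math

A = [0.0, 1.0]
B = [1.0, 0.0]
n = len(A)

mu_A = sum(A) / n  # 0.5
mu_B = sum(B) / n  # 0.5

# Population covariance, matching the hand computation above
cov = sum((a - mu_A) * (b - mu_B) for a, b in zip(A, B)) / n
# cov == -0.25

# Cosine of the angle between the raw (uncentered) vectors
dot = sum(a * b for a, b in zip(A, B))
cos_theta = dot / (math.hypot(*A) * math.hypot(*B))
# cos_theta == 0.0, since the vectors are orthogonal

# Correlation coefficient: covariance divided by the standard deviations
sd_A = math.sqrt(sum((a - mu_A) ** 2 for a in A) / n)
sd_B = math.sqrt(sum((b - mu_B) ** 2 for b in B) / n)
rho = cov / (sd_A * sd_B)
# rho == -1.0
```

So for these uncentered vectors the cosine is $0$ while the correlation is $-1$, confirming that the two quantities differ here.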


Here is a screenshot of the part of the book that I was referring to above. I think everything in the screenshot is wrong unless those random variables have ZERO mean.


One Answer

As already noted in the comments, cosine similarity and correlation are different concepts. In particular, as explained below, the cosine of the angle between two vectors equals the correlation coefficient only if the random variables have zero means. This explains why two orthogonal vectors, whose cosine similarity is zero, can still be correlated, and hence have a nonzero covariance, as in the OP's example.

Cosine similarity is obtained by taking the inner product of the two vectors and dividing it by the product of their $L^2$ norms. The formula is

$$CS(x,y) = \frac{\sum\limits_{i=1}^{n} x_i y_i}{\sqrt{\sum\limits_{i=1}^{n} x_i^2}\,\sqrt{\sum\limits_{i=1}^{n} y_i^2}} = \frac{\langle x, y\rangle}{\|x\|\,\|y\|}$$

and corresponds to the cosine of the angle between the two vectors. Cosine similarity is bounded between $-1$ and $1$. However, in most applications where this measure is used, the vectors are non-negative, so in these cases it ranges between $0$ and $1$. Importantly, cosine similarity is invariant to scaling (i.e. multiplying all terms by a nonzero constant) but is not invariant to shifts (i.e. adding a constant to all terms).
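The scale-invariance and shift-sensitivity claimed above can be illustrated with a short sketch in plain Python (the vectors `x` and `y` are arbitrary examples, not from the answer):

```python
import math

def cosine_similarity(x, y):
    """Inner product divided by the product of the L2 norms."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny)

x = [1.0, 2.0, 3.0]
y = [2.0, 1.0, 0.0]

base = cosine_similarity(x, y)
# Scaling one vector by a nonzero constant leaves the cosine unchanged
scaled = cosine_similarity([3 * a for a in x], y)
# Adding a constant to every component changes the cosine
shifted = cosine_similarity([a + 10 for a in x], y)
```

Here `scaled` agrees with `base` up to floating point, while `shifted` does not, mirroring the invariance properties stated above.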

Correlation, on the other hand, can be seen as the cosine similarity between the centered versions of the two vectors. Indeed, denoting the means by $\overline{x}$ and $\overline{y}$, we have

$$r(x,y) = \frac{\sum\limits_{i=1}^{n}(x_i - \overline{x})(y_i - \overline{y})}{\sqrt{\sum\limits_{i=1}^{n}(x_i - \overline{x})^2}\,\sqrt{\sum\limits_{i=1}^{n}(y_i - \overline{y})^2}} = \frac{\langle x - \overline{x},\, y - \overline{y}\rangle}{\|x - \overline{x}\|\,\|y - \overline{y}\|}$$

and then

$$r(x,y) = CS(x - \overline{x},\, y - \overline{y})$$

It is worth noting that correlation is also bounded between $-1$ and $1$, but unlike cosine similarity it is invariant to both scaling and shifts.

We conclude that the cosine similarity is equal to the correlation coefficient only when the vectors $x$ and $y$ are centered (i.e., they have zero means).
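The identity $r(x,y) = CS(x - \overline{x},\, y - \overline{y})$ and the invariance of correlation can be checked numerically; a minimal sketch in plain Python (the sample vectors are arbitrary, chosen for illustration):

```python
import math

def cosine_similarity(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny)

def correlation(x, y):
    """Correlation as cosine similarity of the centered vectors."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return cosine_similarity([a - mx for a in x], [b - my for b in y])

x = [2.0, 4.0, 6.0, 9.0]
y = [1.0, 3.0, 2.0, 7.0]

r = correlation(x, y)
# Correlation is invariant to (positive) scaling AND shifting:
r2 = correlation([5 * a + 100 for a in x], y)
```

Here `r` and `r2` agree up to floating point, even though the raw cosine similarity of the transformed vectors would change, which is exactly the distinction drawn in the answer.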

Correct answer by Anatoly on January 16, 2021

