Euclidean distance score and similarity

Question

I'm working through the book Programming Collective Intelligence (by Toby Segaran) and came across the Euclidean distance score. In the book, the author shows how to calculate the similarity between two recommendation arrays (i.e. $\textrm{person} \times \textrm{movie} \mapsto \textrm{score}$).

He calculates the Euclidean distance between two people $p_1$ and $p_2$ as
$$d(p_1, p_2) = \sqrt{\sum_{i \in \textrm{item}} (s_{p_1,i} - s_{p_2,i})^2}$$
where $s_{p,i}$ is the score person $p$ gave to item $i$.
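A minimal Python sketch of this distance, assuming the ratings are stored as nested dicts mapping person to {movie: score}. The data, names, and the choice to compare only movies rated by both people are illustrative assumptions, not taken from the book:

```python
from math import sqrt

# Hypothetical ratings: person -> {movie: score}
ratings = {
    "p1": {"Movie A": 2.5, "Movie B": 3.5, "Movie C": 3.0},
    "p2": {"Movie A": 3.0, "Movie B": 3.5, "Movie C": 1.5},
}

def euclidean_distance(prefs, a, b):
    """Euclidean distance over the movies that both a and b have rated."""
    shared = [m for m in prefs[a] if m in prefs[b]]
    if not shared:
        # Assumption: with nothing in common the distance is undefined here
        raise ValueError("no shared items to compare")
    return sqrt(sum((prefs[a][m] - prefs[b][m]) ** 2 for m in shared))

# Squared differences: 0.25 + 0.0 + 2.25 = 2.5, so d = sqrt(2.5), about 1.581
print(euclidean_distance(ratings, "p1", "p2"))
```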

This makes complete sense to me. What I don't really understand is why, at the end, he computes the following to get a "distance-based similarity":

$$\frac{1}{1 + d(p_1, p_2)}$$

So I gather that this must be the conversion from a distance to a similarity (right?). But why does the formula look like this? Can someone explain it?

Peter Flom · Accepted Answer

Taking the inverse converts a distance into a similarity.

The 1 in the denominator makes the maximum value 1 (reached when the distance is 0).
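A small Python sketch of the conversion (the function name is hypothetical): a distance of 0 maps to the maximum similarity of 1, and larger distances map to values that shrink toward 0 without ever reaching it.

```python
def distance_to_similarity(d):
    """Map a non-negative distance d to a similarity in (0, 1]."""
    if d < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (1.0 + d)

# 0 -> 1.0, 1 -> 0.5, 3 -> 0.25, 9 -> 0.1
for d in [0, 1, 3, 9]:
    print(d, distance_to_similarity(d))
```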

As for the square root, I am not sure. If distances are usually larger than 1, the root will make large distances less important; if distances are less than 1, it will make large distances more important.
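To see what the square root does numerically, here is a hedged sketch comparing a similarity built on the raw sum of squared differences with one built on its square root. The helper names and the sample sums are arbitrary illustrations:

```python
from math import sqrt

def sim_without_root(sum_sq):
    # Similarity from the plain sum of squared rating differences
    return 1 / (1 + sum_sq)

def sim_with_root(sum_sq):
    # Similarity from the Euclidean distance (square root of the sum)
    return 1 / (1 + sqrt(sum_sq))

for sum_sq in [0.25, 1.0, 4.0, 16.0]:
    print(f"sum_sq={sum_sq}: without root {sim_without_root(sum_sq):.3f}, "
          f"with root {sim_with_root(sum_sq):.3f}")
```

For sums below 1 the root enlarges the value, so the rooted similarity is lower (0.667 vs 0.800 at 0.25); above 1 the root shrinks it, so large differences are penalized less (0.200 vs 0.059 at 16). At exactly 1 the two agree.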

vinesia yolanda · Answer

Euclidean distance basically measures the dissimilarity of two vectors: it returns 0 if the two vectors are identical. Cosine similarity, on the other hand, returns 1 for similar vectors. The author of the book wants a similarity measure, but he wants to build it from Euclidean distance. So, to turn the distance into a similarity, he adds 1 to it and takes the reciprocal, so that the result is 1 when the two vectors are identical. You can check this yourself by trying it with two vectors that contain the same values.
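The suggested check can be sketched in plain Python. Both helper names are hypothetical, and the code assumes equal-length, non-zero vectors:

```python
from math import sqrt

def euclidean_similarity(x, y):
    """1 / (1 + Euclidean distance); equals 1 only when x and y are identical."""
    d = sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return 1 / (1 + d)

def cosine_similarity(x, y):
    """Cosine of the angle between x and y; 1 when they point the same way."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = sqrt(sum(a * a for a in x))
    norm_y = sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

v = [4.0, 2.5, 3.0]
print(euclidean_similarity(v, v))  # 1.0
print(cosine_similarity(v, v))     # 1.0, up to floating-point rounding
```

Both measures give 1 for two vectors with the same values, which is the property the shifted reciprocal was designed to provide.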

Answered by vinesia yolanda on September 24, 2020

user10009133 · Answer

Since you mentioned that you understand the Euclidean distance calculation, I will explain the second formula.

The Euclidean formula calculates a distance, which is smaller for people or items that are more similar. If they are identical, the distance is 0; the more they differ, the larger it gets.

However, we need a function that gives a higher value when they are similar. This can be done by adding 1 to the distance (so you don't get a division-by-zero error) and inverting it. For example, if the distance is 0, the similarity score is 1/(1 + 0) = 1.
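A minimal Python sketch of why the 1 is added (both function names are hypothetical): the plain reciprocal breaks for identical items, while the shifted version keeps the denominator at least 1 for any non-negative distance.

```python
def naive_similarity(d):
    # Plain reciprocal: undefined when the distance is 0
    return 1 / d

def shifted_similarity(d):
    # Adding 1 first keeps the denominator >= 1, so the result is in (0, 1]
    return 1 / (1 + d)

try:
    naive_similarity(0)
except ZeroDivisionError:
    print("1/d fails for identical items (d = 0)")

print(shifted_similarity(0))  # 1.0
```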

Claudio Martines · Answer

To measure distance and similarity (in the semantic sense), the first thing to check is whether you are working in a Euclidean space or not. An empirical way to verify this is to compute the distance between a pair of values whose meaning you already know.

Answered by Claudio Martines on September 24, 2020
