TransWikia.com

Euclidean distance score and similarity

Cross Validated Asked by navige on September 24, 2020

I’m currently working through the book Collective Intelligence (by Toby Segaran) and came across the Euclidean distance score. In the book the author shows how to calculate the similarity between two recommendation arrays (i.e. $\textrm{person} \times \textrm{movie} \mapsto \textrm{score}$).

He calculates the Euclidean distance for two persons $p_1$ and $p_2$ by
$$d(p_1, p_2) = \sqrt{\sum_{i \in \textrm{item}} (s_{p_1} - s_{p_2})^2}$$

This makes complete sense to me. What I don’t really understand is why, at the end, he calculates the following to get a “distance-based similarity”:

$$\frac{1}{1 + d(p_1, p_2)}$$

So, I somehow get that this must be the conversion from a distance to a similarity (right?). But why does the formula look like this? Can someone explain that?

4 Answers

Taking the inverse converts the distance into a similarity: the larger the distance, the smaller the result.

The 1 in the denominator is to make it so that the maximum value is 1 (if the distance is 0).

The square root - I am not sure. If distance is usually larger than 1, the root will make large distances less important; if distance is less than 1, it will make large distances more important.
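To make the two points about the inverse and the 1 in the denominator concrete, here is a minimal sketch in Python. It is not the book's code: the function names, the dictionary layout and the ratings are all made up for illustration, and the distance is taken over the items both people have rated.

```python
from math import sqrt

def euclidean_distance(ratings_a, ratings_b):
    """Euclidean distance over the items that both people have rated."""
    shared = set(ratings_a) & set(ratings_b)
    return sqrt(sum((ratings_a[item] - ratings_b[item]) ** 2 for item in shared))

def distance_similarity(ratings_a, ratings_b):
    """Map a distance in [0, inf) to a similarity in (0, 1]."""
    # Note: with no shared items this sketch returns 1.0, which a real
    # implementation would probably want to special-case.
    return 1 / (1 + euclidean_distance(ratings_a, ratings_b))

# Hypothetical ratings, invented for this example.
alice = {"Movie A": 4.0, "Movie B": 3.0, "Movie C": 5.0}
bob = {"Movie A": 4.0, "Movie B": 3.0, "Movie C": 5.0}
carol = {"Movie A": 1.0, "Movie B": 3.0, "Movie C": 1.0}

# Identical ratings: distance 0, so the similarity hits its maximum of 1.
print(distance_similarity(alice, bob))

# Distance is sqrt(9 + 0 + 16) = 5, so the similarity is 1 / (1 + 5) = 1/6.
print(distance_similarity(alice, carol))
```

As the distance grows, the score approaches 0 but never reaches it, so the result always stays in the interval (0, 1].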

Correct answer by Peter Flom on September 24, 2020

Euclidean distance basically measures the dissimilarity of two vectors: it returns 0 when the two vectors are identical, whereas cosine similarity returns 1 in that case. The author of the book wants a similarity measure but wants to base it on Euclidean distance. So, to get a distance-based similarity, he inverted the distance and added 1 to the denominator, so that the result is 1 when the two vectors are identical. Give it a try with two vectors that contain the same values.
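The suggested check can be sketched in a few lines of Python. The helper names and the vectors below are assumptions chosen for illustration. The sketch also includes a scaled copy of the vector to show one place where the two measures differ: cosine similarity only looks at direction, while the distance-based score also reacts to magnitude.

```python
from math import sqrt

def euclidean_similarity(u, v):
    """Distance-based similarity 1 / (1 + d), with d the Euclidean distance."""
    d = sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return 1 / (1 + d)

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (both must be non-zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

u = [3.0, 4.0]
same = [3.0, 4.0]      # identical values
scaled = [6.0, 8.0]    # same direction, twice the length

# Identical vectors: both measures give 1.
print(euclidean_similarity(u, same), cosine_similarity(u, same))

# Scaled vector: the distance is 5, so the distance-based score drops to 1/6,
# while the cosine similarity is still 1.
print(euclidean_similarity(u, scaled), cosine_similarity(u, scaled))
```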

Answered by vinesia yolanda on September 24, 2020

Since you mentioned that you already understand the Euclidean distance calculation, I will explain the second formula.

The Euclidean formula calculates a distance, which is smaller for people or items that are more similar. If they are identical, the distance is 0; the more they differ, the further above 0 it gets.

However, we need a function that gives a higher value when they are similar. This can be done by adding 1 to the distance (so you don't get a division-by-zero error) and inverting the result. For example, if the distance is 0, the similarity score is 1/1 = 1.
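The division-by-zero point can be illustrated with a small Python sketch. Both function names are hypothetical, and the distances in the loop are arbitrary sample values.

```python
def naive_similarity(distance):
    # Plain inverse: undefined when the distance is exactly 0.
    return 1 / distance

def shifted_similarity(distance):
    # Inverse with 1 added to the denominator, so a distance of 0 maps to 1.
    return 1 / (1 + distance)

for d in [0.0, 0.5, 1.0, 9.0]:
    try:
        naive = naive_similarity(d)
    except ZeroDivisionError:
        naive = None  # identical ratings break the plain inverse
    print(d, naive, shifted_similarity(d))
```

Besides avoiding the error, the shift keeps the score bounded: the plain inverse grows without limit as the distance approaches 0, whereas the shifted version never exceeds 1.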

Answered by user10009133 on September 24, 2020

To measure the distance and similarity (in the semantic sense) the first thing to check is if you are moving in a Euclidean space or not. An empirical way to verify this is to estimate the distance of a pair of values for which you know the meaning.

Answered by Claudio Martines on September 24, 2020
