
Pearson Product Moment Correlation vs Cosine Similarity For Encoded Text Comparison

Data Science Asked on January 13, 2021

I've seen a few different example implementations of Google's sentence encoders, and they use different methods to find the similarity between sentences.

For example, the standard Universal Sentence Encoder notebook uses np.inner(), which calculates the Pearson product-moment correlation if I understand properly (please correct me if I'm wrong). It normalizes the vectors first and then performs the dot product. https://colab.research.google.com/github/tensorflow/hub/blob/master/examples/colab/semantic_similarity_with_tf_hub_universal_encoder.ipynb
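For reference, here is a minimal sketch of that normalize-then-inner-product step as I understand it (the two vectors below are made-up stand-ins for actual encoder output):

    import numpy as np

    # Made-up stand-ins for two sentence embeddings from the encoder.
    a = np.array([0.3, -1.2, 0.8, 0.5])
    b = np.array([0.1, -0.9, 1.1, 0.2])

    # L2-normalize each vector, then take the inner product,
    # as the notebook does before comparing sentences.
    a_unit = a / np.linalg.norm(a)
    b_unit = b / np.linalg.norm(b)
    similarity = np.inner(a_unit, b_unit)
    print(similarity)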

The multilingual Universal Sentence Encoder notebook, however, uses cosine similarity when comparing how similar two words/sentences are. https://colab.research.google.com/github/tensorflow/hub/blob/master/examples/colab/cross_lingual_similarity_with_tf_hub_multilingual_universal_encoder.ipynb
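For comparison, a sketch of cosine similarity computed directly on the raw (unnormalized) vectors, using the same made-up vectors as above:

    import numpy as np

    def cosine_similarity(u, v):
        # Dot product divided by the product of the vector norms.
        return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

    a = np.array([0.3, -1.2, 0.8, 0.5])
    b = np.array([0.1, -0.9, 1.1, 0.2])
    print(cosine_similarity(a, b))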

In the couple of examples I've tried so far, the np.inner() method seems to be more effective, although this is very anecdotal. Specifically, I'd like to get a better idea of what the difference between a normalized dot product and cosine similarity is, and when each should be used.
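For what it's worth, here is a quick numerical check (again with made-up vectors) of how the three quantities relate: the inner product of L2-normalized vectors, plain cosine similarity, and Pearson correlation (which, as far as I know, is cosine similarity after mean-centering):

    import numpy as np

    a = np.array([0.3, -1.2, 0.8, 0.5])
    b = np.array([0.1, -0.9, 1.1, 0.2])

    # Cosine similarity on the raw vectors.
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    # Inner product after L2 normalization: mathematically the same as cosine.
    inner = np.inner(a / np.linalg.norm(a), b / np.linalg.norm(b))

    # Pearson correlation: cosine similarity of the mean-centered vectors.
    ac, bc = a - a.mean(), b - b.mean()
    pearson = np.dot(ac, bc) / (np.linalg.norm(ac) * np.linalg.norm(bc))

    # cos and inner match; pearson differs unless the vector means are zero.
    print(cos, inner, pearson)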
