
Pearson Product Moment Correlation vs Cosine Similarity For Encoded Text Comparison

Data Science Asked on January 13, 2021

I've seen a few different example implementations of Google's sentence encoders, and they use different methods to find the similarity between sentences.

For example, the standard Universal Sentence Encoder notebook uses np.inner(), which calculates the Pearson product-moment correlation if I understand properly (please correct me if I'm wrong). It normalizes the vectors first and then performs the dot product. https://colab.research.google.com/github/tensorflow/hub/blob/master/examples/colab/semantic_similarity_with_tf_hub_universal_encoder.ipynb
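For reference, here is a minimal sketch of that normalize-then-inner-product step as I understand it (the two vectors below are made-up stand-ins for actual encoder output):

    import numpy as np

    # Made-up stand-ins for two sentence embeddings from the encoder.
    a = np.array([0.3, -1.2, 0.8, 0.5])
    b = np.array([0.1, -0.9, 1.1, 0.2])

    # L2-normalize each vector, then take the inner product,
    # as the notebook does before comparing sentences.
    a_unit = a / np.linalg.norm(a)
    b_unit = b / np.linalg.norm(b)
    similarity = np.inner(a_unit, b_unit)
    print(similarity)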

The multilingual Universal Sentence Encoder notebook, however, uses cosine similarity when comparing how similar two words/sentences are. https://colab.research.google.com/github/tensorflow/hub/blob/master/examples/colab/cross_lingual_similarity_with_tf_hub_multilingual_universal_encoder.ipynb
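For comparison, a sketch of cosine similarity computed directly on the raw (unnormalized) vectors, using the same made-up vectors as above:

    import numpy as np

    def cosine_similarity(u, v):
        # Dot product divided by the product of the vector norms.
        return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

    a = np.array([0.3, -1.2, 0.8, 0.5])
    b = np.array([0.1, -0.9, 1.1, 0.2])
    print(cosine_similarity(a, b))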

In the couple of examples I've tried so far, the np.inner() method seems to be more effective, although this is very anecdotal. Specifically, I'd like to get a better idea of what the difference between a normalized dot product and cosine similarity is, and when each should be used.
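For what it's worth, here is a quick numerical check (again with made-up vectors) of how the three quantities relate: the inner product of L2-normalized vectors, plain cosine similarity, and Pearson correlation (which, as far as I know, is cosine similarity after mean-centering):

    import numpy as np

    a = np.array([0.3, -1.2, 0.8, 0.5])
    b = np.array([0.1, -0.9, 1.1, 0.2])

    # Cosine similarity on the raw vectors.
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    # Inner product after L2 normalization: mathematically the same as cosine.
    inner = np.inner(a / np.linalg.norm(a), b / np.linalg.norm(b))

    # Pearson correlation: cosine similarity of the mean-centered vectors.
    ac, bc = a - a.mean(), b - b.mean()
    pearson = np.dot(ac, bc) / (np.linalg.norm(ac) * np.linalg.norm(bc))

    # cos and inner match; pearson differs unless the vector means are zero.
    print(cos, inner, pearson)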
