TransWikia.com

Similarity between 2 statistical distributions

Data Science Asked on June 23, 2021

Is there an index that measures the similarity between two Gaussian distributions of 1-D data (the samples may have slightly different numbers of points), taking into account mean shift, variance shift, differences in shape (for example, one is symmetric and the other is skewed), etc., and that gives a similarity in [0, 1]?

I am using Hedges’ index for this, but it does not give a similarity index between 0 and 1. It can be greater than 1 as well, so it is difficult to interpret.

Also, no pattern in the data is known beforehand, in case that helps with the answer.

One Answer

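One method is the two-sample Kolmogorov-Smirnov (KS) test. It checks whether two samples are drawn from the same continuous distribution, and the sample sizes can be different. Its p-value is close to 0 when the two samples are unlikely to come from the same distribution. When they do come from the same distribution, the p-value is not concentrated near 0 (under the null hypothesis it is roughly uniform on [0, 1]). So the p-value itself, rather than 1 - (p-value), can serve as a rough similarity score in [0, 1]. In the example below the two samples have different means and standard deviations, and the p-value is accordingly tiny.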

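import numpy as np
from scipy.stats import ks_2samp

# Fix the seed so the result is reproducible
np.random.seed(52)

# Sample sizes (they do not have to match)
n1 = 200
n2 = 300

# Slightly different means
mu_1 = 5
mu_2 = 5.1

# Different standard deviations
sigma_1 = 0.3
sigma_2 = 0.2

# Draw the two 1-D Gaussian samples
sample_1 = np.random.normal(mu_1, sigma_1, n1)
sample_2 = np.random.normal(mu_2, sigma_2, n2)

# Two-sample KS test: a small p-value means the samples differ
result = ks_2samp(sample_1, sample_2)

print(result.pvalue)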

1.4998994601889137e-08
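
Because the p-value also depends on the sample sizes, an alternative is to use the KS statistic D itself. D is the largest gap between the two empirical CDFs and always lies in [0, 1], so 1 - D can be read as a similarity. The sketch below is only an illustration of that idea: the helper name `ks_similarity` and the sample parameters are assumptions, not part of the original answer.

```python
import numpy as np
from scipy.stats import ks_2samp


def ks_similarity(a, b):
    """Hypothetical helper: 1 - D, where D is the two-sample KS statistic.

    D is the maximum distance between the empirical CDFs, so the
    result is 1 for identical samples and 0 for fully separated ones.
    """
    return 1.0 - ks_2samp(a, b).statistic


rng = np.random.default_rng(52)

# Two samples from the same Gaussian, with different sizes
same_a = rng.normal(5.0, 0.3, 200)
same_b = rng.normal(5.0, 0.3, 300)

# A sample whose mean is shifted far away
far_b = rng.normal(50.0, 0.3, 300)

print(ks_similarity(same_a, same_b))  # expected to be close to 1
print(ks_similarity(same_a, far_b))   # expected to be close to 0
```

Unlike the p-value, this score does not shrink just because more data points are collected, which may make it easier to compare across datasets of different sizes.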

Note that there are also other methods, such as the Bhattacharyya distance and the Kullback–Leibler divergence. Some implementations of the Kullback–Leibler divergence can also be found here.
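
Neither of these distances is bounded by 1, but the Bhattacharyya coefficient BC, from which the Bhattacharyya distance is defined as -ln(BC), does lie in [0, 1]. As a rough sketch, it can be estimated from two samples by binning them on a shared histogram grid. The function name `bhattacharyya_coefficient` and the choice of 30 bins are assumptions made for illustration, and the estimate depends on the binning.

```python
import numpy as np


def bhattacharyya_coefficient(a, b, bins=30):
    """Hypothetical helper: histogram estimate of the Bhattacharyya coefficient.

    Returns a value in [0, 1]: 1 when the binned distributions coincide,
    0 when they share no occupied bin.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    # Use the same bin edges for both samples so the bins are comparable
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    edges = np.linspace(lo, hi, bins + 1)

    # Convert counts to probabilities; this removes the sample-size difference
    p, _ = np.histogram(a, bins=edges)
    q, _ = np.histogram(b, bins=edges)
    p = p / p.sum()
    q = q / q.sum()

    return float(np.sum(np.sqrt(p * q)))


rng = np.random.default_rng(52)
sample_a = rng.normal(5.0, 0.3, 200)
sample_b = rng.normal(5.1, 0.2, 300)

print(bhattacharyya_coefficient(sample_a, sample_b))
```

The number of bins trades off resolution against noise, so it is worth checking that the score is stable over a few reasonable bin counts.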

Answered by Orkun Berk Yuzbasioglu on June 23, 2021
