
Calculating an estimate of KL Divergence using the samples drawn from distributions

Data Science: Asked by Vijetha Gattupalli on January 7, 2021

Given two sets of samples drawn from two different distributions, is it computationally feasible to estimate the KL divergence between the two distributions using these samples?

Here I assume the dimensionality of the two distributions is high (say $d$). To compute the estimate, we would first need to discretize the entire space and then estimate probabilities from the bin frequencies. Say we discretize each dimension into $p$ bins; the total number of grid cells is then $p^d$. So we would need to estimate the probabilities of both distributions over $p^d$ cells, which is exponential in $d$. Hence I conclude that we cannot estimate the KL divergence from samples for any practical problem.
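To make the concern concrete, here is a minimal sketch in Python of the histogram estimator described above (the function name and default bin count are illustrative, not from any particular library). Note that `np.histogramdd` must allocate all $p^d$ cells, so even modest $d$ exhausts memory long before runtime becomes the issue:

```python
import numpy as np

def kl_histogram(x, y, p=10):
    """Naive histogram estimate of D(P || Q) from samples x ~ P, y ~ Q.

    x, y: arrays of shape (n, d) and (m, d).
    Discretizing each of the d dimensions into p bins yields p**d grid
    cells, so memory and time are exponential in d, as argued above.
    """
    d = x.shape[1]
    # Shared bin edges per dimension, spanning both samples.
    edges = [
        np.linspace(
            min(x[:, j].min(), y[:, j].min()),
            max(x[:, j].max(), y[:, j].max()),
            p + 1,
        )
        for j in range(d)
    ]
    px, _ = np.histogramdd(x, bins=edges)
    qx, _ = np.histogramdd(y, bins=edges)
    px = px.ravel() / len(x)
    qx = qx.ravel() / len(y)
    # Drop cells where either estimate is zero -- a crude fix for
    # log(0) that biases the estimate.
    mask = (px > 0) & (qx > 0)
    return float(np.sum(px[mask] * np.log(px[mask] / qx[mask])))
```

Even at $d = 10$ and $p = 10$ the grid already has $10^{10}$ cells, so this approach is only usable for very low-dimensional data.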

I wanted to check whether this explanation is correct or if I am missing something. Could someone confirm whether this rationale holds?

2 Answers

Check this article. They use k-NN to estimate the values of $P(x)$ and $Q(x)$ at the sample points, so that you can apply the KL-divergence formula with 'approximated histograms'.
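Not having the linked article at hand, here is a minimal sketch of one standard estimator of this family (the nearest-neighbor form of Pérez-Cruz, 2008, and Wang, Kulkarni & Verdú, 2009, which may or may not match the article's exact formulation), built on `scipy.spatial.cKDTree`:

```python
import numpy as np
from scipy.spatial import cKDTree

def kl_knn(x, y, k=1):
    """k-NN estimate of D(P || Q) from samples x ~ P (n, d), y ~ Q (m, d).

    Nearest-neighbor form:
        D ~ (d / n) * sum_i log(nu_i / rho_i) + log(m / (n - 1)),
    where rho_i is the distance from x_i to its k-th neighbor within x
    (excluding x_i itself) and nu_i its distance to the k-th neighbor in y.
    Assumes no duplicate points, so all distances are strictly positive.
    """
    n, d = x.shape
    m = y.shape[0]
    # Query k + 1 neighbors within x: the nearest is x_i itself, so
    # take the last column for the k-th genuine neighbor.
    rho = cKDTree(x).query(x, k=k + 1)[0][:, -1]
    nu = cKDTree(y).query(x, k=k)[0]
    if k > 1:
        nu = nu[:, -1]  # distances come back as (n, k) when k > 1
    return d * np.mean(np.log(nu / rho)) + np.log(m / (n - 1))
```

The cost is dominated by two k-d tree queries, roughly $O(n \log n)$ for moderate $d$, rather than the $p^d$ grid cells of the histogram approach.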

Answered by Carlos Pinzón on January 7, 2021

There is no need to discretize the space, since KL divergence is defined for continuous distributions and can be estimated directly from samples.
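In the continuous case, $D_{\mathrm{KL}}(P \,\|\, Q) = \int p(x) \log \frac{p(x)}{q(x)} \, dx$, so an estimator only needs the density ratio $p(x)/q(x)$ at the sample points, not a partition of the whole space.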

Yes, you can estimate the KL divergence between the underlying distributions from their samples.

Estimating differences between populations from differences between samples is the core of statistical inference. It is a subtle problem.

Answered by Brian Spiering on January 7, 2021
