TransWikia.com

Multivariate normal density in Python?

Stack Overflow Asked by Benno on December 23, 2021

Is there any python package that allows the efficient computation of the PDF (probability density function) of a multivariate normal distribution?

It doesn’t seem to be included in Numpy/Scipy, and surprisingly a Google search didn’t turn up any useful thing.

10 Answers

Here I elaborate a bit more on how exactly to use the multivariate_normal() from the scipy package:

# Import packages
import numpy as np
from scipy.stats import multivariate_normal

# Prepare your data
x = np.linspace(-10, 10, 500)
y = np.linspace(-10, 10, 500)
X, Y = np.meshgrid(x,y)

# Get the multivariate normal distribution
mu_x = np.mean(x)
sigma_x = np.std(x)
mu_y = np.mean(y)
sigma_y = np.std(y)
rv = multivariate_normal([mu_x, mu_y], [[sigma_x, 0], [0, sigma_y]])

# Get the probability density
pos = np.empty(X.shape + (2,))
pos[:, :, 0] = X
pos[:, :, 1] = Y
pd = rv.pdf(pos)

Answered by tsveti_iko on December 23, 2021

You can easily compute using numpy. I have implemented as below for the purpose of machine learning course and would like to share, hope it helps to someone.

import numpy as np
X = np.array([[13.04681517, 14.74115241],[13.40852019, 13.7632696 ],[14.19591481, 15.85318113],[14.91470077, 16.17425987]])

def est_gaus_par(X):
    mu = np.mean(X,axis=0)
    sig = np.std(X,axis=0)
    return mu,sig

mu,sigma = est_gaus_par(X)

def est_mult_gaus(X,mu,sigma):
    m = len(mu)
    sigma2 = np.diag(sigma)
    X = X-mu.T
    p = 1/((2*np.pi)**(m/2)*np.linalg.det(sigma2)**(0.5))*np.exp(-0.5*np.sum(X.dot(np.linalg.pinv(sigma2))*X,axis=1))

    return p

p = est_mult_gaus(X, mu, sigma)

Answered by Freq on December 23, 2021

The following code helped me to solve,when given a vector what is the likelihood that vector is in a multivariate normal distribution.

import numpy as np
from scipy.stats import multivariate_normal

data with all vectors

d= np.array([[1,2,1],[2,1,3],[4,5,4],[2,2,1]])

mean of the data in vector form, which will have same length as input vector(here its 3)

mean = sum(d,axis=0)/len(d)

OR
mean=np.average(d , axis=0)
mean.shape

finding covarianve of vectors which will have shape of [input vector shape X input vector shape] here it is 3x3

cov = 0
for e in d:
  cov += np.dot((e-mean).reshape(len(e), 1), (e-mean).reshape(1, len(e)))
cov /= len(d)
cov.shape

preparing a multivariate Gaussian distribution from mean and co variance

dist = multivariate_normal(mean,cov)

finding probability distribution function.

print(dist.pdf([1,2,3]))

3.050863384798471e-05

The above value gives the likelihood.

Answered by Shahad on December 23, 2021

If still needed, my implementation would be

import numpy as np

def pdf_multivariate_gauss(x, mu, cov):
    '''
    Caculate the multivariate normal density (pdf)

    Keyword arguments:
        x = numpy array of a "d x 1" sample vector
        mu = numpy array of a "d x 1" mean vector
        cov = "numpy array of a d x d" covariance matrix
    '''
    assert(mu.shape[0] > mu.shape[1]), 'mu must be a row vector'
    assert(x.shape[0] > x.shape[1]), 'x must be a row vector'
    assert(cov.shape[0] == cov.shape[1]), 'covariance matrix must be square'
    assert(mu.shape[0] == cov.shape[0]), 'cov_mat and mu_vec must have the same dimensions'
    assert(mu.shape[0] == x.shape[0]), 'mu and x must have the same dimensions'
    part1 = 1 / ( ((2* np.pi)**(len(mu)/2)) * (np.linalg.det(cov)**(1/2)) )
    part2 = (-1/2) * ((x-mu).T.dot(np.linalg.inv(cov))).dot((x-mu))
    return float(part1 * np.exp(part2))

def test_gauss_pdf():
    x = np.array([[0],[0]])
    mu  = np.array([[0],[0]])
    cov = np.eye(2) 

    print(pdf_multivariate_gauss(x, mu, cov))

    # prints 0.15915494309189535

if __name__ == '__main__':
    test_gauss_pdf()

In case I make future changes, the code is here on GitHub

Answered by user2489252 on December 23, 2021

The multivariate normal is now available on SciPy 0.14.0.dev-16fc0af:

from scipy.stats import multivariate_normal
var = multivariate_normal(mean=[0,0], cov=[[1,0],[0,1]])
var.pdf([1,0])

Answered by juliohm on December 23, 2021

I use the following code which calculates the logpdf value, which is preferable for larger dimensions. It also works for scipy.sparse matrices.

import numpy as np
import math
import scipy.sparse as sp
import scipy.sparse.linalg as spln

def lognormpdf(x,mu,S):
    """ Calculate gaussian probability density of x, when x ~ N(mu,sigma) """
    nx = len(S)
    norm_coeff = nx*math.log(2*math.pi)+np.linalg.slogdet(S)[1]

    err = x-mu
    if (sp.issparse(S)):
        numerator = spln.spsolve(S, err).T.dot(err)
    else:
        numerator = np.linalg.solve(S, err).T.dot(err)

    return -0.5*(norm_coeff+numerator)

Code is from pyParticleEst, if you want the pdf value instead of the logpdf just take math.exp() on the returned value

Answered by ajn on December 23, 2021

I just made one for my purposes so I though I'd share. It's built using "the powers" of numpy, on the formula of the non degenerate case from http://en.wikipedia.org/wiki/Multivariate_normal_distribution and it aso validates the input.

Here is the code along with a sample run

from numpy import *
import math
# covariance matrix
sigma = matrix([[2.3, 0, 0, 0],
           [0, 1.5, 0, 0],
           [0, 0, 1.7, 0],
           [0, 0,   0, 2]
          ])
# mean vector
mu = array([2,3,8,10])

# input
x = array([2.1,3.5,8, 9.5])

def norm_pdf_multivariate(x, mu, sigma):
    size = len(x)
    if size == len(mu) and (size, size) == sigma.shape:
        det = linalg.det(sigma)
        if det == 0:
            raise NameError("The covariance matrix can't be singular")

        norm_const = 1.0/ ( math.pow((2*pi),float(size)/2) * math.pow(det,1.0/2) )
        x_mu = matrix(x - mu)
        inv = sigma.I        
        result = math.pow(math.e, -0.5 * (x_mu * inv * x_mu.T))
        return norm_const * result
    else:
        raise NameError("The dimensions of the input don't match")

print norm_pdf_multivariate(x, mu, sigma)

Answered by user692734 on December 23, 2021

The density can be computed in a pretty straightforward way using numpy functions and the formula on this page: http://en.wikipedia.org/wiki/Multivariate_normal_distribution. You may also want to use the likelihood function (log probability), which is less likely to underflow for large dimensions and is a little more straightforward to compute. Both just involve being able to compute the determinant and inverse of a matrix.

The CDF, on the other hand, is an entirely different animal...

Answered by Andrew Mao on December 23, 2021

I know of several python packages that use it internally, with different generality and for different uses, but I don't know if any of them are intended for users.

statsmodels, for example, has the following hidden function and class, but it's not used by statsmodels:

https://github.com/statsmodels/statsmodels/blob/master/statsmodels/miscmodels/try_mlecov.py#L36

https://github.com/statsmodels/statsmodels/blob/master/statsmodels/sandbox/distributions/mv_normal.py#L777

Essentially, if you need fast evaluation, rewrite it for your use case.

Answered by Josef on December 23, 2021

In the common case of a diagonal covariance matrix, the multivariate PDF can be obtained by simply multiplying the univariate PDF values returned by a scipy.stats.norm instance. If you need the general case, you will probably have to code this yourself (which shouldn't be hard).

Answered by Sven Marnach on December 23, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP