# What is the origin of the name "conjugate prior"?

Cross Validated Asked on January 5, 2022

I know what a conjugate prior is. But I’m confused by the name itself. Why is it called "conjugate"? A complex conjugate $$z^\ast$$ has a reciprocal relationship with $$z$$, i.e., $${z^\ast}^\ast = z$$. But there isn’t such a reciprocal relationship between any two elements of the triad (prior, likelihood, posterior), or at least I’m not aware of one. So why "conjugate"? Is the term overloaded?

I believe the origin is somehow related to the following concepts:

• eigenvector: a vector $$\mathbf{x}$$ is called an eigenvector of a matrix $$\mathbf{A}$$ if $$\mathbf{A}\mathbf{x} = k\mathbf{x}$$, meaning $$\mathbf{A}\mathbf{x}$$ has the same form as $$\mathbf{x}$$ (differing only by a scaling factor $$k$$, called an eigenvalue of $$\mathbf{A}$$). Hopefully you can start to see that this is the same logic as with a conjugate prior.

• eigenfunction: there is an analogy between a conjugate prior and an eigenfunction. The concept of an eigenvector is extended to functions in functional analysis. Given a linear transformation $$L$$ (e.g., a differential or integral operator), its eigenfunctions are functions $$f$$ such that $$Lf$$ is simply $$kf$$, i.e., $$f$$ scaled by a scalar $$k$$. Eigenfunctions are very useful in solving differential equations, as they provide a very convenient representation of the solutions. They are also related to Fourier analysis: sine and cosine functions are eigenfunctions of the second-derivative operator (and complex exponentials are eigenfunctions of differentiation itself). In fact, any sufficiently well-behaved periodic function can be approximated by a linear combination of sine and cosine functions. Also, the Fourier transform of a Gaussian function is another Gaussian function, again the same logic as with a conjugate prior.
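To make the "same form in, same form out" idea concrete, here is a minimal sketch using the Beta prior with binomial data. This pairing is chosen only for illustration and is not mentioned in the answer above. The function names (`beta_binomial_update`, `beta_pdf`, `grid_posterior`) and the numbers are hypothetical. The script compares the closed-form conjugate update against a brute-force application of Bayes' rule on a grid.

```python
import math


def beta_binomial_update(a, b, successes, failures):
    """Conjugate update: a Beta(a, b) prior plus binomial data gives a Beta posterior."""
    return a + successes, b + failures


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x in (0, 1), computed on the log scale for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def grid_posterior(a, b, successes, failures, n=2000):
    """Brute-force posterior: prior times likelihood, normalised with a midpoint rule."""
    xs = [(i + 0.5) / n for i in range(n)]
    unnorm = [beta_pdf(x, a, b) * x**successes * (1 - x) ** failures for x in xs]
    z = sum(unnorm) / n  # midpoint-rule estimate of the normalising constant
    return xs, [u / z for u in unnorm]


# Illustrative numbers (assumed): Beta(2, 3) prior, 7 successes and 3 failures.
a_post, b_post = beta_binomial_update(2.0, 3.0, 7, 3)
xs, dens = grid_posterior(2.0, 3.0, 7, 3)

# Largest pointwise gap between the grid posterior and the closed-form Beta posterior.
max_gap = max(abs(d - beta_pdf(x, a_post, b_post)) for x, d in zip(xs, dens))
print(a_post, b_post, max_gap)
```

The gap should be negligible, which is the sense in which the Beta family is "closed" under this update: the posterior is again a Beta distribution, only with different parameters.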

Answered by Victor Luu on January 5, 2022

The Oxford English Dictionary defines "conjugate" as an adjective meaning "joined together, esp. in a pair, coupled; connected, related." It's not a huge stretch to imagine that a conjugate prior has a special and strong connection to its posterior.

It's used in a similar sense in chemistry (conjugate acid/base; conjugate solution), botany (leaves that grow in pairs, especially when there's only one pair), optics (conjugate foci), and linguistics (conjugations are forms of the same root word).

While some have a "reciprocal" implication, others don't, so I don't think it's a necessary element of the meaning.

Wikipedia credits Raiffa and Schlaifer with coining the term (annoyingly, it's not in the OED). Here's the first mention of it in their 1961 book, which seems to be using the "joined" sense of conjugate:

"We show that whenever (1) any possible experimental outcome can be described by a sufficient statistic of fixed dimensionality (i.e., an $$s$$-tuple $$(y_1, y_2, \ldots y_s)$$ where $$s$$ does not depend on the 'size' of the experiment), and (2) the likelihood of every outcome is given by a reasonably simple formula with $$y_1, y_2, \ldots y_s$$ as its arguments, we can obtain a very tractable family of 'conjugate' prior distributions simply by interchanging the roles of variables and parameters in the algebraic expression for the sample likelihood, and the posterior distribution will be a member of the same family as the prior."
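As an illustration of the recipe in that passage, consider $$n$$ Bernoulli trials with success probability $$\theta$$. This example is an assumption chosen for simplicity and is not taken from the book. The symbols $$x_i$$, $$y$$, $$a$$, and $$b$$ are introduced here. The sufficient statistic is the pair $$(y, n)$$ with $$y = \sum_i x_i$$, and the likelihood is

$$p(x_1, \ldots, x_n \mid \theta) = \theta^{y} (1-\theta)^{n-y}.$$

Interchanging the roles means reading this expression as a function of $$\theta$$ and treating the exponents as free parameters, which gives the kernel of a Beta distribution:

$$p(\theta) \propto \theta^{a-1} (1-\theta)^{b-1}.$$

Multiplying prior and likelihood then only adds the exponents, so the posterior stays in the same family:

$$p(\theta \mid x_1, \ldots, x_n) \propto \theta^{a+y-1} (1-\theta)^{b+n-y-1},$$

i.e., a Beta distribution with parameters $$a+y$$ and $$b+n-y$$.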

Answered by Matt Krause on January 5, 2022
