
Why is the mean used to compute the expectation in the GAN loss?

Asked by A is for Ambition on February 2, 2021

From Goodfellow et al. (2014), we have the adversarial loss:

$$\min_G \, \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\,[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}\,[\log(1 - D(G(z)))]\,.$$

In practice, each expectation is estimated by a mean over a minibatch of samples. For example, the discriminator is updated by ascending its stochastic gradient:

$$
\nabla_{\theta_{d}} \frac{1}{m} \sum_{i=1}^{m}\left[\log D\left(\boldsymbol{x}^{(i)}\right)+\log \left(1-D\left(G\left(\boldsymbol{z}^{(i)}\right)\right)\right)\right]
$$
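
To make this concrete, here is a minimal NumPy sketch of the minibatch estimate. The "discriminator" and "generator" below are toy stand-ins of my own choosing (a sigmoid and an affine map), not actual networks, and the data distribution is just a standard normal:

```python
import numpy as np

# Toy stand-ins for the discriminator and generator. In a real GAN these
# would be neural networks; here D only needs to map samples into (0, 1).
def D(x):
    return 1.0 / (1.0 + np.exp(-x.sum(axis=1)))  # toy sigmoid "discriminator"

def G(z):
    return 2.0 * z + 1.0  # toy affine "generator"

m = 64                         # minibatch size
x = np.random.randn(m, 2)      # minibatch of real samples, drawn from p_data
z = np.random.randn(m, 2)      # minibatch of noise, drawn from the prior p_z

# Each expectation in V(D, G) is estimated by a plain average over the batch:
v_estimate = np.mean(np.log(D(x)) + np.log(1.0 - D(G(z))))
print(v_estimate)
```

In an actual implementation, the gradient $\nabla_{\theta_d}$ of this quantity would be taken with respect to the discriminator's parameters; the averaging itself is all that is meant by "computing the expectation over the minibatch."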

My question is: why is the mean used to compute the expectation? Does this imply that $p_{\text{data}}$ is uniformly distributed, since every sample must be drawn from $p_{\text{data}}$ with equal probability?

The expectation, expressed as an integral, is:

$$
\begin{aligned}
V(G, D) &= \int_{\boldsymbol{x}} p_{\text{data}}(\boldsymbol{x}) \log (D(\boldsymbol{x})) \, dx + \int_{\boldsymbol{z}} p_{\boldsymbol{z}}(\boldsymbol{z}) \log (1-D(g(\boldsymbol{z}))) \, dz \\
&= \int_{\boldsymbol{x}} p_{\text{data}}(\boldsymbol{x}) \log (D(\boldsymbol{x})) + p_{g}(\boldsymbol{x}) \log (1-D(\boldsymbol{x})) \, dx
\end{aligned}
$$

So, how do we go from an integral over a continuous distribution to a sum over discrete samples, and why do all of those samples receive the same weight?

The best I could find from other StackExchange posts is that the mean is just an approximation, but I’d really like a more rigorous explanation.

This question isn't exclusive to GANs; it applies to any loss function that is expressed mathematically as an expectation over some sampled distribution but is not implemented directly in its integral form.

(All equations are from the Goodfellow paper.)

One Answer

It seems your question is concerned with how an empirical mean approximates an expectation.

It is indeed true that, if all $x^{(i)}$ are independent, identically distributed (i.i.d.) realisations of a random variable $X$, then $\lim_{n \rightarrow \infty} \frac{1}{n}\sum_{i=1}^n f(x^{(i)}) = \mathbb{E}[f(X)]$ almost surely. This is a standard result in statistics known as the (strong) law of large numbers.

Note that the uniform weights $\frac{1}{n}$ do not imply that $p_{\text{data}}$ is uniform. The density is accounted for by the sampling itself: because each $x^{(i)}$ is drawn from $p_{\text{data}}$, regions of high density contribute proportionally more samples to the sum, so the unweighted average converges to the density-weighted integral $\int_{\boldsymbol{x}} p_{\text{data}}(\boldsymbol{x}) f(\boldsymbol{x}) \, dx$.
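
For a concrete illustration of this convergence, here is a small self-contained demonstration (a toy example of my own choosing, not from the question): it estimates $\mathbb{E}[f(X)]$ for $X \sim \mathcal{N}(0, 1)$ with $f(x) = x^2$, whose true value is $\operatorname{Var}(X) = 1$:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: x ** 2  # true expectation under N(0, 1) is Var(X) = 1

for n in [10, 1_000, 100_000, 1_000_000]:
    x = rng.standard_normal(n)   # n i.i.d. draws from p(x) = N(0, 1)
    print(n, np.mean(f(x)))      # unweighted average -> E[f(X)] = 1 as n grows
```

The minibatch mean in the GAN objective is exactly this kind of Monte Carlo estimate, just with finite $m$ instead of the limit $n \rightarrow \infty$.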

Answered by David Ireland on February 2, 2021
