TransWikia.com

How is this score function estimator derived?

Data Science Asked on December 19, 2021

In this paper (https://arxiv.org/pdf/1703.03864.pdf), the authors use the score function estimator to estimate the gradient of an expectation, $\nabla_\theta E_{\epsilon \sim \mathcal{N}(0,1)} \left[ F(\theta + \sigma \epsilon) \right]$. How is this estimator derived?

One Answer

This is a special case of the general gradient estimator for Natural Evolution Strategies, in which the search distribution $p_\psi$ is Gaussian. The general estimator is proved in other references.

Outline of derivation based on the general formula for the gradient estimator:

$$\nabla_\psi E_{\theta \sim p_\psi} \left[ F(\theta) \right] = E_{\theta \sim p_\psi} \left[ F(\theta) \nabla_\psi \log(p_\psi(\theta)) \right]$$
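
The general formula above follows from the log-derivative identity $\nabla_\psi p_\psi(\theta) = p_\psi(\theta) \nabla_\psi \log p_\psi(\theta)$. A sketch of that step, assuming $p_\psi$ is differentiable in $\psi$ and that the gradient can be moved inside the integral:

$$\begin{align}
\nabla_\psi E_{\theta \sim p_\psi} \left[ F(\theta) \right] &= \nabla_\psi \int F(\theta) \, p_\psi(\theta) \, d\theta \\
&= \int F(\theta) \, \nabla_\psi p_\psi(\theta) \, d\theta \\
&= \int F(\theta) \, p_\psi(\theta) \, \nabla_\psi \log p_\psi(\theta) \, d\theta \\
&= E_{\theta \sim p_\psi} \left[ F(\theta) \nabla_\psi \log p_\psi(\theta) \right]
\end{align}$$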

If

$$\epsilon \sim \mathcal{N}(0, 1) = \frac{1}{\sqrt{2 \pi}} e^{-\frac{\epsilon^2}{2}}$$

then

$$\psi = \theta + \sigma \epsilon \sim \mathcal{N}(\theta, \sigma) = \frac{1}{\sigma \sqrt{2 \pi}} e^{-\frac{(\psi - \theta)^2}{2 \sigma^2}}$$

Thus: $\psi = \theta + \sigma \epsilon \sim \mathcal{N}(\theta, \sigma) \Longleftrightarrow \epsilon = \frac{\psi - \theta}{\sigma} \sim \mathcal{N}(0, 1)$

So:

$$\begin{align}
\nabla_\theta E_{\psi \sim \mathcal{N}(\theta, \sigma)} \left[ F(\theta + \sigma \epsilon) \right] &= E_{\psi \sim \mathcal{N}(\theta, \sigma)} \left[ F(\theta + \sigma \epsilon) \nabla_\theta \left( -\frac{(\psi - \theta)^2}{2 \sigma^2} \right) \right] \\
&= E_{\epsilon \sim \mathcal{N}(0, 1)} \left[ F(\theta + \sigma \epsilon) \nabla_\epsilon \left( -\frac{\epsilon^2}{2} \right) \frac{d \left( \frac{\psi - \theta}{\sigma} \right)}{d\theta} \right] \\
&= \frac{1}{\sigma} E_{\epsilon \sim \mathcal{N}(0, 1)} \left[ F(\theta + \sigma \epsilon) \epsilon \right] \\
&= \nabla_\theta E_{\epsilon \sim \mathcal{N}(0, 1)} \left[ F(\theta + \sigma \epsilon) \right]
\end{align}$$

Note: scalar variables were used in the steps above for simplicity; the derivation extends directly to vector variables.
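
The final identity can also be checked numerically. The sketch below is a Monte Carlo check, not code from the paper: the function name `score_function_gradient`, the test function $F(x) = x^2$, and the values $\theta = 1.5$, $\sigma = 0.5$ are all illustrative assumptions. For this choice of $F$, $E[F(\theta + \sigma \epsilon)] = \theta^2 + \sigma^2$, so the exact gradient with respect to $\theta$ is $2\theta$.

```python
import numpy as np


def score_function_gradient(F, theta, sigma, n_samples=1_000_000, seed=0):
    """Monte Carlo estimate of d/dtheta E[F(theta + sigma * eps)], eps ~ N(0, 1).

    Uses the identity (1 / sigma) * E[F(theta + sigma * eps) * eps].
    Scalar case only, matching the derivation above.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n_samples)   # eps ~ N(0, 1)
    values = F(theta + sigma * eps)        # F evaluated at the perturbed parameter
    return float(np.mean(values * eps) / sigma)


if __name__ == "__main__":
    theta, sigma = 1.5, 0.5                # illustrative values (assumption)
    estimate = score_function_gradient(lambda x: x ** 2, theta, sigma)
    exact = 2.0 * theta                    # d/dtheta of (theta^2 + sigma^2)
    print(f"estimate = {estimate:.4f}, exact = {exact:.4f}")
```

The estimate is noisy, and its variance tends to grow as $\sigma$ shrinks because of the $1/\sigma$ factor, so a large sample count is used here.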

Answered by Nikos M. on December 19, 2021
