TransWikia.com

In continuous action spaces, how is the standard deviation of the Gaussian distribution from which actions are sampled represented?

Artificial Intelligence Asked by M.S. on August 24, 2021

I have a question about implementing policy gradient methods for problems with continuous action spaces.

Assume that actions are sampled from a diagonal Gaussian distribution with mean vector $\mu$ and standard deviation vector $\sigma$. As far as I understand, we can define a neural network that takes the current state as input and returns $\mu$ as its output. According to OpenAI Spinning Up, the standard deviation $\sigma$ can be represented in two different ways:

[Image: excerpt from the OpenAI Spinning Up documentation describing the two ways of representing the standard deviation]

I don’t completely understand the first method. Does it mean that we must set the log standard deviations to fixed numbers? If so, how do we choose these numbers?
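To make the setup concrete, here is a minimal sketch in plain Python of one possible reading of the first method: the log standard deviations form a standalone vector of trainable parameters that does not depend on the state. That reading is an assumption, as are the linear `mean_network` stand-in for the neural network, the example weights, and the initial value of -0.5. None of them is stated in the question.

```python
import math
import random


def mean_network(state, weights, bias):
    """Hypothetical stand-in for the policy network: one linear layer, state -> mu."""
    return [sum(w * s for w, s in zip(row, state)) + b
            for row, b in zip(weights, bias)]


def sample_action(mu, log_std, rng):
    """Sample a = mu + sigma * z with z ~ N(0, 1); sigma = exp(log_std) is always positive."""
    return [m + math.exp(ls) * rng.gauss(0.0, 1.0) for m, ls in zip(mu, log_std)]


def log_prob(action, mu, log_std):
    """Log-density of a diagonal Gaussian, summed over the action dimensions."""
    total = 0.0
    for a, m, ls in zip(action, mu, log_std):
        z = (a - m) / math.exp(ls)
        total += -0.5 * z * z - ls - 0.5 * math.log(2.0 * math.pi)
    return total


# Assumed reading of the first method: one free log-std parameter per action
# dimension, independent of the state. The starting value -0.5 is an
# illustrative choice; under this reading it would be adjusted by the same
# gradient updates as the network weights rather than kept fixed.
log_std = [-0.5, -0.5]

# Illustrative (made-up) parameters for a 3-dimensional state and 2-dimensional action.
weights = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1]]
bias = [0.0, 0.1]

rng = random.Random(0)
state = [1.0, 2.0, 3.0]
mu = mean_network(state, weights, bias)
action = sample_action(mu, log_std, rng)
print("mu:", mu)
print("action:", action)
print("log-prob of action:", log_prob(action, mu, log_std))
```

Under this reading, the policy gradient flows through `log_prob` into both the network weights and `log_std`, so the chosen numbers would only be an initialization. In the second method, `log_std` would instead be another output of a network that takes the state as input.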
