
In Deep Deterministic Policy Gradient, are all weights of the policy network updated with the same or different value?

Artificial Intelligence Asked by unter_983 on November 29, 2021

I’m trying to understand the DDPG algorithm shown on this page. I don’t understand what the result of the gradient at step 14 should be.

[Image: DDPG algorithm pseudocode, with the policy gradient update at step 14]

Is it a scalar that I have to use to update all the weights (so all weights are updated by the same value)? Or is it a list with a different value for each weight? I’m used to working with a loss function and a target $y$, but here I don’t have them, so I’m quite confused.

One Answer

Each $Q(s_i, \mu_\theta(s_i))$ is a scalar, so their sum (or mini-batch average) is also a scalar. Step 14 then takes the gradient of that scalar with respect to the policy parameters $\theta$. The result is a vector with one entry per parameter, so each weight receives its own update, not a shared one.
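To make this concrete, here is a minimal numpy sketch, not taken from the linked page: a toy linear policy $a = W s$ and a hand-written toy critic $Q(s,a) = -\lVert a - a^\star \rVert^2$ (the matrix shapes, batch size, and `a_star` target are all illustrative assumptions). The objective averaged over a mini-batch is a scalar, yet its gradient with respect to $W$ has the same shape as $W$, i.e. one entry per weight.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy deterministic policy mu(s) = W @ s, with a 2x3 weight matrix
# (shapes chosen only for illustration).
W = rng.normal(size=(2, 3))
states = rng.normal(size=(5, 3))   # mini-batch of N = 5 states
a_star = np.array([1.0, -1.0])     # hypothetical "best" action for the toy critic

def q(s, a):
    # Toy critic: Q is largest when the action equals a_star.
    return -np.sum((a - a_star) ** 2)

def objective(W):
    # J(W) = (1/N) * sum_i Q(s_i, mu(s_i)) -- the scalar maximized in step 14.
    return np.mean([q(s, W @ s) for s in states])

# Analytic gradient of J wrt W: (1/N) * sum_i -2 (W s_i - a_star) s_i^T.
grad = np.mean([-2.0 * np.outer(W @ s - a_star, s) for s in states], axis=0)

print(grad.shape)  # (2, 3): same shape as W -- one entry per weight
```

In a real DDPG implementation an autodiff framework computes this gradient for every layer of the policy network, but the principle is the same: a scalar objective, and a gradient with one component per weight.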

Answered by harwiltz on November 29, 2021
