
In Deep Deterministic Policy Gradient, are all weights of the policy network updated with the same or different value?

Artificial Intelligence Asked by unter_983 on November 29, 2021

I’m trying to understand the DDPG algorithm shown on this page. I don’t understand what the result of the gradient at step 14 should be.

[Image: DDPG algorithm pseudocode, with the policy gradient update at step 14]

Is it a scalar that I have to use to update all the weights (so all weights are updated by the same value)? Or is it a list with a different value for each weight? I’m used to working with a loss function and a target $y$, but here I don’t have them, so I’m quite confused.

One Answer

Each $Q(s_i, \mu_\theta(s_i))$ is a scalar, so their sum (or mini-batch average) is also a scalar. Step 14 then takes the gradient of that scalar with respect to the policy parameters $\theta$. The result is a vector with one entry per parameter, so each weight receives its own update, not a shared one.
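To make this concrete, here is a minimal numpy sketch, not taken from the linked page: a toy linear policy $a = W s$ and a hand-written toy critic $Q(s,a) = -\lVert a - a^\star \rVert^2$ (the matrix shapes, batch size, and `a_star` target are all illustrative assumptions). The objective averaged over a mini-batch is a scalar, yet its gradient with respect to $W$ has the same shape as $W$, i.e. one entry per weight.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy deterministic policy mu(s) = W @ s, with a 2x3 weight matrix
# (shapes chosen only for illustration).
W = rng.normal(size=(2, 3))
states = rng.normal(size=(5, 3))   # mini-batch of N = 5 states
a_star = np.array([1.0, -1.0])     # hypothetical "best" action for the toy critic

def q(s, a):
    # Toy critic: Q is largest when the action equals a_star.
    return -np.sum((a - a_star) ** 2)

def objective(W):
    # J(W) = (1/N) * sum_i Q(s_i, mu(s_i)) -- the scalar maximized in step 14.
    return np.mean([q(s, W @ s) for s in states])

# Analytic gradient of J wrt W: (1/N) * sum_i -2 (W s_i - a_star) s_i^T.
grad = np.mean([-2.0 * np.outer(W @ s - a_star, s) for s in states], axis=0)

print(grad.shape)  # (2, 3): same shape as W -- one entry per weight
```

In a real DDPG implementation an autodiff framework computes this gradient for every layer of the policy network, but the principle is the same: a scalar objective, and a gradient with one component per weight.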

Answered by harwiltz on November 29, 2021
