Policy gradient/REINFORCE algorithm with RNN: why does this converge with SGD but not Adam?

Data Science Asked by Kechen on December 23, 2020

I am training an RNN model for caption generation with the REINFORCE algorithm. I adopt the self-critical strategy (see the paper Self-critical Sequence Training for Image Captioning) to reduce the variance. I initialize the model with a pre-trained RNN model (a.k.a. a warm start). This pre-trained model (trained with a log-likelihood objective) achieves an F1 score of 0.6 on my task.

When I use the Adam optimizer to train this policy gradient objective, the performance of my model drops to 0 after a few epochs. However, if I switch to the plain gradient descent optimizer and keep everything else the same, the performance looks reasonable and is slightly better than the pre-trained model. Does anyone have an idea why that is?

I use TensorFlow to implement my model.
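For context, the self-critical gradient I am computing can be sketched in plain Python. This is a toy single-step softmax policy with a greedy-action baseline in the spirit of the paper, not my actual TensorFlow model; all names here are illustrative:

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def self_critical_grad(logits, reward):
    """REINFORCE gradient of -advantage * log pi(a) for one decision.

    The self-critic baseline is the reward of the greedy (argmax) action,
    so advantage = r(sampled action) - r(greedy action). When the sample
    equals the greedy choice, the advantage (and gradient) is zero.
    """
    probs = softmax(logits)
    a_sample = random.choices(range(len(probs)), weights=probs)[0]
    a_greedy = max(range(len(probs)), key=lambda i: probs[i])
    advantage = reward[a_sample] - reward[a_greedy]
    # d/d logit_i of -log pi(a_sample) is probs_i - one_hot(a_sample)_i
    grad = [advantage * (probs[i] - (1.0 if i == a_sample else 0.0))
            for i in range(len(probs))]
    return grad, advantage
```

Note that the gradient components always sum to zero (a softmax property), and that the warm start matters because a random policy rarely samples sequences with nonzero advantage.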

One Answer

Without the code there's not much we can do, but I'd guess you need to significantly lower the learning rate. In my experience, Adam requires a significantly lower learning rate than SGD.
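One concrete way to see why the same learning rate behaves so differently: Adam rescales the gradient by its running RMS, so its step size is roughly the learning rate regardless of gradient magnitude, while SGD's step shrinks with the gradient. Policy-gradient updates (advantage times grad-log-prob) are often tiny near the warm start, so Adam can take disproportionately large steps. A minimal single-parameter sketch of the standard bias-corrected Adam update (not TensorFlow's internals):

```python
import math

def sgd_step(grad, lr):
    """Plain gradient descent update: step scales linearly with the gradient."""
    return -lr * grad

class Adam1D:
    """Single-parameter Adam update (standard bias-corrected form)."""
    def __init__(self, lr, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = self.v = 0.0
        self.t = 0

    def step(self, grad):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad * grad
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return -self.lr * m_hat / (math.sqrt(v_hat) + self.eps)

# A tiny gradient, as policy-gradient estimates often are near a warm start:
tiny_grad = 1e-6
print(abs(sgd_step(tiny_grad, lr=0.01)))  # 1e-08: negligible move
adam = Adam1D(lr=0.01)
print(abs(adam.step(tiny_grad)))          # ~0.01: a full-size step
```

So with the same nominal learning rate, Adam can push the warm-started policy far from its initialization in a few epochs, which is consistent with the collapse to 0 you describe; lowering Adam's learning rate by one or two orders of magnitude is the usual first thing to try.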

Answered by Ran Elgiser on December 23, 2020
