
Why and under what conditions does Q-learning converge?

Cross Validated Asked on December 8, 2021

I am looking for a modern proof of why Q-learning converges in the tabular setting.
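For concreteness, the tabular update in question is usually written as below. This is a sketch in standard textbook notation, which is an assumption here rather than something taken from a particular source: $Q_t$ is the table at step $t$, $\alpha_t(s_t, a_t)$ is the step size for the visited state-action pair, $\gamma \in [0, 1)$ is the discount factor, and $r_t$ is the observed reward.

```latex
Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t)
  + \alpha_t(s_t, a_t) \left[ r_t + \gamma \max_{a'} Q_t(s_{t+1}, a') - Q_t(s_t, a_t) \right]
```

All other entries of the table are left unchanged at step $t$.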

I’ve skimmed the original proof by Watkins and Dayan, and I have to say that the terminology and approach are a bit verbose and quickly lost me. I’ve also found some random lecture notes online, but I don’t really trust them. Plus, a lot of these use a martingale/filtration approach, which is beyond my knowledge.

Is there a modern treatment of the convergence proof for this very important algorithm?
