
How does Q-Learning deal with mixed strategies?

Data Science Asked by Thomas Johnson on September 29, 2020

I’m trying to understand how Q-learning deals with games where the optimal policy is a mixed strategy. The Bellman equation says that you should choose the action that attains $\max_a Q(s,a)$, but this implies a single, deterministic action for each state $s$. Is Q-learning just not appropriate if you believe that the problem has a mixed-strategy solution?

One Answer

One possibility is to use a softmax and choose each action $a$ randomly with probability $p(a \mid s) = \frac{\exp(Q(s,a))}{\sum_{a'} \exp(Q(s,a'))}$. I don't think it is still Q-learning, though.
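To make the softmax rule above concrete, here is a minimal sketch in plain Python. The function names, the example Q-values, and the `temperature` parameter are my own assumptions for illustration; with `temperature=1.0` the probabilities match the formula above. It only shows how an action could be sampled from given Q-values, not how those values are learned.

```python
import math
import random


def softmax_probs(q_values, temperature=1.0):
    """Return softmax (Boltzmann) action probabilities for a single state.

    `temperature` is an assumed extra knob; 1.0 reproduces
    p(a|s) = exp(Q(s,a)) / sum_a' exp(Q(s,a')).
    """
    # Subtracting the maximum before exponentiating avoids overflow
    # and leaves the resulting probabilities unchanged.
    q_max = max(q_values)
    exps = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(q_values, temperature=1.0, rng=random):
    """Sample an action index according to the softmax probabilities."""
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


# Hypothetical Q-values for one state with three actions.
q_s = [1.0, 1.0, 0.0]
probs = softmax_probs(q_s)
action = sample_action(q_s)
```

Actions with equal Q-values get equal probability, so the resulting policy is stochastic rather than a single argmax action. Whether such a policy corresponds to an equilibrium mixed strategy is a separate question that this sketch does not address.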

Answered by Robin Nicole on September 29, 2020
