How does Q-Learning deal with mixed strategies?

Question

I'm trying to understand how Q-learning deals with games where the optimal policy is a mixed strategy. The Bellman equation says that you should choose the action attaining $\max_a Q(s,a)$, but this implies a single deterministic action for each $s$. Is Q-learning just not appropriate if you believe that the problem has a mixed-strategy solution?

Robin Nicole · Answer

One possibility is to use softmax and choose each action $a$ randomly with probability $p(a \mid s) = \frac{\exp(Q(s,a))}{\sum_{a'} \exp(Q(s,a'))}$. I don't think it is still Q-learning, though.
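
As a rough illustration of the softmax selection rule, here is a minimal Python sketch. The function names, the `temperature` parameter (set to 1 to match the formula as written), and the example Q-values are all assumptions made for illustration, not part of any particular library.

```python
import math
import random

def softmax_probs(q_values, temperature=1.0):
    """Return softmax (Boltzmann) probabilities over a list of Q-values.

    `temperature` is an illustrative extra knob; 1.0 gives exp(Q) / sum(exp(Q)).
    """
    # Subtracting the max before exponentiating avoids overflow and
    # leaves the resulting probabilities unchanged.
    m = max(q_values)
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]

def sample_action(q_values, temperature=1.0):
    """Sample an action index according to the softmax probabilities."""
    probs = softmax_probs(q_values, temperature)
    return random.choices(range(len(q_values)), weights=probs, k=1)[0]

# Hypothetical Q-values for a single state with three actions.
q_s = [1.0, 1.0, 0.0]
probs = softmax_probs(q_s)
action = sample_action(q_s)
```

With these made-up values, the two actions with equal Q-values get equal probability and the third gets less, so the resulting policy is stochastic rather than a single greedy action.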
