What does the term $|\mathcal{A}(s)|$ mean in the $\epsilon$-greedy policy?

Artificial Intelligence: asked by Metrician on August 24, 2021

I’ve been looking online for a while for a source that explains these computations, but I can’t find anywhere what $|\mathcal{A}(s)|$ means. I guess $\mathcal{A}$ is the action set, but I’m not sure about that notation:

$$\frac{\varepsilon}{|\mathcal{A}(s)|} \sum_{a} Q^{\pi}(s, a)+(1-\varepsilon) \max_{a} Q^{\pi}(s, a)$$

Here is the source of the formula.

I also want to clarify that I understand the idea behind the $\epsilon$-greedy approach and the motivation behind on-policy methods. I just had trouble understanding this notation (and a few other minor things). The author omitted some steps, so the explanation felt like it jumped ahead, which is why I didn’t get the notation. I’d be glad to be pointed towards a better source where this is explained in detail.

The expression $|\mathcal{A}(s)|$ means

• $|\quad|$: the size of (number of elements in)

• $\mathcal{A}(s)$: the set of actions available in state $s$

or, more simply, the number of actions allowed in state $s$.
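To make the notation concrete, here is a minimal Python sketch. It assumes a small hypothetical environment in which the available actions depend on the state (the state names and action lists are invented for illustration). Under that assumption, $\mathcal{A}(s)$ is just the action list for $s$ and $|\mathcal{A}(s)|$ is its length.

```python
# Hypothetical state-dependent action sets A(s); names are invented for illustration.
ACTIONS = {
    "corner": ["right", "down"],
    "edge": ["left", "right", "down"],
    "center": ["left", "right", "up", "down"],
}


def action_set(state):
    """Return A(s), the actions available in `state`."""
    return ACTIONS[state]


def num_actions(state):
    """Return |A(s)|, the number of actions available in `state`."""
    return len(action_set(state))


def exploration_prob_per_action(state, epsilon):
    """Probability epsilon / |A(s)| that the exploratory branch picks one given action."""
    return epsilon / num_actions(state)


for s in ACTIONS:
    print(s, num_actions(s), exploration_prob_per_action(s, 0.1))
```

Because $|\mathcal{A}(s)|$ can differ from state to state, the per-action exploration probability $\epsilon/|\mathcal{A}(s)|$ differs too, which is why the notation carries the $(s)$.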

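This makes sense in the given formula because $\frac{\epsilon}{|\mathcal{A}(s)|}$ is then the probability of choosing each particular action through exploration in an $\epsilon$-greedy policy: with probability $\epsilon$ the policy picks uniformly among the $|\mathcal{A}(s)|$ actions, and with probability $1-\epsilon$ it picks the greedy action. The overall expression is the expected return when the action in state $s$ is chosen by that policy, summing the expected contributions of the exploratory and greedy choices.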
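A minimal sketch in Python, assuming a single state with a hypothetical action-value table `q` (a dict from action to $Q^{\pi}(s,a)$; the numbers are invented). Every action receives $\epsilon/|\mathcal{A}(s)|$ from the exploratory branch and the greedy action additionally receives $1-\epsilon$, so $\pi(a \mid s) = \frac{\epsilon}{|\mathcal{A}(s)|} + (1-\epsilon)\,\mathbb{1}[a = \arg\max_{a'} Q^{\pi}(s,a')]$. Summing $\pi(a \mid s)\,Q^{\pi}(s,a)$ over $a$ then reproduces the formula from the question, and the code checks this numerically. Ties for the greedy action are broken by taking the first maximiser, which is one common convention rather than the only one.

```python
import random


def greedy_action(q):
    """Return an action with the highest Q-value (first one wins on ties)."""
    return max(q, key=q.get)


def epsilon_greedy_probs(q, epsilon):
    """Return pi(a|s) for an epsilon-greedy policy over the actions in q."""
    n = len(q)  # |A(s)|
    best = greedy_action(q)
    return {a: epsilon / n + ((1 - epsilon) if a == best else 0.0) for a in q}


def expected_value_formula(q, epsilon):
    """eps / |A(s)| * sum_a Q(s,a) + (1 - eps) * max_a Q(s,a), as in the question."""
    return epsilon / len(q) * sum(q.values()) + (1 - epsilon) * max(q.values())


def expected_value_sum(q, epsilon):
    """sum_a pi(a|s) * Q(s,a), using the explicit epsilon-greedy probabilities."""
    pi = epsilon_greedy_probs(q, epsilon)
    return sum(pi[a] * q[a] for a in q)


def sample_action(q, epsilon, rng=random):
    """Draw one action: explore uniformly w.p. epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        # Uniform over all actions (greedy one included): each w.p. epsilon / |A(s)|.
        return rng.choice(list(q))
    return greedy_action(q)


q = {"left": 1.0, "right": 3.0, "up": 0.5, "down": 2.0}  # hypothetical Q-values
eps = 0.2
print(epsilon_greedy_probs(q, eps))
print(expected_value_formula(q, eps), expected_value_sum(q, eps))
```

With these invented values and $\epsilon = 0.2$, the greedy action `right` gets probability $0.85$ and each other action $0.05$, and both expected-value functions agree (about $2.725$, up to floating-point rounding).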

Correct answer by Neil Slater on August 24, 2021
