# What does the term $|\mathcal{A}(s)|$ mean in the $\epsilon$-greedy policy?

Artificial Intelligence Asked by Metrician on August 24, 2021

I’ve been looking online for a while for a source that explains these computations, but I can’t find anywhere what $$|A(s)|$$ means. I guess $$A$$ is the action set, but I’m not sure about that notation:

$$\frac{\varepsilon}{|\mathcal{A}(s)|} \sum_{a} Q^{\pi}(s, a)+(1-\varepsilon) \max_{a} Q^{\pi}(s, a)$$

Here is the source of the formula.

I also want to clarify that I understand the idea behind the $$\epsilon$$-greedy approach and the motivation behind on-policy methods. I just had a problem understanding this notation (and some other minor things). The author omitted some steps, so the derivation felt discontinuous to me, which is why I didn’t follow the notation. I’d be glad to be pointed towards a better source where this is explained in detail.

This expression, $$|\mathcal{A}(s)|$$, means

• $$|\quad|$$ the size of

• $$\mathcal{A}(s)$$ the set of actions in state $$s$$

or, more simply, the number of actions allowed in the state.

This makes sense in the given formula because $$\frac{\epsilon}{|\mathcal{A}(s)|}$$ is then the probability of taking each exploratory action in an $$\epsilon$$-greedy policy. The overall expression is the expected return when following that policy: the first term sums the action values weighted by their exploration probability, and the second term adds the value of the greedy action, which is chosen deliberately with probability $$1-\epsilon$$.
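As a rough illustration, here is a minimal Python sketch that evaluates the formula for one state. The function names, the example action values in `q`, and the choice `eps = 0.2` are assumptions made up for this example; they do not come from the linked source. The sketch computes the value once directly from the formula and once as the probability-weighted sum over actions, to show that the two agree.

```python
def epsilon_greedy_probs(q_values, epsilon):
    """Probability of each action under an epsilon-greedy policy in one state."""
    n_actions = len(q_values)  # this is |A(s)|
    # Every action gets the exploratory share epsilon / |A(s)|.
    probs = [epsilon / n_actions] * n_actions
    # The greedy action (first maximum on ties) also gets the remaining 1 - epsilon.
    greedy = max(range(n_actions), key=lambda a: q_values[a])
    probs[greedy] += 1.0 - epsilon
    return probs


def epsilon_greedy_value(q_values, epsilon):
    """Expected return in one state, computed directly from the formula."""
    n_actions = len(q_values)
    exploratory = (epsilon / n_actions) * sum(q_values)
    greedy = (1.0 - epsilon) * max(q_values)
    return exploratory + greedy


# Hypothetical action values Q(s, a) for a state with four allowed actions.
q = [1.0, 2.0, 4.0, 3.0]
eps = 0.2

probs = epsilon_greedy_probs(q, eps)
value = epsilon_greedy_value(q, eps)

# The same value, obtained as the expectation of Q(s, a) under the policy.
expected = sum(p * q_a for p, q_a in zip(probs, q))

print(probs, value, expected)
```

With these numbers, each action has an exploratory probability of 0.2 / 4 = 0.05, the greedy action ends up with about 0.85 in total, and both ways of computing the value give approximately 3.7.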

Correct answer by Neil Slater on August 24, 2021
