# What is the expectation of an empirical model in model-based RL?

Artificial Intelligence Asked by ijuneja on November 4, 2021

In the paper "Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems", on page 1083, on the 6th line from the bottom, the authors define the expectation under the empirical model as
$$\hat{\mathbb{E}}_{s,s',a}[V(s')] = \sum_{s' \in S} \hat{P}^{a}_{s, s'}V(s').$$
I don't understand the significance of this quantity, since it puts $$V(s')$$ inside an expectation while the definition on the right assumes that $$V(s')$$ is already known.

A clarification in this regard would be appreciated.

EDIT:
The paper defines $$\hat{P}^{a}_{s, s'}$$ as
$$\hat{P}^{a}_{s, s'} = \frac{|(s, a, s', t)|}{|(s, a, t)|},$$
where $$|(s, a, t)|$$ is the number of times action $$a$$ was taken in state $$s$$ during model learning, and $$|(s, a, s', t)|$$ is the number of those visits in which the next state was $$s'$$.
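
To make the count-based estimate concrete, here is a minimal Python sketch. The function name `empirical_transition_model`, the `(s, a, s_next)` tuple format, and the toy samples are assumptions made for illustration only; they are not taken from the paper.

```python
from collections import Counter, defaultdict


def empirical_transition_model(transitions):
    """Estimate P_hat[(s, a)][s_next] = |(s, a, s')| / |(s, a)| from samples.

    `transitions` is an iterable of (s, a, s_next) tuples
    (a hypothetical input format chosen for this sketch).
    """
    sa_counts = Counter()   # number of times a was taken in s
    sas_counts = Counter()  # number of times that led to s_next
    for s, a, s_next in transitions:
        sa_counts[(s, a)] += 1
        sas_counts[(s, a, s_next)] += 1

    p_hat = defaultdict(dict)
    for (s, a, s_next), n in sas_counts.items():
        p_hat[(s, a)][s_next] = n / sa_counts[(s, a)]
    return dict(p_hat)


# Toy data: from state 0 under action "a", state 1 was reached
# three times and state 2 once.
samples = [(0, "a", 1), (0, "a", 1), (0, "a", 2), (0, "a", 1)]
p_hat = empirical_transition_model(samples)
print(p_hat[(0, "a")])  # {1: 0.75, 2: 0.25}
```

Each row of the estimate sums to 1 over the next states that were actually observed; next states never seen after $$(s, a)$$ implicitly get probability 0.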

No explicit definition of $$V$$ is provided; however, $$V^{\pi}$$ is defined as the usual expected discounted return, as in Sutton and Barto and other sources.

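If I understand your question correctly, the significance comes from the fact that $$s'$$ is random. On the right-hand side, $$V(\cdot)$$ is assumed to be known for every state, i.e. it is treated as a fixed function. The quantity measures the expected value of the next state given the current state $$s$$ and action $$a$$, with the expectation taken under the empirical transition probabilities $$\hat{P}^{a}_{s, s'}$$ rather than the true ones.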
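
As a small numeric illustration of that sum, here is a Python sketch. The state labels, probabilities, and values are made up for the example, and `empirical_expectation` is a hypothetical helper name, not something defined in the paper.

```python
def empirical_expectation(p_hat_sa, V):
    """Return the sum over s' of P_hat(s' | s, a) * V(s').

    `p_hat_sa` maps each next state to its estimated probability for one
    fixed (s, a) pair; `V` maps each state to its (known) value.
    """
    return sum(prob * V[s_next] for s_next, prob in p_hat_sa.items())


# Hypothetical empirical next-state distribution for a fixed (s, a).
p_hat_sa = {"s1": 0.75, "s2": 0.25}
# Hypothetical value function, treated as fixed and known.
V = {"s1": 2.0, "s2": 10.0}

expected_next_value = empirical_expectation(p_hat_sa, V)
print(expected_next_value)  # 0.75 * 2.0 + 0.25 * 10.0 = 4.0
```

Nothing about $$V$$ is random here: the table `V` is fixed, and the averaging is only over which next state $$s'$$ occurs.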

Answered by harwiltz on November 4, 2021
