Isn't a simulation a great model for model-based reinforcement learning?

Artificial Intelligence Asked by Ray Walker on August 9, 2020

Most reinforcement learning agents are trained in simulated environments. The goal is to maximize performance in (often) the same environment, preferably with a minimum number of interactions. Having a good model of the environment makes planning possible and thus drastically improves sample efficiency!

Why is the simulation not used for planning in these cases? It is a sampling model of the environment, right? Can’t we try multiple actions at each or some states, follow the current policy to look several steps ahead and finally choose the action with the best outcome? Shouldn’t this allow us to find better actions more quickly compared to policy gradient updates?

In this case, our environment and the model are more or less identical, and this seems to be the problem. Or is the good old curse of dimensionality to blame again? Please help me figure out what I’m missing.
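A minimal sketch of the lookahead described above: try each action, follow the current policy for several steps, and pick the action with the best average outcome. It assumes the simulator can be queried from an arbitrary state. `ChainSim`, `random_policy`, and `plan_action` are hypothetical names made up for illustration, and the toy chain environment stands in for a real simulator.

```python
import random


class ChainSim:
    """Hypothetical toy simulator: positions 0..n-1, reward 1 at the right end."""

    def __init__(self, n=10):
        self.n = n
        self.calls = 0  # counts simulator steps, i.e. the cost of planning

    def actions(self, state):
        return (-1, +1)

    def step(self, state, action):
        """Sampling model: returns (next_state, reward, done) for any state."""
        self.calls += 1
        nxt = min(max(state + action, 0), self.n - 1)
        done = nxt == self.n - 1
        return nxt, (1.0 if done else 0.0), done


def random_policy(state, rng):
    """Stand-in for the current policy (assumption: uniformly random)."""
    return rng.choice((-1, +1))


def rollout_return(sim, state, first_action, policy, depth, gamma, rng):
    """Take first_action, then follow the policy; return the discounted reward sum."""
    s, r, done = sim.step(state, first_action)
    total, discount = r, gamma
    for _ in range(depth - 1):
        if done:
            break
        s, r, done = sim.step(s, policy(s, rng))
        total += discount * r
        discount *= gamma
    return total


def plan_action(sim, state, policy, depth=20, n_rollouts=30, gamma=0.95, seed=0):
    """Pick the action whose rollouts have the highest average return."""
    rng = random.Random(seed)

    def value(action):
        returns = [
            rollout_return(sim, state, action, policy, depth, gamma, rng)
            for _ in range(n_rollouts)
        ]
        return sum(returns) / n_rollouts

    return max(sim.actions(state), key=value)


sim = ChainSim(n=10)
best = plan_action(sim, state=8, policy=random_policy)
print(best, sim.calls)
```

Note the cost: a single decision can take up to `len(actions) * n_rollouts * depth` simulator steps, so the approach only pays off when one simulator step is cheap.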

Shouldn't this allow us to find better actions more quickly compared to policy gradient updates?

It depends on the nature of the simulation. If the simulation models a car as a rigid body moving in a plane with three degrees of freedom $(x, y, \theta)$ (hopefully, as long as it doesn't hit anything and get propelled vertically), the three ordinary differential equations of rigid-body motion can be solved quite quickly. Compare that with a simulation that finds the path of least resistance for a ship on a wavy sea: there the equations of fluid dynamics must be solved, which requires a huge amount of resources. Granted, the response time needed for a ship is much longer than for a car, but computing its motion predictively still takes a huge amount of computational power.
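To illustrate how cheap the car case can be, here is a rough sketch that assumes a simple kinematic (unicycle) model with state $(x, y, \theta)$, forward speed `v`, and turn rate `omega`, integrated with explicit Euler steps. The model, the function names, and the step size are assumptions chosen for illustration, not necessarily the equations meant above.

```python
import math


def unicycle_step(state, v, omega, dt):
    """One explicit Euler step of the assumed kinematic model:
    dx/dt = v*cos(theta), dy/dt = v*sin(theta), dtheta/dt = omega."""
    x, y, theta = state
    return (
        x + v * math.cos(theta) * dt,
        y + v * math.sin(theta) * dt,
        theta + omega * dt,
    )


def simulate(state, v, omega, dt=0.01, steps=100):
    """Roll the model forward for `steps` Euler steps with constant controls."""
    for _ in range(steps):
        state = unicycle_step(state, v, omega, dt)
    return state


# Driving straight at 1 m/s for 1 s: each step is a handful of arithmetic operations.
print(simulate((0.0, 0.0, 0.0), v=1.0, omega=0.0))
```

Each step here costs a few floating-point operations, so thousands of lookahead rollouts per decision are affordable. A fluid dynamics solver for the ship would need a full field solve at every time step instead.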

Answered by tmaric on August 9, 2020
