
Reinforcement Learning control with known dynamic equation

Asked on Data Science on December 24, 2021

I know there is model-based reinforcement learning. But all the approaches assume an MDP.

If I want to do feedback control of a system (e.g. control an inverted pendulum), it's quite easy to find the nonlinear differential equation. Can I somehow feed this knowledge into RL algorithms, or are there ways to transform a dynamic system into an MDP?

One Answer

Reinforcement learning (RL) is completely based around MDPs, to the point where its definition is essentially "RL is a collection of algorithms that can learn about action choices within an MDP environment".

Outside of RL, you can work with control systems using differential equations more or less directly, and some are solvable analytically. In principle these direct solutions are more robust than RL and require no learning. However, they usually rely on simple-to-describe goals - typically static control that maintains some important value (speed, position, temperature). This works fine for cruise control in cars, thermostats and industrial processes. It also works well for simple environments like the inverted pendulum, which has been solved without RL for decades.

The analytical non-RL approaches are far more trusted than trial-and-error statistical learners such as RL, but they are limited in the goals they can describe and the complexity of environments they can handle. They start to fail at the level of the mountain car environment, where the correct action may be to move further from the goal state before moving towards it. Mountain car can, of course, still be fully described by relatively simple differential equations.

If you have equations for a dynamic system in an environment whose goals are too complex to solve analytically, then you can easily convert it into a discrete MDP: use the equations to simulate the environment, choosing a discrete time step for action choices. There are also RL methods that work with continuous control and variable time steps, which would also benefit from such a model. You could use the model to learn in a simulated environment, or use it to help with planning in a real environment (or combine learning and planning in a simulated environment).
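
As a concrete illustration (a minimal sketch under assumed conventions, not a standard implementation), a known pendulum equation can be wrapped as a discrete-time environment with a gym-style `reset()`/`step()` interface. The class name, time step, reward shaping and termination rule below are all arbitrary choices made for the example:

```python
import numpy as np

class InvertedPendulumEnv:
    """Discrete-time MDP wrapper around a known pendulum ODE (illustrative sketch).

    theta is measured from the upright position, so the dynamics are
        theta_ddot = (g / l) * sin(theta) + torque / (m * l**2)
    and the reward encodes the goal of keeping theta near zero.
    """

    def __init__(self, dt=0.02, g=9.81, m=1.0, l=1.0):
        self.dt, self.g, self.m, self.l = dt, g, m, l
        self.state = None

    def reset(self):
        # start near upright with a small random perturbation
        self.state = np.random.uniform(-0.05, 0.05, size=2)
        return self.state.copy()

    def step(self, torque):
        theta, theta_dot = self.state
        # the differential equation, evaluated at the current state
        theta_ddot = (self.g / self.l) * np.sin(theta) + torque / (self.m * self.l ** 2)
        # forward Euler over one fixed time step gives the discrete MDP transition
        theta_dot = theta_dot + theta_ddot * self.dt
        theta = theta + theta_dot * self.dt
        self.state = np.array([theta, theta_dot])
        # reward and termination encode the control goal (one possible choice)
        reward = -(theta ** 2 + 0.1 * theta_dot ** 2 + 0.001 * torque ** 2)
        done = abs(theta) > np.pi / 2  # pendulum has fallen over
        return self.state.copy(), reward, done
```

An RL agent then only ever sees states, rewards and termination signals and chooses torques; the differential equation is hidden inside `step()`, exactly as it would be for a learned model or a real system.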

If you are starting with differential equations, then typically you would convert them into some non-differential form in order to apply them. If you can do this fully analytically - e.g. change something in the form $a\frac{d^2x}{dt^2} + b\frac{dx}{dt} + c = 0$ into some $x = \alpha e^{\beta t}\sin(\gamma t)$ - then this would make the most accurate predictions and simulations. Otherwise you can use some method of approximation in order to resolve the differential equation into something that predicts the next state from the current state.
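
To illustrate the two routes (again a hedged sketch; the damped-oscillator parameters and the choice of an RK4 integrator are just example assumptions), the equation $\ddot{x} + 2\zeta\omega\dot{x} + \omega^2 x = 0$ has a closed-form solution, while a generic one-step numerical integrator predicts the next state from the current state without ever solving the equation:

```python
import numpy as np

omega, zeta = 2.0, 0.1          # natural frequency and damping ratio (example values)
x0, v0 = 1.0, 0.0               # initial position and velocity
dt = 0.01                       # integration time step

def analytic(t):
    # exact underdamped solution for these initial conditions
    wd = omega * np.sqrt(1 - zeta ** 2)
    A, B = x0, (v0 + zeta * omega * x0) / wd
    return np.exp(-zeta * omega * t) * (A * np.cos(wd * t) + B * np.sin(wd * t))

def deriv(state):
    # first-order form of the ODE: d/dt [x, v] = [v, -2*zeta*omega*v - omega^2*x]
    x, v = state
    return np.array([v, -2 * zeta * omega * v - omega ** 2 * x])

def rk4_step(state, dt):
    # fourth-order Runge-Kutta: predicts the next state from the current one
    k1 = deriv(state)
    k2 = deriv(state + 0.5 * dt * k1)
    k3 = deriv(state + 0.5 * dt * k2)
    k4 = deriv(state + dt * k3)
    return state + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

state = np.array([x0, v0])
for _ in range(int(1.0 / dt)):  # integrate forward to t = 1 s
    state = rk4_step(state, dt)
print("numerical x(1):", state[0], " analytic x(1):", analytic(1.0))
```

The closed-form route is exact but only available for simple enough systems; the one-step predictor works for any dynamics you can write down, which is what an MDP simulation built from the equations needs.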

Answered by Neil Slater on December 24, 2021
