# How to avoid rapid actuator movements in favor of smooth movements in a continuous space and action space problem?

I’m working on a continuous-state / continuous-action controller. It is supposed to control the roll angle of an aircraft by issuing the appropriate aileron commands (in $$[-1, 1]$$).

To this end, I use a neural network and the DDPG algorithm, which shows promising results after about 20 minutes of training.

I stripped the state presented to the model down to just the roll angle and the angular velocity, so that the neural network is not overwhelmed by state inputs.

So it’s a 2-input / 1-output model performing the control task.

In test runs, it mostly looks good, but sometimes the controller starts thrashing, i.e. it outputs flickering commands, like a very fast bang-bang control, which causes rapid movement of the aileron.

Even though this behavior roughly maintains the desired target value, it is absolutely undesirable; the output should stay smooth instead. So far, I have not been able to identify any particular disturbance that triggers this behavior. It comes out of the blue.

Does anybody have an idea or a hint (maybe a paper reference) on how to incorporate some element (perhaps reward shaping during training) that avoids such behavior? How can I discourage rapid actuator movements in favor of smooth ones?

I tried including the last action in the presented state and adding a penalty component to my reward, but this did not really help, so I'm obviously doing something wrong.
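For reference, the last-action-in-state idea can be sketched as a thin environment wrapper; the `step`/`reset` interface below follows the classic Gym four-tuple convention, and the class name and zero initialization of the previous action are illustrative assumptions, not the poster's actual code:

```python
import numpy as np

class LastActionObservation:
    """Hypothetical wrapper: append the previous aileron command to the
    observation so the policy can see its own most recent output."""

    def __init__(self, env):
        self.env = env
        self.prev_action = 0.0  # assumed neutral command at episode start

    def reset(self):
        self.prev_action = 0.0
        obs = self.env.reset()  # e.g. [roll_angle, angular_velocity]
        return np.append(obs, self.prev_action)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.prev_action = float(action)
        return np.append(obs, self.prev_action), reward, done, info
```

With the 2-input model above, the wrapped observation would have three components: roll angle, angular velocity, and the previous command.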

Artificial Intelligence Asked by opt12 on December 28, 2020

After some research on the subject, I found a possible solution to my problem of high-frequency oscillations in continuous control using DDPG:

I added a reward component based on the actuator movement, i.e. the delta between the actions of consecutive steps.

Excessive action changes are now penalized, which mitigates the tendency to oscillate. The solution is not really perfect, but it works for the moment.
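A minimal sketch of such a delta penalty (the function name and the `delta_weight` coefficient are illustrative assumptions; the weight has to be tuned against the task reward for your own setup):

```python
import numpy as np

def shaped_reward(base_reward, action, prev_action, delta_weight=0.1):
    """Subtract a penalty proportional to the change in actuator command
    between consecutive steps.

    base_reward  -- task reward, e.g. negative roll-angle error
    action       -- current command in [-1, 1]
    prev_action  -- command issued in the previous step
    delta_weight -- hypothetical scaling factor; tune per task
    """
    action_delta = float(np.abs(np.asarray(action) - np.asarray(prev_action)).sum())
    return base_reward - delta_weight * action_delta
```

In the training loop one keeps the previous action around and replaces the raw environment reward with `shaped_reward(r, a_t, a_{t-1})`; an unchanged command incurs no penalty, while a full bang-bang swing from $$-1$$ to $$1$$ incurs the maximum one.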

This finding is detailed in the "Reward Engineering" section of my master's thesis. Please have a look at https://github.com/opt12/Markov-Pilot/tree/master/thesis

I'll be glad to get feedback on it, and to hear about better solutions than adding a delta penalty.

Regards, Felix

Correct answer by opt12 on December 28, 2020
