I’m working on a continuous-state / continuous-action controller. It is supposed to hold a commanded roll angle of an aircraft by issuing appropriate aileron commands (in $[-1, 1]$).
To this end, I use a neural network and the DDPG algorithm, which shows promising results after about 20 minutes of training.
I stripped the state presented to the model down to just the roll angle and the roll rate, so that the neural network is not overwhelmed by inputs.
So it’s a 2 input / 1 output model to perform the control task.
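To make the setup concrete, here is a minimal sketch of such a 2-input / 1-output actor. This is a hypothetical stand-in (plain NumPy, random weights, invented layer sizes), not my actual network; the point is only the shape of the mapping and the `tanh` output squashing the command into $[-1, 1]$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical minimal actor: 2 state inputs -> 16 hidden units -> 1 action.
W1 = rng.normal(scale=0.1, size=(16, 2))
b1 = np.zeros(16)
W2 = rng.normal(scale=0.1, size=(1, 16))
b2 = np.zeros(1)

def actor(state):
    h = np.tanh(W1 @ state + b1)   # hidden layer
    return np.tanh(W2 @ h + b2)    # tanh bounds the aileron command to [-1, 1]

# state = [roll_angle, roll_rate]
a = actor(np.array([0.2, -0.1]))
```

In DDPG the actual weights would of course be trained via the deterministic policy gradient; the `tanh` output layer is what guarantees the bounded command.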
In test runs, it looks mostly good, but sometimes the controller starts thrashing, i.e. it outputs jittery commands like a very fast bang-bang controller, which causes rapid movement of the aileron.
Even though this behavior roughly maintains the desired target value, it is absolutely undesirable; the output should stay smooth. So far, I have not been able to identify any particular disturbance that triggers it. It comes out of the blue.
Does anybody have an idea or a hint (maybe a paper reference) on how to incorporate some element (maybe reward shaping during training) to avoid such behavior, i.e. how to discourage rapid actuator movements in favor of smooth ones?
I tried to include the last action in the presented state and to add a punishment component to my reward, but this did not really help. So obviously I am doing something wrong.
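For reference, what I tried looks roughly like the following sketch. The function names, weights `w_error` and `w_delta`, and the absolute-value penalty form are my own illustrative choices, not anything prescribed by DDPG:

```python
import numpy as np

def augment_observation(roll_angle, roll_rate, last_action):
    """Observation vector: the two physical states plus the previous action."""
    return np.array([roll_angle, roll_rate, last_action], dtype=np.float32)

def shaped_reward(roll_error, action, last_action, w_error=1.0, w_delta=0.1):
    """Tracking reward minus a penalty on the change of action between steps."""
    tracking = -w_error * abs(roll_error)
    smoothness = -w_delta * abs(action - last_action)
    return tracking + smoothness
```

Feeding the last action back in makes the environment Markovian with respect to the smoothness penalty, since otherwise the agent cannot observe the quantity it is being punished for.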
After some research on the subject, I found a possible solution to my problem of high-frequency oscillations in continuous control using DDPG:
I added a reward component based on the actuator movement, i.e. the delta between the actions of consecutive steps.
Excessive action changes are now punished, which mitigates the tendency to oscillate. The solution is not really perfect, but it works for the moment.
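To illustrate why such a penalty discriminates between the two behaviors, here is a small sketch of a squared-delta movement penalty over an action sequence (the weight `w_delta` and the squared form are my illustrative choices; the thesis section linked below describes the exact shaping I used):

```python
import numpy as np

def action_delta_penalty(actions, w_delta=0.05):
    """Total actuator-movement penalty over a sequence of commands:
    weighted sum of squared differences between consecutive actions."""
    actions = np.asarray(actions, dtype=float)
    return w_delta * float(np.sum(np.diff(actions) ** 2))

# A thrashing, bang-bang-like sequence accumulates a much larger penalty
# than a smooth ramp covering the same command range.
bang_bang = [1.0, -1.0, 1.0, -1.0, 1.0]
smooth = [0.0, 0.25, 0.5, 0.75, 1.0]
```

Subtracting such a term from the per-step reward shifts the optimum away from bang-bang solutions while leaving slow, smooth corrections almost unpenalized.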
These findings are detailed in the "Reward Engineering" section of my master's thesis. Please have a look at https://github.com/opt12/Markov-Pilot/tree/master/thesis
I'd be glad to get feedback on it, and to hear about better solutions than adding a delta punishment.
Correct answer by opt12 on December 28, 2020