# Choosing a policy improvement algorithm for a continuing problem with continuous action and state-space

Artificial Intelligence Asked on November 20, 2021

I’m trying to decide which policy improvement algorithm to use in the context of my problem. But first, let me immerse you in the problem.

Problem

I want to move a set of points in a 3D space. Depending on how the points move, the environment gives a positive or negative reward. The environment does not split into episodes, so it is a continuing problem. The state space is high-dimensional and continuous (a lot of states are possible), and many states can be similar (so state aliasing can appear). The problem is dense in rewards: every transition yields a positive or negative reward, depending on the previous state.

A state is represented as a vector of dimension N (initially around 100, but in the future I want to work with vectors of up to 1000).

An action is described by a 3×N matrix, where N is the same as for the state. The first dimension comes from the fact that the action is a 3D displacement.
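Concretely, the shapes involved might look like the sketch below. The environment here is entirely hypothetical: `N`, the `step` dynamics, and the reward function are placeholders standing in for the real problem.

```python
import numpy as np

N = 100  # state dimension; may grow toward 1000 later

rng = np.random.default_rng(0)

state = rng.standard_normal(N)        # state: vector of dimension N
action = rng.standard_normal((3, N))  # action: 3D displacement, shape 3xN


def step(state, action):
    """One hypothetical transition: placeholder dynamics and a dense reward."""
    next_state = state + action.sum(axis=0)  # placeholder dynamics
    reward = -np.linalg.norm(next_state)     # placeholder dense reward signal
    return next_state, reward


next_state, reward = step(state, action)
assert next_state.shape == (N,)
```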

What I have done so far

Since actions are continuous, I have narrowed my search down to policy gradient methods, and further to methods that work with continuous state spaces. I found that Deep Deterministic Policy Gradient (DDPG) and Proximal Policy Optimization (PPO) should fit here. In theory they should work, but I’m unsure, and any advice would be gold here.

Questions

Would those algorithms (PPO or DDPG) be suitable for the problem?
Are there other policy improvement algorithms, or a family of policy improvement algorithms, that would work here?

I'm using my own implementation of A2C (Advantage Actor-Critic) in an industrial application whose dynamics satisfy the Markov property (the present state alone provides sufficient knowledge to make an optimal decision). It's simple and versatile, and its performance has been proven in many different applications. The results so far have been promising.
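For reference, the core quantity A2C uses to scale its policy gradient is the advantage estimate; a minimal one-step version is sketched below (the numbers are purely illustrative, not taken from the answerer's implementation):

```python
def advantage(reward, gamma, v_next, v_current):
    """One-step advantage: A(s, a) = r + gamma * V(s') - V(s)."""
    return reward + gamma * v_next - v_current


# Illustrative numbers: the observed return exceeds the critic's estimate
# of the current state, so the action taken gets a positive advantage.
a = advantage(reward=1.0, gamma=0.99, v_next=2.0, v_current=2.5)
# a = 1.0 + 0.99 * 2.0 - 2.5 ≈ 0.48
```

The actor's loss is then proportional to `-a * log_prob(action)`, while the critic regresses `v_current` toward the one-step target `reward + gamma * v_next`.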

One of my colleagues had issues solving a simple task of mapping images to coordinates with OpenAI's Stable Baselines implementations of PPO and TRPO, so I'm biased against that framework.

My suggestion is to try the simplest model first, and if that doesn't meet your performance expectations, try something fancier. Once you've built a pipeline for learning, switching to a different algorithm is relatively inexpensive.
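One way to keep that switch cheap is to hide the algorithm behind a small interface so the training loop never changes; a rough sketch follows (the class and method names are my own invention, not from any particular library):

```python
import random
from abc import ABC, abstractmethod


class Agent(ABC):
    """The minimal interface the training loop depends on."""

    @abstractmethod
    def act(self, state): ...

    @abstractmethod
    def observe(self, state, action, reward, next_state): ...


class RandomAgent(Agent):
    """Stand-in baseline; swap for A2C, PPO, DDPG, ... without touching the loop."""

    def __init__(self, n):
        self.n = n

    def act(self, state):
        # Random 3xN displacement, matching the action shape in the question.
        return [[random.gauss(0, 1) for _ in range(self.n)] for _ in range(3)]

    def observe(self, state, action, reward, next_state):
        pass  # a learning agent would update its networks here


def train(agent, env_step, state, steps):
    """Generic loop for a continuing task: no episodes, just running reward."""
    total = 0.0
    for _ in range(steps):
        action = agent.act(state)
        next_state, reward = env_step(state, action)
        agent.observe(state, action, reward, next_state)
        state, total = next_state, total + reward
    return total
```

With this shape, comparing algorithms is a one-line change at the call site, e.g. `train(RandomAgent(N), env_step, initial_state, steps=1000)`.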

For a broader overview, the Wikipedia article on reinforcement learning has a comparison table of algorithms that handle continuous action and state spaces.

Answered by conscious_process on November 20, 2021
