How can I change observation states' values in OpenAI gym's cartpole environment?

Artificial Intelligence Asked by Kashan on August 24, 2021

I am learning with the OpenAI gym’s cart pole environment.

I want to make the observation states discrete (with a small step size), and for that purpose I need to change two of the observations from $[-\infty, \infty]$ to some finite upper and lower limits. (By the way, these states are the cart velocity and the pole velocity at the tip.)

How can I change these limits in the actual gym’s environment?
Any other suggestions are also welcome.

One Answer

I don't recommend changing the rules of the environment.

What you could do:

Perform a method called bucketing, i.e., take a value from the continuous state space, see which discrete bucket it falls into, and then let your agent use the bucket number as the observation.

e.g. Say we have a continuous state space with one variable in the range $[-\infty, \infty]$.

The buckets can be as follows:

0). $x < -1000$

1). $-1000 \le x < -500$

2). $-500 \le x < -100$

3). $-100 \le x < -50$

4). $-50 \le x < 0$

5). $0 \le x < 50$

6). $50 \le x < 100$

7). $100 \le x < 500$

8). $500 \le x < 1000$

9). $x \ge 1000$

Therefore in this example scenario there are 10 buckets, and the observation becomes a discrete integer in the range [0, 9].
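A minimal sketch of this bucketing scheme using `numpy.digitize` (the bucket edges below are just the example values above; the function name `bucketize` is my own):

```python
import numpy as np

# Inner bucket edges from the example above (hypothetical values).
edges = [-1000, -500, -100, -50, 0, 50, 100, 500, 1000]

def bucketize(x):
    """Map a continuous value to its discrete bucket index, 0 through 9."""
    # np.digitize returns the index of the bucket x falls into:
    # values below -1000 map to 0, values at or above 1000 map to 9.
    return int(np.digitize(x, edges))
```

With per-dimension edge lists you can discretize each continuous cartpole observation the same way and feed the tuple of bucket indices to a tabular agent.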

Answered by rert588 on August 24, 2021
