
Understanding action space in stable baselines

Data Science Asked on April 16, 2021

I was trying to write a reinforcement learning agent using the stable-baselines3 library. The agent(observations) method should return an action. I went through the APIs of different models (like PPO) and they do not really allow us to specify the action space. Instead, the action space is specified in the environment.
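For context, here is a minimal sketch of what that looks like: a hypothetical MyEnv (not from the question) that declares its own action_space, using the single-return gym API that stable-baselines3 relied on at the time. The model never defines the action space itself; it reads it from here.

```python
import gym
import numpy as np
from gym import spaces


class MyEnv(gym.Env):
    """Hypothetical environment: the action space lives here, not on the model."""

    def __init__(self):
        super().__init__()
        # Discrete(3) -> the model will automatically use a categorical (discrete) policy.
        # Swapping this for spaces.Box(...) would make the policy continuous instead.
        self.action_space = spaces.Discrete(3)
        self.observation_space = spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)

    def reset(self):
        return np.zeros(4, dtype=np.float32)

    def step(self, action):
        obs = np.zeros(4, dtype=np.float32)
        reward, done, info = 0.0, False, {}
        return obs, reward, done, info
```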

This notebook says:

The type of action to use (discrete/continuous) will be automatically deduced from the environment action space.

So, it seems that the "model" deduces the action space from the environment.

Q1. But exactly how?

Q2. Also, how should my agent(observations) method return an action? By returning the action returned by model.predict()?

2 Answers

I guess I got the answer to Q1. Sharing my understanding below:

  • PPO extends OnPolicyAlgorithm
  • PPO calls super(..,env,..), that is OnPolicyAlgorithm(..,env,..)
  • OnPolicyAlgorithm extends BaseAlgorithm
  • OnPolicyAlgorithm calls super(..,env,..), that is BaseAlgorithm(..,env,..)
  • BaseAlgorithm seems to obtain action_space from env
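A quick way to confirm this chain (a sketch, using CartPole as an arbitrary example environment) is to construct a model and compare its action_space attribute with the environment's:

```python
import gym
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")            # CartPole has a Discrete(2) action space
model = PPO("MlpPolicy", env, verbose=0)

print(env.action_space)                  # Discrete(2)
print(model.action_space)                # the same space, copied from the env by BaseAlgorithm
assert model.action_space == env.action_space
```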

Still, I am unable to get an answer to Q2.

Answered by Rnj on April 16, 2021

In short, in RL we develop an agent that interacts with an environment by generating actions that have an effect on the environment. The agent perceives the state of the environment, e.g. by using a neural network, and outputs the relevant action.

For your first question, take a look at the OpenAI Pendulum environment.
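For example, a short sketch of inspecting its action space (the environment id may be "Pendulum-v0" or "Pendulum-v1" depending on your gym version):

```python
import gym

env = gym.make("Pendulum-v1")   # use "Pendulum-v0" on older gym releases
print(env.action_space)         # Box(-2.0, 2.0, (1,), float32) -> one continuous torque value
print(env.observation_space)    # Box of shape (3,)
```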

For your second question: yes, model.predict(observation) does exactly this. If you think of every model as a neural network that takes observations as input, then the output will be an action (sometimes implicitly, as you might get expected values as outputs and use them to define your action-selection policy).
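For illustration, a minimal sketch of an agent method that simply delegates to model.predict(). The Agent class and its agent() method are hypothetical names taken from the question, not part of stable-baselines3:

```python
import gym
from stable_baselines3 import PPO


class Agent:
    """Hypothetical wrapper around a stable-baselines3 model."""

    def __init__(self, env):
        self.model = PPO("MlpPolicy", env, verbose=0)

    def train(self, timesteps=10_000):
        self.model.learn(total_timesteps=timesteps)

    def agent(self, observation):
        # predict() returns (action, hidden_state); only the action is needed here.
        action, _state = self.model.predict(observation, deterministic=True)
        return action


env = gym.make("CartPole-v1")
rl_agent = Agent(env)
obs = env.reset()              # with older gym versions, reset() returns just the observation
print(rl_agent.agent(obs))     # an action valid for env.action_space, e.g. 0 or 1
```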

Answered by Constantinos on April 16, 2021
