Is a reward given at every step or only given when the RL agent fails or succeeds?

Question

In reinforcement learning, an agent can receive a positive reward for correct actions and a negative reward for wrong actions, but does the agent also receive rewards for every other step/action?

Neil Slater · Answer

In reinforcement learning (RL), an immediate reward value must be returned after each action, along with the next state. This value can be zero though, and a zero reward has no direct impact on optimality or on setting goals.
Unless you are modifying the reward scheme to make an environment easier to learn (called reward shaping), you should be aiming for a "natural" reward scheme. That means granting reward based directly on the goals of the agent.
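To illustrate the point that every step returns a reward, even if that reward is zero, here is a minimal sketch. The `Corridor` class, its `step` signature and the corridor layout are all hypothetical and made up for this example. They are not taken from any particular RL library.

```python
from typing import Tuple


class Corridor:
    """Hypothetical toy environment: walk from cell 0 to the last cell."""

    def __init__(self, length: int = 5) -> None:
        self.length = length
        self.state = 0

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int) -> Tuple[int, float, bool]:
        """Apply action (-1 = left, +1 = right); return (next_state, reward, done)."""
        # Clamp the new position so the agent stays inside the corridor.
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        # A reward is returned on every step; it is simply 0.0 until the goal.
        reward = 1.0 if done else 0.0
        return self.state, reward, done


env = Corridor(length=5)
state = env.reset()
rewards = []
done = False
while not done:
    state, reward, done = env.step(+1)  # always move right
    rewards.append(reward)

print(rewards)  # [0.0, 0.0, 0.0, 1.0]
```

The agent receives a number after each of the four actions. Only the last one is non-zero, because that is the only step on which the goal is actually reached.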
Common reward schemes might include:

+1 for winning a game or reaching a goal state, granted only at the end of an episode, whilst all other steps have a reward of zero. You might also see 0 for a draw and -1 for losing a game.

-1 per time step, when the goal is to solve a problem in minimum time steps.

A reward proportional to the amount of something that the agent produces, e.g. energy, money or chemical product, granted on any step where this product is obtained, and zero otherwise. Potentially combined with a negative reward based on something else that the agent consumes in order to produce the product, e.g. fuel.
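The three schemes above could be written as reward functions roughly like this. This is only a sketch: the function names, the `outcome` encoding and the `value_per_unit` / `cost_per_unit` parameters are assumptions chosen for illustration, and a real environment would define them to match its own goals.

```python
from typing import Optional


def sparse_goal_reward(outcome: Optional[str]) -> float:
    """Scheme 1: reward only at the end of an episode.

    outcome is None while the episode is still running,
    otherwise one of "win", "draw" or "loss" (hypothetical encoding).
    """
    return {"win": 1.0, "draw": 0.0, "loss": -1.0}.get(outcome, 0.0)


def time_penalty_reward() -> float:
    """Scheme 2: -1 on every time step, so shorter episodes score higher."""
    return -1.0


def production_reward(produced: float, consumed: float,
                      value_per_unit: float = 1.0,
                      cost_per_unit: float = 0.5) -> float:
    """Scheme 3: reward proportional to output, minus the cost of what was consumed.

    The default unit values are placeholders, not recommendations.
    """
    return value_per_unit * produced - cost_per_unit * consumed


# Every step still yields a number, even when nothing of interest happened.
print(sparse_goal_reward(None))     # 0.0  (mid-episode)
print(sparse_goal_reward("win"))    # 1.0  (final step)
print(sum(time_penalty_reward() for _ in range(4)))  # -4.0 over a 4-step episode
print(production_reward(produced=3.0, consumed=2.0))  # 2.0
print(production_reward(produced=0.0, consumed=0.0))  # 0.0
```

In each case the function is called on every step. The schemes differ only in how often the value it returns is non-zero.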
