# Should I use the discounted average reward as objective in a finite-horizon problem?

Artificial Intelligence Asked by lll on December 18, 2020

I am new to reinforcement learning, but, for a finite-horizon application problem, I am considering using the average reward, rather than the sum of rewards, as the objective. Specifically, there are at most $$T$$ time steps (e.g., the usage rate of an app in each time step), and in each time step the reward may be 0 or 1. The goal is to maximize the daily average usage rate.

The episode length $$T$$ is at most 10; $$T$$ is the maximum time window over which the product can observe a user's behavior in the chosen data. The data contains an indicator of whether an episode terminates. Since this is offline learning from data, $$T$$ is given in the data for each episode. As long as an episode has not terminated, there is a reward in $${0, 1}$$ at each time step.

I have heard that if I use an average reward with a finite horizon, the optimal policy is no longer stationary, and the optimal $$Q$$-function depends on time. I am wondering why this is the case.

I see that, normally, the objective is to maximize

$$\sum_{t=0}^{T} \gamma^t r_t$$

I am considering two definitions of the average reward:

1. $$\frac{1}{T}\left(\sum_{t=0}^{T}\gamma^t r_t\right)$$, where $$T$$ varies in each episode.

2. $$\frac{1}{T-t}\sum_{i=t-1}^{T}\gamma^i r_i$$
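To make the two candidate objectives concrete, here is a minimal Python sketch (the function names are my own, not from any library) that computes the standard discounted return and both averaged variants for one recorded episode of 0/1 rewards:

```python
def discounted_return(rewards, gamma=0.9):
    """Standard objective: sum over t of gamma^t * r_t."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

def avg_return_1(rewards, gamma=0.9):
    """Definition 1: (1/T) * sum_t gamma^t * r_t, with T the episode length."""
    T = len(rewards)
    return discounted_return(rewards, gamma) / T

def avg_return_2(rewards, t, gamma=0.9):
    """Definition 2: 1/(T - t) * sum_{i = t-1}^{T} gamma^i * r_i (requires t < T)."""
    T = len(rewards)
    start = max(t - 1, 0)  # clip so the index stays inside the episode
    tail = sum(gamma ** i * r for i, r in enumerate(rewards[start:], start=start))
    return tail / (T - t)

# Hypothetical episode of 0/1 usage rewards (at most T = 10 steps).
episode = [1, 0, 1, 1]
print(discounted_return(episode, gamma=1.0))  # undiscounted sum of rewards
print(avg_return_1(episode, gamma=1.0))       # per-step average over this episode
print(avg_return_2(episode, t=1, gamma=1.0))  # averaged tail return from step t
```

Note that in definition 1 the normalizer $$T$$ differs across episodes, and in definition 2 it depends on $$t$$, which is one way to see why the resulting objective (and hence the optimal $$Q$$-function) can become time-dependent.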
