# Should I use the discounted average reward as objective in a finite-horizon problem?

Artificial Intelligence Asked by lll on December 18, 2020

I am new to reinforcement learning. For a finite-horizon application problem, I am considering using the average reward instead of the sum of rewards as the objective. Specifically, there are at most $$T$$ time steps per episode (e.g., tracking the usage rate of an app in each time step), and in each time step the reward is either 0 or 1. The goal is to maximize the daily average usage rate.

The episode length is at most $$T = 10$$; $$T$$ is the maximum time window over which the product can observe a user's behavior in the chosen data. The data contain an indicator of whether an episode has terminated. Since this is offline learning from logged data, $$T$$ is given for each episode. As long as an episode has not terminated, each time step yields a reward in $$\{0, 1\}$$.

I heard that if I use an average reward with a finite horizon, the optimal policy is no longer stationary, and the optimal $$Q$$-function depends on time. I am wondering why this is the case.

I see that the objective is normally defined as maximizing

$$sum_{t=0}^{T} gamma^t r_t$$

I am considering two definitions of the average reward:

1. $$frac{1}{T} sum_{t=0}^{T} gamma^t r_t$$, where $$T$$ varies across episodes.

2. $$frac{1}{T-t} sum_{i=t-1}^{T} gamma^i r_i$$
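To make the comparison concrete, here is a minimal sketch (not from the original post) computing the three candidate objectives on one logged episode, assuming rewards are stored as a list of 0/1 values with discount factor `gamma`. For definition 2, the sum is taken from step `t` onward, since the indexing in the formula above is ambiguous; that choice is an assumption.

```python
def discounted_return(rewards, gamma=0.9):
    """Standard objective: sum over t of gamma^t * r_t."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

def avg_discounted_return(rewards, gamma=0.9):
    """Definition 1: discounted return divided by the episode length T,
    where T can differ between episodes."""
    return discounted_return(rewards, gamma) / len(rewards)

def avg_remaining_return(rewards, t, gamma=0.9):
    """Definition 2 (assumed indexing): average of the discounted rewards
    over the remaining T - t steps, summing from step t."""
    T = len(rewards)
    return sum(gamma ** i * rewards[i] for i in range(t, T)) / (T - t)

episode = [1, 0, 1, 1, 0]              # one logged episode, T = 5
print(discounted_return(episode))      # ~2.539
print(avg_discounted_return(episode))  # ~0.508
```

Note that definition 1 only rescales the return by a per-episode constant, while definition 2 makes the objective depend explicitly on the current step `t`, which is one way to see why the resulting optimal policy can become time-dependent.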
