# Correct dimensionality of parameter vector for solving an MRP with linear function approximation?

Artificial Intelligence Asked by soitgoes on August 24, 2021

I’m in the process of trying to learn more about RL by shadowing a course offered collaboratively by UCL and DeepMind that has been made available to the public. I’m most of the way through the course, which for auditors consists of a YouTube playlist, copies of the Jupyter notebooks used for homework assignments (thanks to some former students making them public on GitHub), and reading through Sutton and Barto’s wonderful book Reinforcement Learning: An Introduction (2nd edition).

I’ve gone through a little more than half of the book and the corresponding course material at this point, thankfully with the aid of public solutions for the homework assignments and textbook exercises, which have allowed me to see which parts of my own work I’ve done incorrectly. Unfortunately, I’ve been unable to find such a resource for the last homework assignment offered, so I’m hoping one of the many capable people here might be able to explain parts of the following question to me.

We are given a simple Markov reward process consisting of two states and with a reward of zero everywhere. When we are in state $$s_{0}$$, we always transition to $$s_{1}$$. If we are in state $$s_{1}$$, there is a probability $$p$$ (which is set to 0.1 by default) of terminating, after which the next episode starts in $$s_{0}$$ again. With a probability of $$1 - p$$, we transition from $$s_{1}$$ back to itself again. The discount is $$\gamma = 1$$ on non-terminal steps.

Instead of a tabular representation, consider a single feature $$\phi$$, which takes the values $$\phi(s_0) = 1$$ and $$\phi(s_1) = 4$$. Now consider using linear function approximation, where we learn a value $$\theta$$ such that $$v_{\theta}(s) = \theta \times \phi(s) \approx v(s)$$, where $$v(s)$$ is the true value of state $$s$$.
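To make the shapes concrete, here is a minimal sketch of one reading of this setup, in which $$\phi$$ maps each state to a single number and $$\theta$$ is therefore a single number too. The names `PHI` and `v_theta` are made up for illustration and do not come from the assignment notebook.

```python
# Feature values from the problem statement: one scalar feature per state.
# PHI and v_theta are illustrative names, not taken from the assignment.
PHI = {"s0": 1.0, "s1": 4.0}


def v_theta(theta: float, state: str) -> float:
    """Linear value estimate with a single feature: theta * phi(state)."""
    return theta * PHI[state]


theta = 1.0  # the initial parameter given in the question
values = {state: v_theta(theta, state) for state in PHI}
print(values)
```

Under this reading, (1, 4) is the list of feature values across the two states, not a two-component feature vector for a single state.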

Suppose $$\theta_{0} = 1$$, and suppose we update this parameter with TD(0) with a step size of $$\alpha = 0.1$$. What is the expected value of $$\mathbb{E}[ \theta_T ]$$ if we step through the MRP until it terminates after the first episode, as a function of $$p$$? (Note that $$T$$ is random.)
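For reference, the update I am assuming the question has in mind is the semi-gradient TD(0) rule from Sutton and Barto, specialised to a linear approximator. The symbols $$R_{t+1}$$ and $$S_t$$ are the textbook's notation for the reward and state, not the assignment's, and the value of the terminal state is taken to be zero:

$$
\theta_{t+1} = \theta_t + \alpha \left( R_{t+1} + \gamma \, v_{\theta_t}(S_{t+1}) - v_{\theta_t}(S_t) \right) \phi(S_t)
$$

With a single feature, every term on the right-hand side is a scalar.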

My real point of confusion surrounds $$\theta_{0}$$ being given as 1. My understanding was that the dimensionality of the parameter vector should equal that of the feature vector, which I’ve understood to be (1, 4) and thus two-dimensional. I also don’t grok the idea of evaluating $$\mathbb{E}[ \theta_T ]$$ should $$\theta$$ be a scalar (as an aside, I attempted to simply brute-force simulate the first episode using a scalar parameter of 1 and, unless I made errors, found that the value of $$\theta$$ does not depend on $$p$$ whatsoever). If $$\theta$$ is two-dimensional, would the initial value be represented as (1, 0), (0, 1), or (1, 1)?
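A brute-force simulation of the scalar case might look something like the sketch below. It is not the assignment's code: it assumes the semi-gradient TD(0) update from Sutton and Barto, a terminal-state value of zero, and a fresh $$\theta_0 = 1$$ at the start of every simulated first episode. The names `run_episode` and `mean_theta_T` are hypothetical.

```python
import random

# One scalar feature per state, as given in the problem statement.
PHI = {"s0": 1.0, "s1": 4.0}


def run_episode(p=0.1, theta=1.0, alpha=0.1, gamma=1.0, rng=None):
    """Run one episode with scalar TD(0); return (theta_T, number of steps T)."""
    if rng is None:
        rng = random.Random()
    state, steps = "s0", 0
    while state is not None:  # None stands for the terminal state
        if state == "s0":
            next_state = "s1"
        else:
            next_state = None if rng.random() < p else "s1"
        # Assumption: the terminal state has value 0; rewards are 0 everywhere.
        v_next = 0.0 if next_state is None else theta * PHI[next_state]
        td_error = 0.0 + gamma * v_next - theta * PHI[state]
        theta += alpha * td_error * PHI[state]
        state = next_state
        steps += 1
    return theta, steps


def mean_theta_T(p, episodes=2000, seed=0):
    """Monte Carlo estimate of E[theta_T], restarting from theta_0 = 1 each time."""
    rng = random.Random(seed)
    return sum(run_episode(p, rng=rng)[0] for _ in range(episodes)) / episodes


for p in (0.1, 0.5, 0.9):
    print(p, mean_theta_T(p))
```

Under these assumptions the self-transition from $$s_1$$ to $$s_1$$ has a TD error of exactly zero when $$\gamma = 1$$, so only the first and last steps change $$\theta$$: the first multiplies it by 1.3 and the last by -0.6. The sketch therefore gives roughly -0.78 for every $$p$$, which matches the lack of dependence on $$p$$ described above, although it does not settle whether this is the reading the assignment intends.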

Neither the 1-d nor the 2-d option makes intuitive sense to me, so I hope there’s something clear and obvious that someone might be able to point out. For more context, or should someone just be interested in the assignment, here is a link to the Jupyter notebook:
