# Handling a Large Discrete Action Space in Deep Q Learning

Artificial Intelligence Asked by FoxCharles on September 9, 2020

I am attempting to solve a timetabling problem using deep Q learning. It could be thought of as a resource allocation problem to obtain some certificate of ‘optimality’. However, how to define and access the action space is eluding me. Any help, thoughts, or direction towards the literature would be appreciated. Thanks!

The problem is entirely deterministic: the pair of the current state and action is isomorphic to the resulting state. The Q network is therefore being set up to approximate a Q value (a scalar) for the resulting state, i.e. for the current state and proposed action.
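The setup described above can be sketched as follows. This is only an illustration under stated assumptions: the encoding of a state as the set of active individuals, the `transition` function, and the `value_fn` stand-in for the Q network are all hypothetical and not taken from the actual problem.

```python
from typing import Callable, FrozenSet, Iterable

# Hypothetical encoding: a state is the set of currently active individuals,
# and an action directly names the next active set.
State = FrozenSet[int]
Action = FrozenSet[int]


def transition(state: State, action: Action) -> State:
    """Deterministic transition (assumed): the action fully fixes the next state."""
    return action


def q_value(state: State, action: Action,
            value_fn: Callable[[State], float]) -> float:
    """Score a (state, action) pair by evaluating the resulting state."""
    return value_fn(transition(state, action))


def greedy_action(state: State, candidates: Iterable[Action],
                  value_fn: Callable[[State], float]) -> Action:
    """Pick the candidate action whose resulting state scores highest."""
    return max(candidates, key=lambda a: q_value(state, a, value_fn))
```

Here `value_fn` would be the trained network in practice; any callable that maps a state to a scalar fits the sketch.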

I have so far assumed that the action space should be randomly sampled during training to generate some approximation of the Q table. This seems highly inefficient.

I am open to reinterpretations of the action space. The problem involves a set of n individuals. In any given state, a maximum of b can be ‘active’ and, of the remaining ‘inactive’ individuals, f can be made ‘active’ by an action. An action reallocates the active slots among the individuals who are already active and the f available ones.

To give you a sense of the numbers that I will ultimately use, $$n=17, b=7$$, and $$f$$ will hover somewhere around 7-10 (but this depends on the allocations). At first this sounds tractable, but a (very) rough approximation of the cardinality of the set of actions is 17 choose 7 = 19448.
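The rough count can be checked with a few lines of Python. The sketch assumes that an action selects exactly b individuals from the pool of currently active plus available people, which is a simplification of the description above; `count_actions` and `enumerate_actions` are hypothetical helper names.

```python
from itertools import combinations
from math import comb
from typing import FrozenSet, Iterator, Set


def count_actions(num_active: int, f: int, b: int) -> int:
    """Number of ways to pick b active individuals from the active + available pool."""
    pool = num_active + f
    return comb(pool, min(b, pool))


def enumerate_actions(active: Set[int], available: Set[int],
                      b: int) -> Iterator[FrozenSet[int]]:
    """Yield every candidate active set of size b (assumed action definition)."""
    pool = sorted(active | available)
    for chosen in combinations(pool, min(b, len(pool))):
        yield frozenset(chosen)


print(comb(17, 7))             # upper bound quoted above
print(count_actions(7, 7, 7))  # 7 active plus f = 7 available
```

Under this assumption the per-state action set is smaller than the global bound whenever f is below 10, since only the currently active and available individuals can be chosen.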

Does anyone know a more efficient way to encode this action space? If not, is there a more sensible way to sample it (as is my current plan) than uniformly extracting actions from the space? Also, when sampling the space, is it valid to enforce some cap on the number of samples drawn (say 500)? Please feel free to ask for further clarification.
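For concreteness, the capped uniform sampling plan could look like the sketch below. It again assumes that an action is a choice of exactly b individuals from the active and available pool; `sample_actions` is a hypothetical helper, and the cap of 500 is the figure mentioned above. Note that a maximum taken over a sample can only underestimate the maximum over the full action set.

```python
import random
from itertools import combinations
from math import comb
from typing import FrozenSet, List, Optional, Set


def sample_actions(active: Set[int], available: Set[int], b: int,
                   cap: int = 500,
                   rng: Optional[random.Random] = None) -> List[FrozenSet[int]]:
    """Return at most `cap` distinct candidate active sets, drawn uniformly."""
    rng = rng or random.Random()
    pool = sorted(active | available)
    k = min(b, len(pool))
    # If the whole action set fits under the cap, enumerate it exactly.
    if comb(len(pool), k) <= cap:
        return [frozenset(c) for c in combinations(pool, k)]
    # Otherwise draw subsets uniformly and keep only distinct ones.
    seen: Set[FrozenSet[int]] = set()
    while len(seen) < cap:
        seen.add(frozenset(rng.sample(pool, k)))
    return list(seen)


candidates = sample_actions(set(range(7)), set(range(7, 17)), b=7,
                            rng=random.Random(0))
print(len(candidates))
```

The sampled candidates can then be scored by the Q network and the best one taken, with the usual exploration step applied on top.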
