# How large should the corpus be to optimally retrain the GPT-2 model?

Artificial Intelligence: asked by Andreas Toresäter on September 20, 2020

I have just started working with GPT-2 and want to retrain (fine-tune) a model on a fairly narrow topic, so finding enough training material is difficult.

How large should the corpus be to retrain GPT-2 optimally, and what is the bare minimum size? Should it simply be as large as possible, or can more data at some point make the model worse?
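It may help to measure corpus size in tokens rather than in files or megabytes, because GPT-2 trains on fixed-length windows of tokens (at most 1,024 per window). The sketch below is a rough estimate only. It assumes plain-text `.txt` files in a hypothetical directory layout, and it uses the common rule of thumb that English text averages about four characters per GPT-2 BPE token. An exact count requires running the actual GPT-2 tokenizer.

```python
from pathlib import Path

# Rough heuristic (assumption): ~4 characters of English text per GPT-2 BPE token.
# For an exact figure, tokenize the corpus with the real GPT-2 tokenizer.
CHARS_PER_TOKEN = 4.0
# GPT-2's maximum context length is 1024 tokens.
CONTEXT_LENGTH = 1024


def load_corpus(directory):
    """Read every .txt file under `directory` (hypothetical corpus layout)."""
    return [p.read_text(encoding="utf-8") for p in sorted(Path(directory).rglob("*.txt"))]


def estimate_tokens(texts):
    """Approximate the number of GPT-2 tokens in an iterable of strings."""
    total_chars = sum(len(t) for t in texts)
    return int(total_chars / CHARS_PER_TOKEN)


def estimate_windows(num_tokens, context_length=CONTEXT_LENGTH):
    """Number of full context-length training windows the corpus can fill."""
    return num_tokens // context_length
```

For example, `estimate_windows(estimate_tokens(load_corpus("my_corpus")))` gives a ballpark number of distinct 1,024-token training windows. This is a more useful unit than file count when comparing corpora.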

I am also unsure how many steps the retraining should run. In my tests I have used 6,000 steps, and progress seems to flatten out by then: over the last 1,000 steps the loss only moved from 0.20 to 0.18.
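Two quick calculations can make the step count easier to interpret. The first is how many passes over the corpus a run actually makes. The second is how large the recent loss improvement is in relative terms. The sketch assumes each step consumes `batch_size` windows of 1,024 tokens. The batch size and the 5% plateau threshold are illustrative assumptions, not values taken from any particular training script.

```python
def epochs_covered(steps, batch_size, corpus_tokens, context_length=1024):
    """Approximate passes over the corpus, assuming each step consumes
    `batch_size` windows of `context_length` tokens."""
    tokens_seen = steps * batch_size * context_length
    return tokens_seen / corpus_tokens


def relative_improvement(loss_before, loss_after):
    """Fractional drop in loss between two checkpoints."""
    return (loss_before - loss_after) / loss_before


def has_plateaued(loss_before, loss_after, threshold=0.05):
    """Hypothetical stopping rule: treat training as plateaued once the loss
    improves by less than `threshold` (as a fraction) over the window."""
    return relative_improvement(loss_before, loss_after) < threshold
```

With the numbers above, `relative_improvement(0.2, 0.18)` is about 0.10, a 10% drop over the last 1,000 steps. Whether that counts as a plateau depends entirely on the chosen threshold. Separately, with an assumed batch size of 1 and a corpus of one million tokens, `epochs_covered(6000, 1, 1_000_000)` comes to about 6.1 passes. On a small corpus, a low training loss therefore partly reflects the model seeing the same text repeatedly.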
