
How are training hyperparameters determined for large models?

Artificial Intelligence

Asked by Kao on December 27, 2021

When training a relatively small DL model that takes several hours to train, I typically start from values reported in the literature and then use trial-and-error or grid search to fine-tune the hyperparameters, in order to prevent overfitting and achieve sufficient performance.
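
A minimal sketch of that trial-and-error loop is shown below. The names `search_space` and `train_and_evaluate` are illustrative, not part of any particular library; here `train_and_evaluate` is a toy stand-in so the sketch runs end to end, whereas in practice it would launch one (possibly shortened) training run per configuration and return a validation metric.

```python
# Grid search over a small hyperparameter space.
import itertools

search_space = {
    "learning_rate": [1e-4, 3e-4, 1e-3],
    "dropout": [0.1, 0.3, 0.5],
}

def train_and_evaluate(config):
    # Toy proxy: pretend lr=3e-4, dropout=0.3 is best and score by distance to it.
    # A real version would train the model with `config` and return validation accuracy.
    return -abs(config["learning_rate"] - 3e-4) * 1e3 - abs(config["dropout"] - 0.3)

best_config, best_score = None, float("-inf")
for values in itertools.product(*search_space.values()):
    config = dict(zip(search_space.keys(), values))
    score = train_and_evaluate(config)  # one training run per configuration in reality
    if score > best_score:
        best_config, best_score = config, score

print("best configuration:", best_config)
```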

However, large models often have training times measured in days or weeks [1], [2], [3].

How are hyperparameters determined in such cases?

One Answer

In general, hyperparameter and architecture search at this scale is very computationally expensive, so an exhaustive search is not performed in practice. There are, however, recent approaches that try to judge whether an architecture is promising without training it first, for example by looking at statistics such as the covariance of the activations after forwarding a batch of data, as in the recent paper Neural Architecture Search without Training. Such approaches are still very limited, though.
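
Below is a rough sketch of that general idea in PyTorch (the framework is an assumption; the question is framework-agnostic), not the exact scoring function from the paper: forward one mini-batch through an untrained network, record the binary ReLU activation patterns per example, and use the log-determinant of their agreement matrix as a cheap proxy for how well the architecture separates inputs.

```python
# Training-free architecture scoring sketch: higher scores suggest the untrained
# network assigns more distinct activation patterns to different inputs.
import torch
import torch.nn as nn

def activation_score(model, batch):
    codes = []

    def hook(_module, _inputs, output):
        # Record which units fire (binary ReLU pattern) for each example.
        codes.append((output.detach() > 0).flatten(1).float())

    handles = [m.register_forward_hook(hook) for m in model.modules() if isinstance(m, nn.ReLU)]
    with torch.no_grad():
        model(batch)
    for h in handles:
        h.remove()

    c = torch.cat(codes, dim=1)              # one binary code per example
    n = c.shape[1]                            # code length
    k = c @ c.t() + (1 - c) @ (1 - c).t()     # pairwise agreement between codes
    return torch.logdet(k / n + 1e-6 * torch.eye(c.shape[0]))

# Example: score a small untrained CNN on one batch of random data.
model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Flatten(), nn.Linear(16 * 30 * 30, 10))
print(activation_score(model, torch.randn(32, 3, 32, 32)))
```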

Answered by spiridon_the_sun_rotator on December 27, 2021
