Tune
How to tune deep learning models?
Summary
- Choosing which hyper-parameters to optimize is not an easy task, since some are more sensitive than others and their sensitivity depends on the choice of model.
- Low sensitivity: optimizer, batch size, non-linearity.
- Medium sensitivity: weight initialization, model depth, layer parameters, weight of regularization. 
- High sensitivity: learning rate, annealing schedule, loss function, layer size. 
 
- Method 1 is manual optimization:
- For a skilled practitioner, this may require the least amount of computation to get good results.
- However, the method is time-consuming and requires a detailed understanding of the algorithm. 
 
- Method 2 is grid search:
- Grid search is very simple to implement and can produce good results (see the sketch below).
- Unfortunately, it is not very efficient, since the model must be trained on every cross-combination of the hyper-parameter values. It also requires prior knowledge about the parameters to get good results.
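A minimal grid-search sketch in plain Python. The `train_and_evaluate` function and the candidate values are hypothetical stand-ins for a real training loop and search space:

```python
from itertools import product

def train_and_evaluate(lr, batch_size):
    # Hypothetical stand-in: replace with a real training run that
    # returns a validation score for these hyper-parameters.
    return -abs(lr - 1e-3) - abs(batch_size - 64) / 1000

# Candidate values for each hyper-parameter.
grid = {
    "lr": [1e-4, 1e-3, 1e-2],
    "batch_size": [32, 64, 128],
}

best_score, best_params = float("-inf"), None
# One full training run per cross-combination: the cost grows
# multiplicatively with the number of hyper-parameters and values.
for lr, batch_size in product(grid["lr"], grid["batch_size"]):
    score = train_and_evaluate(lr=lr, batch_size=batch_size)
    if score > best_score:
        best_score, best_params = score, {"lr": lr, "batch_size": batch_size}

print(best_params, best_score)
```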
 
- Method 3 is random search:
- Random search is also easy to implement and often produces better results than grid search (see the sketch below).
- But it is not very interpretable and may also require prior knowledge about the parameters to get good results.
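A minimal random-search sketch under the same assumptions (hypothetical `train_and_evaluate`, arbitrary ranges). Sampling the learning rate log-uniformly is a common choice because its effect spans orders of magnitude:

```python
import random

def train_and_evaluate(lr, batch_size):
    # Hypothetical stand-in: replace with a real training run that
    # returns a validation score for these hyper-parameters.
    return -abs(lr - 1e-3) - abs(batch_size - 64) / 1000

random.seed(0)
best_score, best_params = float("-inf"), None

for _ in range(20):  # fixed budget of 20 trials
    lr = 10 ** random.uniform(-5, -1)             # log-uniform in [1e-5, 1e-1]
    batch_size = random.choice([16, 32, 64, 128])
    score = train_and_evaluate(lr=lr, batch_size=batch_size)
    if score > best_score:
        best_score, best_params = score, {"lr": lr, "batch_size": batch_size}

print(best_params, best_score)
```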
 
- Method 4 is coarse-to-fine search:
- This strategy helps you narrow the search down to only the highest-performing hyper-parameter regions and is common practice in industry (see the sketch below).
- The only drawback is that it is a somewhat manual process.
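A sketch of coarse-to-fine search over the learning rate alone, again with a hypothetical `train_and_evaluate`: a wide random stage first, then a second stage restricted to a narrow window around the best value found:

```python
import math
import random

def train_and_evaluate(lr):
    # Hypothetical stand-in: replace with a real training run that
    # returns a validation score for this learning rate.
    return -abs(math.log10(lr) + 3.5)

def random_stage(low_exp, high_exp, n_trials):
    # Sample learning rates log-uniformly in [10**low_exp, 10**high_exp]
    # and return the best (score, lr) pair found.
    best = (float("-inf"), None)
    for _ in range(n_trials):
        lr = 10 ** random.uniform(low_exp, high_exp)
        best = max(best, (train_and_evaluate(lr), lr))
    return best

random.seed(0)
# Coarse stage: cover a wide range cheaply.
_, coarse_lr = random_stage(-6, -1, n_trials=20)
# Fine stage: zoom into a narrow window around the coarse winner
# (the manual part is deciding how far to zoom and when to stop).
center = math.log10(coarse_lr)
fine_score, fine_lr = random_stage(center - 0.5, center + 0.5, n_trials=20)

print(fine_lr, fine_score)
```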
 
- Method 5 is Bayesian optimization:
- Bayesian optimization is generally the most efficient hands-off way to choose hyper-parameters (see the sketch below).
- But it is difficult to implement from scratch and can be hard to integrate with off-the-shelf tools.
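A minimal sketch of what a hands-off run can look like, assuming the Optuna library (whose default TPE sampler is a sequential model-based method) and a synthetic objective standing in for a real training run:

```python
import optuna

def objective(trial):
    # Sample hyper-parameters from the declared search space.
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    batch_size = trial.suggest_categorical("batch_size", [16, 32, 64, 128])
    # Synthetic stand-in: replace with validation accuracy from training.
    return -abs(lr - 1e-3) - abs(batch_size - 64) / 1000

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=30)
print(study.best_params, study.best_value)
```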
 