Tune

How to tune deep learning models?


Summary

  • Choosing which hyper-parameters to optimize is not an easy task: some are more sensitive than others, and sensitivity also depends on the choice of model.

    • Low sensitivity: optimizer, batch size, non-linearity.

    • Medium sensitivity: weight initialization, model depth, layer parameters, regularization strength.

    • High sensitivity: learning rate, annealing schedule, loss function, layer size.

  • Method 1 is manual optimization:

    • For a skilled practitioner, this may require the least amount of computation to get good results.

    • However, the method is time-consuming and requires a detailed understanding of the algorithm.

  • Method 2 is grid search:

    • Grid search is super simple to implement and can produce good results.

    • Unfortunately, it’s not very efficient: the model must be trained on every combination in the cross-product of the hyper-parameter values, so the cost grows multiplicatively with each parameter added. It also requires prior knowledge about the parameters to get good results (a minimal sketch follows this list).

  • Method 3 is random search:

    • Random search is also easy to implement and often produces better results than grid search.

    • But it is not very interpretable and may also require prior knowledge about the parameters to get good results (a sketch follows this list).

  • Method 4 is coarse-to-fine search:

    • This strategy starts with a broad search and progressively narrows in on the highest-performing regions of the hyper-parameter space; it is common practice in industry.

    • The only drawback is that the narrowing step is somewhat manual (a sketch of one way to mechanize it follows this list).

  • Method 5 is Bayesian optimization search:

    • Bayesian optimization is generally the most efficient hands-off way to choose hyper-parameters.

    • But it’s difficult to implement from scratch and can be hard to integrate with off-the-shelf tools (an example using an off-the-shelf library follows this list).
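
A minimal grid-search sketch in Python. Everything here is illustrative: `train_and_evaluate` is a hypothetical placeholder for a full training run that returns a validation loss, and the value grids are assumptions, not recommendations.

```python
import itertools

# Hypothetical search space; every value below is an assumption for illustration.
GRID = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "batch_size": [32, 64],
    "hidden_size": [128, 256],
}

def train_and_evaluate(config):
    """Hypothetical placeholder: train a model with `config`, return validation loss."""
    raise NotImplementedError

best_config, best_loss = None, float("inf")
# Grid search trains on the full cross-product: 3 * 2 * 2 = 12 runs here.
for values in itertools.product(*GRID.values()):
    config = dict(zip(GRID.keys(), values))
    loss = train_and_evaluate(config)
    if loss < best_loss:
        best_config, best_loss = config, loss
```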
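
Random search under the same assumptions: draw a fixed budget of configurations at random instead of enumerating the cross-product, so the number of runs no longer depends on how many values each axis has.

```python
import random

def sample_config(rng):
    # Ranges are illustrative assumptions; sampling the learning rate
    # log-uniformly is a common choice since it spans orders of magnitude.
    return {
        "learning_rate": 10 ** rng.uniform(-4, -2),
        "batch_size": rng.choice([32, 64, 128]),
        "hidden_size": rng.choice([128, 256, 512]),
    }

rng = random.Random(0)
best_config, best_loss = None, float("inf")
for _ in range(20):  # fixed trial budget, independent of the grid size
    config = sample_config(rng)
    loss = train_and_evaluate(config)  # same hypothetical placeholder as above
    if loss < best_loss:
        best_config, best_loss = config, loss
```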
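
One way to mechanize the coarse-to-fine idea, shown for a single hyper-parameter (a learning rate searched in log10-space): random-search a wide range, keep the top trials, shrink the bounds around them, and repeat. The bounds, round count, and `keep` value are all assumptions.

```python
import math
import random

def coarse_to_fine(evaluate, log_lo, log_hi, rounds=3, trials=10, keep=3):
    rng = random.Random(0)
    best = None
    for _ in range(rounds):
        # Coarse pass: sample log-uniformly within the current bounds.
        samples = [10 ** rng.uniform(log_lo, log_hi) for _ in range(trials)]
        ranked = sorted(samples, key=evaluate)  # lower validation loss is better
        best = ranked[0]
        top = ranked[:keep]
        # Fine pass: shrink the (log10) bounds to bracket the best performers.
        log_lo = min(math.log10(v) for v in top)
        log_hi = max(math.log10(v) for v in top)
    return best

# e.g. best_lr = coarse_to_fine(lambda lr: train_and_evaluate({"learning_rate": lr}), -5, -1)
```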
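
For Bayesian optimization, the usual move is to reach for an existing library rather than implement it from scratch. A sketch assuming the scikit-optimize package (`pip install scikit-optimize`); the search space is again illustrative, and `train_and_evaluate` is the same hypothetical placeholder.

```python
from skopt import gp_minimize
from skopt.space import Integer, Real

# Illustrative search space: continuous and integer dimensions.
space = [
    Real(1e-4, 1e-2, prior="log-uniform", name="learning_rate"),
    Integer(64, 512, name="hidden_size"),
]

def objective(params):
    learning_rate, hidden_size = params
    # A Gaussian process models this objective and proposes the next point to try.
    return train_and_evaluate({"learning_rate": learning_rate,
                               "hidden_size": hidden_size})

result = gp_minimize(objective, space, n_calls=30, random_state=0)
print(result.x, result.fun)  # best hyper-parameters found and their validation loss
```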

Video: Tune - Troubleshooting