Full Stack Deep Learning
Computing and GPUs

How do you choose appropriate hardware for your compute needs? Should you compute in the cloud or on your own GPUs?

Summary

  • If you go the route of buying your own GPUs, there are many NVIDIA cards to choose from, spanning several architecture generations (Kepler, Maxwell, Pascal, Volta, Turing); the sketch after this summary shows one way to check which architecture and how much memory your cards actually have.

  • If you go with a cloud provider, Amazon Web Services and Google Cloud Platform are the heavyweights, while startups such as Paperspace and Lambda Labs are also viable options.

  • If you work solo or in a startup, build or buy a PC with four GPUs of a recent architecture for model development. For model training, if you run many experiments, either buy shared server machines or use cloud instances; a rough rent-versus-buy break-even sketch follows this summary.

  • If you work in a large company, you are more likely to rely on cloud instances for both model development and model training, as they provide proper provisioning and infrastructure to handle failures.
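
To make the hardware comparison concrete, here is a minimal sketch (assuming PyTorch is installed) that lists each visible GPU with its name, compute capability, and memory; the compute capability maps onto the architecture generations mentioned above.

```python
import torch


def list_gpus() -> None:
    """Print the name, compute capability, and memory of each visible CUDA GPU."""
    if not torch.cuda.is_available():
        print("No CUDA-capable GPU detected; training would fall back to CPU.")
        return
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        # props.major / props.minor encode the compute capability,
        # e.g. 7.0 for Volta (V100) or 7.5 for Turing (RTX 20xx series).
        print(
            f"GPU {i}: {props.name}, "
            f"compute capability {props.major}.{props.minor}, "
            f"{props.total_memory / 1024**3:.1f} GiB memory"
        )


if __name__ == "__main__":
    list_gpus()
```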

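For the rent-versus-buy decision in the last two bullets, a rough break-even calculation is often enough. The sketch below uses illustrative assumptions (machine price, cloud hourly rate, power cost), not quoted prices.

```python
def breakeven_hours(machine_cost_usd: float,
                    cloud_rate_usd_per_hour: float,
                    own_cost_usd_per_hour: float = 0.0) -> float:
    """GPU-hours after which buying a machine is cheaper than renting cloud instances."""
    hourly_saving = cloud_rate_usd_per_hour - own_cost_usd_per_hour
    if hourly_saving <= 0:
        return float("inf")  # renting never becomes more expensive than running your own box
    return machine_cost_usd / hourly_saving


# Illustrative inputs (assumptions, not quoted prices): a ~$7,000 4-GPU workstation
# versus a ~$3.00/hour comparable cloud instance, with ~$0.20/hour for power/upkeep.
hours = breakeven_hours(machine_cost_usd=7_000,
                        cloud_rate_usd_per_hour=3.00,
                        own_cost_usd_per_hour=0.20)
print(f"Break-even after roughly {hours:,.0f} GPU-hours "
      f"(~{hours / 24:.0f} days of continuous training)")
```

Under these assumed numbers the machine pays for itself after roughly 2,500 GPU-hours, about three and a half months of continuous use.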
Computing and GPUs - Infrastructure and Tooling