Full Stack Deep Learning
Overview

What are the components of a machine learning system?

Last updated 4 years ago

Summary

  • Google's seminal paper "Machine Learning: The High-Interest Credit Card of Technical Debt" observes that, in a complete machine learning system, the actual modeling code is only a small fraction of the whole. Much more code surrounds it to configure the system, extract the data and features, test model performance, manage processes and resources, and serve and deploy the model.

  • The data component:

    • Data Storage - How to store the data?

    • Data Workflows - How to process the data?

    • Data Labeling - How to label the data?

    • Data Versioning - How to version the data?

  • The development component:

    • Software Engineering - How to choose the proper engineering tools?

    • Frameworks - How to choose the right deep learning frameworks?

    • Distributed Training - How to train the models in a distributed fashion?

    • Resource Management - How to provision and manage distributed GPUs?

    • Experiment Management - How to manage and store model experiments?

    • Hyper-parameter Tuning - How to tune model hyper-parameters?

  • The deployment component:

    • Continuous Integration and Testing - How to not break things as models are updated?

    • Web - How to deploy models to web services?

    • Hardware and Mobile - How to deploy models to embedded and mobile systems?

    • Interchange - How to deploy models across systems?

    • Monitoring - How to monitor model predictions?

  • All-In-One: There are solutions that handle all of these components!
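
The point that modeling code is a small part of the whole system can be illustrated with a toy sketch. Everything below is hypothetical and written only for illustration: the names (`Config`, `load_data`, `fit`, `serve`, `run`), the synthetic data, and the `max_mae` quality gate are assumptions, not taken from the course or from any particular library. Only `fit` and `predict` are "the model"; the rest stands in for the configuration, data, testing, and serving code around it.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Config:
    """Configuration (hypothetical settings for this sketch)."""
    train_fraction: float = 0.8
    max_mae: float = 1.0  # assumed quality gate checked before serving


def load_data():
    """Data storage stand-in: synthetic points on the line y = 2x + 1."""
    return [(float(x), 2.0 * x + 1.0) for x in range(20)]


def split(rows, fraction):
    """Data processing stand-in: ordered train/test split."""
    cut = int(len(rows) * fraction)
    return rows[:cut], rows[cut:]


def fit(rows):
    """The actual modeling code: least-squares fit of a line."""
    xs, ys = zip(*rows)
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in rows)
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar


def predict(model, x):
    slope, intercept = model
    return slope * x + intercept


def evaluate(model, rows):
    """Testing stand-in: mean absolute error on held-out rows."""
    return mean(abs(predict(model, x) - y) for x, y in rows)


def serve(model, request):
    """Deployment stand-in: answer one request given as a dict."""
    return {"prediction": predict(model, request["x"])}


def run(config):
    """Glue code: load, split, train, and gate on test error."""
    train, test = split(load_data(), config.train_fraction)
    model = fit(train)
    mae = evaluate(model, test)
    if mae > config.max_mae:
        raise RuntimeError(f"test MAE {mae:.3f} exceeds {config.max_mae}")
    return model, mae
```

Calling `run(Config())` returns the fitted model and its test error, and `serve(model, {"x": 3.0})` answers a single request. Even in this toy, the two modeling functions are outnumbered by the code around them, and a real system adds labeling, versioning, resource management, and monitoring on top.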
