Overview

Why is data management important?

Summary

  • Data science has always been less about machine learning than about cleaning, shaping, and moving data from place to place.
  • Here are the important concepts in data management:
    • Sources - how to get training data
    • Labeling - how to label proprietary data at scale
    • Storage - how to store data and metadata appropriately
    • Versioning - how to update data through user activity or additional labeling
    • Processing - how to aggregate and convert raw data and metadata
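
To make the versioning idea concrete, here is a minimal sketch in Python. It assumes a dataset is a small list of records held in memory and derives a version ID from a hash of the content, so that additional labeling produces a new, distinguishable version. The record fields and the helper names `dataset_version` and `add_labels` are hypothetical, chosen only for illustration; real projects usually rely on a dedicated data versioning tool rather than code like this.

```python
import hashlib
import json


def dataset_version(records):
    """Return a short, deterministic version ID for a list of records."""
    # Sort records and keys so the same content always gives the same ID.
    canonical = json.dumps(
        sorted(records, key=lambda r: r["id"]), sort_keys=True
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def add_labels(records, new_labels):
    """Return a new record list with labels applied; the input is not modified."""
    return [
        {**r, "label": new_labels.get(r["id"], r.get("label"))}
        for r in records
    ]


# Hypothetical raw data: two items, one still unlabeled.
v1 = [
    {"id": 1, "text": "great product", "label": "positive"},
    {"id": 2, "text": "arrived broken", "label": None},
]

# Additional labeling yields a new version; v1 stays available unchanged.
v2 = add_labels(v1, {2: "negative"})

print(dataset_version(v1))
print(dataset_version(v2))
```

The two printed IDs differ because the content differs, while reordering the same records leaves the ID unchanged. Keeping the old list untouched means a model trained on the first version can still be traced back to exactly the data it saw.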