Overview

Why is data management important?

Summary

  • Data science has always been less about machine learning than about cleaning, shaping, and moving data from place to place.

  • The key concepts in data management are:

    • Sources - how to get training data

    • Labeling - how to label proprietary data at scale

    • Storage - how to store data and metadata appropriately

    • Versioning - how to update data through user activity or additional labeling

    • Processing - how to aggregate and convert raw data and metadata
