Sources

Where do the training data come from?

Summary

  • Most deep learning applications require large amounts of labeled data. Publicly available datasets can serve as a starting point, but using them offers no competitive advantage.
  • Most companies spend a lot of money and time labeling their own data.
  • A data flywheel means harnessing the power of users to rapidly improve the whole machine learning system.
  • Semi-supervised learning is a relatively recent technique in which much of the training data is labeled automatically (for example, by a model trained on a small hand-labeled set) rather than by human annotators.
  • Data augmentation is a strategy that enables practitioners to significantly increase the diversity of data available for training models, without actually collecting new data.
  • Synthetic data is data that’s generated programmatically, an underrated idea that is almost always worth starting with.
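
To make the data augmentation point concrete, here is a minimal sketch in plain Python. It assumes images are small 2D lists of pixel values and that a horizontal flip or a one-pixel shift does not change the label, which holds for many natural-image tasks but not for digits or text. The helper names (horizontal_flip, shift_right, augment) are hypothetical and chosen for illustration only.

```python
def horizontal_flip(image):
    """Mirror a 2D image (a list of rows) left to right."""
    return [row[::-1] for row in image]


def shift_right(image, pixels=1, fill=0):
    """Shift every row right by `pixels`, padding the left edge with `fill`."""
    shifted = []
    for row in image:
        k = min(max(pixels, 0), len(row))  # clamp the shift to the row width
        shifted.append([fill] * k + row[:len(row) - k])
    return shifted


def augment(dataset):
    """Return the original examples plus two label-preserving variants of each.

    Assumption: the transforms do not change the correct label.
    """
    augmented = []
    for image, label in dataset:
        augmented.append((image, label))
        augmented.append((horizontal_flip(image), label))
        augmented.append((shift_right(image, pixels=1), label))
    return augmented


# One labeled example becomes three without collecting any new data.
dataset = [([[1, 2, 3], [4, 5, 6]], "cat")]
augmented = augment(dataset)
for image, label in augmented:
    print(label, image)
```

In practice an image library would apply transforms like these randomly at training time instead of materializing every variant up front, but the principle is the same: each transform yields a new training example that reuses an existing label.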