Sources
Where do the training data come from?
- Most deep learning applications require lots of labeled data. Publicly available datasets can serve as a starting point, but relying on them alone offers no competitive advantage, since competitors can use the same data.
- Most companies spend a lot of money and time labeling their own data.
- A data flywheel means harnessing the power of users to rapidly improve the whole machine learning system: usage produces new data, and that data improves the model.
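- Semi-supervised learning is a relatively recent learning technique that combines a small labeled set with a larger unlabeled set, where the unlabeled training data is labeled automatically (for example, from a model's own predictions) rather than by hand.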
- Data augmentation is a strategy that enables practitioners to significantly increase the diversity of data available for training models, without actually collecting new data.
- Synthetic data is data that is generated programmatically. It is an underrated idea that is almost always worth starting with.
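
To make the data augmentation bullet concrete, here is a minimal sketch in plain Python. It assumes an image is a nested list of pixel values; the function names (`horizontal_flip`, `random_crop`, `augment`) and the 4x4 toy image are hypothetical and chosen only for illustration. In practice an image library would handle these transformations.

```python
import random


def horizontal_flip(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]


def random_crop(image, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window at a random position."""
    top = rng.randint(0, len(image) - crop_h)
    left = rng.randint(0, len(image[0]) - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def augment(image, label, n_copies, crop_h, crop_w, seed=0):
    """Produce n_copies transformed variants that all keep the original label."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_copies):
        out = random_crop(image, crop_h, crop_w, rng)
        if rng.random() < 0.5:  # flip roughly half of the crops
            out = horizontal_flip(out)
        samples.append((out, label))
    return samples


# Toy 4x4 "image" whose pixel values are 0..15 (illustrative only).
image = [[r * 4 + c for c in range(4)] for r in range(4)]
augmented = augment(image, label="cat", n_copies=5, crop_h=3, crop_w=3)
```

One labeled example becomes five training samples without any new data collection or labeling. The transformations must preserve the label: a flipped cat is still a cat, but a flipped digit or a piece of text may not mean the same thing.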