More data is preferred when we have access to more features and our models have low bias.
Better models are preferred when the feature space has low dimensionality.
Transfer learning lowers the need for access to data. In order to use this method effectively, we want to fine-tune the pre-trained models on better data.
Occam's Razor: Given two models that perform more or less equally, you should always prefer the less complex one.
Deep learning might not be preferred, even if it squeezes out a 1% increase in accuracy.
Reasons to use simple models include scalability, system complexity, maintenance, explainability, etc.
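One way to make the Occam's Razor rule concrete is to pick the least complex model whose score is within a tolerance of the best one. The sketch below is only an illustration: the `Candidate` fields, the complexity numbers, and the 0.02 tolerance are assumptions, not values from these notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    accuracy: float   # validation accuracy in [0, 1]
    complexity: int   # hypothetical proxy, e.g. a parameter count


def pick_simplest(candidates, tolerance=0.02):
    """Return the least complex candidate within `tolerance` of the best accuracy."""
    best = max(c.accuracy for c in candidates)
    close_enough = [c for c in candidates if best - c.accuracy <= tolerance]
    return min(close_enough, key=lambda c: c.complexity)


# Made-up numbers, purely for illustration.
candidates = [
    Candidate("logistic_regression", accuracy=0.91, complexity=1_000),
    Candidate("deep_net", accuracy=0.92, complexity=5_000_000),
]
chosen = pick_simplest(candidates)
print(chosen.name)
```

With a tolerance of zero the rule falls back to picking the most accurate model, so the tolerance is where scalability, maintenance, and explainability concerns get priced in.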
More complex features may require a more complex model.
A more complex model may not show improvements with a feature set that is too simple.
A well-behaved Machine Learning feature should be reusable, transformable, interpretable, and reliable.
In deep learning, architecture engineering is the new feature engineering.
Most fascinating results in recent years come from combining supervised and unsupervised approaches (stacked autoencoders, unsupervised pre-training, etc.).
Self-supervised learning is a learning paradigm where we train a model using labels that are naturally part of the input data, rather than requiring separate external labels.
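A small sketch of what "labels that are naturally part of the input data" can mean: next-token prediction, where each target is simply the token that follows a context window. The function name and the `context_size` parameter are illustrative assumptions.

```python
def next_token_pairs(tokens, context_size=2):
    """Build (context, target) training pairs from an unlabeled token sequence."""
    pairs = []
    for i in range(context_size, len(tokens)):
        context = tuple(tokens[i - context_size:i])
        target = tokens[i]  # the "label" is just the next token of the input
        pairs.append((context, target))
    return pairs


sentence = "the cat sat on the mat".split()
pairs = next_token_pairs(sentence)
for context, target in pairs:
    print(context, "->", target)
```

No external annotation is involved: every pair is derived from the raw sequence itself.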
Most practical applications of machine learning run an ensemble. You can use completely different approaches at the ensemble layer.
Ensembling is a way to turn any model into a feature!
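The "model as a feature" idea can be sketched with a tiny blend: each base model's prediction becomes an input feature, and a separate combiner is fit on top. Everything here is a toy assumption (the two base models, the grid-search combiner, and the data); a real ensemble layer could use any learner.

```python
def model_a(x):
    """Hypothetical base model: a linear rule."""
    return 2.0 * x


def model_b(x):
    """Hypothetical base model: a constant baseline."""
    return 5.0


def model_outputs_as_features(x, models):
    """Turn each base model's prediction into one feature of x."""
    return [model(x) for model in models]


def fit_blend_weight(xs, ys, models, steps=101):
    """Grid-search w in [0, 1] so that w * f0 + (1 - w) * f1 best fits ys."""
    def squared_error(w):
        total = 0.0
        for x, y in zip(xs, ys):
            f0, f1 = model_outputs_as_features(x, models)
            total += (w * f0 + (1 - w) * f1 - y) ** 2
        return total

    grid = [k / (steps - 1) for k in range(steps)]
    return min(grid, key=squared_error)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [x + 2.5 for x in xs]  # toy targets, chosen so an equal blend is exact
weight = fit_blend_weight(xs, ys, [model_a, model_b])
print(weight)
```

The combiner never looks inside the base models; it only sees their outputs, which is why completely different approaches can be mixed at this layer.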
Biases can happen in the data labels, or even in the presentation to end-users.
Introducing biases leads to a lack of fairness in machine learning.
Two desired properties of models in the wild are:
Easily extensible: incrementally/iteratively learns from "human-in-the-loop" feedback or from additional data.
Knows what it does not know: models the uncertainty in its predictions and enables fallback to manual handling.
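The second property can be sketched as a routing rule: accept the prediction automatically only when the model is confident enough, otherwise hand the case to a person. Using the top class probability as the confidence signal and 0.8 as the threshold are both simplifying assumptions; better-calibrated uncertainty estimates would slot into the same place.

```python
def route_prediction(probabilities, threshold=0.8):
    """Return (label, "auto") when confident, else (None, "manual")."""
    label, confidence = max(probabilities.items(), key=lambda kv: kv[1])
    if confidence >= threshold:
        return label, "auto"
    return None, "manual"  # fall back to manual review


confident = route_prediction({"spam": 0.95, "ham": 0.05})
uncertain = route_prediction({"spam": 0.55, "ham": 0.45})
print(confident, uncertain)
```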
Evaluation metrics used during offline and online experiments must match!
A/B tests help measure differences in metrics across statistically identical populations that each experience a different algorithm.
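As one illustration of measuring such a difference, a two-proportion z-test compares conversion rates between the two populations. This is a minimal sketch, assuming a binary success metric and samples large enough for the normal approximation; the counts below are made up.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (z, two-sided p-value) for the difference in success rates."""
    rate_a = successes_a / n_a
    rate_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided tail probability of a standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: algorithm A converts 200/2000, algorithm B 260/2000.
z, p_value = two_proportion_z_test(200, 2000, 260, 2000)
print(round(z, 2), round(p_value, 4))
```

A small p-value suggests the difference is unlikely to be noise, but the choice of metric still matters more than the test itself.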
Use long-term metrics whenever possible.
Short-term metrics can be informative and allow faster decisions.
You should apply the best software engineering practices during the design of machine learning systems (encapsulation, abstraction, cohesion, low coupling, etc.).
However, design patterns for machine learning software are neither well known nor well documented.
Whenever you develop any ML infrastructure, you need to target two different modes:
ML experimentation that emphasizes flexibility, reusability, and ease of use.
ML production that adds on a new layer of performance and scalability.
In order to combine them:
Research should be done using the same tools that are used in production.
Abstraction layers should be implemented on top of the optimized implementations so they can be accessed from friendly experimentation tools.
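The abstraction-layer idea above can be sketched as a shared interface that both a readable implementation and an optimized one satisfy, so experiment code never depends on which backend is in use. All names here (`Scorer`, `ReferenceScorer`, `OptimizedScorer`, `evaluate`) are hypothetical, and the "optimized" class is only a stand-in for something that might wrap native code.

```python
from array import array
from typing import Protocol, Sequence


class Scorer(Protocol):
    """Interface shared by experimentation and production code."""

    def score(self, features: Sequence[float]) -> float: ...


class ReferenceScorer:
    """Readable pure-Python version, convenient in a notebook."""

    def __init__(self, weights: Sequence[float]) -> None:
        self.weights = list(weights)

    def score(self, features: Sequence[float]) -> float:
        return sum(w * f for w, f in zip(self.weights, features))


class OptimizedScorer:
    """Stand-in for an optimized backend; a real one might wrap native code."""

    def __init__(self, weights: Sequence[float]) -> None:
        self._weights = array("d", weights)  # compact, typed storage

    def score(self, features: Sequence[float]) -> float:
        total = 0.0
        for w, f in zip(self._weights, features):
            total += w * f
        return total


def evaluate(scorer: Scorer, rows: Sequence[Sequence[float]]) -> list:
    """Experiment-side helper that depends only on the Scorer interface."""
    return [scorer.score(row) for row in rows]


rows = [[2.0, 3.0], [4.0, 1.0]]
weights = [0.5, 2.0]
reference_scores = evaluate(ReferenceScorer(weights), rows)
optimized_scores = evaluate(OptimizedScorer(weights), rows)
print(reference_scores, optimized_scores)
```

Because `evaluate` is written against the interface, swapping the backend does not require touching the experimentation code.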
Examples of other ML approaches include XGBoost, tensor methods, factorization machines, non-parametric Bayesian methods, etc.
Sometimes, deep learning methods do not outperform these simpler approaches.