It is crucial to monitor serving systems, training pipelines, and input data. A typical monitoring system raises alarms when something goes wrong and keeps the records needed to diagnose and tune the system afterwards.
Cloud providers have decent monitoring solutions.
Anything that can be logged can be monitored: dependency changes, distribution shift in data, model instabilities, etc.
Data distribution monitoring is an underserved need!
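As a rough illustration of what a data distribution check might look like, the sketch below computes the Population Stability Index (PSI) between a reference sample of one numeric feature (for example, from training data) and a live sample from serving. PSI is one option among several (a two-sample statistical test would also work), and the function name `psi`, the bin count, and the alert threshold of 0.2 are assumptions made for this example, not something prescribed by these notes.

```python
import math
from bisect import bisect_right


def psi(reference, live, n_bins=10, eps=1e-6):
    """Population Stability Index between two samples of one numeric feature.

    Bins are defined by quantiles of the reference sample, so each bin holds
    roughly the same share of reference data.
    """
    ref_sorted = sorted(reference)
    # Interior bin edges taken at reference quantiles.
    edges = [ref_sorted[int(len(ref_sorted) * i / n_bins)] for i in range(1, n_bins)]

    def fractions(values):
        counts = [0] * n_bins
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # Floor at eps so empty bins do not produce log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    ref_frac = fractions(reference)
    live_frac = fractions(live)
    return sum((l - r) * math.log(l / r) for r, l in zip(ref_frac, live_frac))


# Hypothetical threshold: 0.2 is a commonly quoted rule of thumb, tune it per feature.
PSI_ALERT_THRESHOLD = 0.2


def drift_alarm(reference, live):
    """Return True when the live sample has drifted enough to raise an alarm."""
    return psi(reference, live) > PSI_ALERT_THRESHOLD
```

A monitoring job could run `drift_alarm` per feature on a schedule, comparing a recent window of serving inputs against a fixed training snapshot, and log the PSI value so the trend is visible before the alarm fires.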
It is important to monitor the business uses of the model, not just its statistics. It is equally important to be able to feed failure cases back into the dataset.
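One simple way to route failures back toward the dataset is to append each one to a review queue that is later labeled and merged into training data. The sketch below assumes a JSON Lines file as that queue; the function names `record_failure` and `load_failures`, the record fields, and the file layout are all hypothetical choices for illustration.

```python
import json
import time


def record_failure(path, features, prediction, feedback):
    """Append one failed prediction to a JSON Lines review queue.

    `feedback` is whatever signal marked this as a failure, for example a
    user correction or a downstream business outcome.
    """
    record = {
        "timestamp": time.time(),
        "features": features,
        "prediction": prediction,
        "feedback": feedback,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def load_failures(path):
    """Read the queue back as a list of dicts, ready for labeling."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Keeping the raw inputs alongside the prediction and the feedback signal means the record can be relabeled and added to the training set without having to reconstruct what the model saw.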