How to implement and debug deep learning models?
Debug - Troubleshooting
- The 5 most common bugs in deep learning models are:
- Incorrect shapes for tensors.
- Pre-processing inputs incorrectly.
- Incorrect input to the loss function.
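- Forgetting to set the network's train mode correctly (training versus evaluation mode changes the behavior of layers such as dropout and batch normalization).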
- Numerical instability - inf/NaN.
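To make the last bug concrete, here is a minimal NumPy sketch of how a log-softmax written straight from the formula can produce inf/NaN, and how the usual max-subtraction trick avoids it. The function names and the example logits are illustrative assumptions, not part of any framework.

```python
import numpy as np

def naive_log_softmax(logits):
    # Direct translation of the formula; exp() overflows for large logits.
    exp = np.exp(logits)
    return np.log(exp / exp.sum(axis=-1, keepdims=True))

def stable_log_softmax(logits):
    # Subtracting the row-wise max does not change the result mathematically,
    # but keeps every exponent <= 0, so exp() cannot overflow.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

# Hypothetical, deliberately extreme logits to trigger the problem.
logits = np.array([[1000.0, 0.0, -1000.0]])

# Silence NumPy's warnings so the bad values can be inspected directly.
with np.errstate(over="ignore", invalid="ignore", divide="ignore"):
    naive = naive_log_softmax(logits)
stable = stable_log_softmax(logits)

print("naive finite: ", np.isfinite(naive).all())
print("stable finite:", np.isfinite(stable).all())
```

In practice, the built-in loss functions of off-the-shelf frameworks generally handle this for you, which is one more reason to prefer them over hand-written versions.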
- 3 pieces of general advice for implementing models:
- Start with a lightweight implementation.
- Use off-the-shelf components such as Keras if possible, since most Keras components work well out of the box.
- Build complicated data pipelines later.
- The first step is to get the model to run:
- For shape mismatch and casting issues, step through model creation and inference in a debugger, checking the shapes and data types of your tensors at each step.
- For out-of-memory issues, you can scale back your memory-intensive operations one-by-one.
- For other issues, search for the error message online. Stack Overflow has an answer most of the time.
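The shape and data type checks described above can also be written as explicit assertions, so that a mismatch fails loudly instead of broadcasting silently. The sketch below uses NumPy arrays as stand-ins for framework tensors; `check_tensor` is a hypothetical helper, not a library function.

```python
import numpy as np

def check_tensor(name, array, shape, dtype):
    """Raise early, with a readable message, if shape or dtype is unexpected."""
    if array.shape != shape:
        raise ValueError(f"{name}: expected shape {shape}, got {array.shape}")
    if array.dtype != dtype:
        raise TypeError(f"{name}: expected dtype {dtype}, got {array.dtype}")

batch_size = 4
predictions = np.zeros((batch_size, 1), dtype=np.float32)  # model output, shape (4, 1)
targets = np.ones(batch_size, dtype=np.float32)            # labels, shape (4,)

# Silent bug: (4, 1) - (4,) broadcasts to (4, 4), so a mean over this
# "error" would average 16 terms instead of 4 without raising anything.
bad_error = predictions - targets
print("broadcast shape:", bad_error.shape)

# Fix: make the shapes agree explicitly, then verify before computing the loss.
targets = targets.reshape(batch_size, 1)
check_tensor("predictions", predictions, (batch_size, 1), np.float32)
check_tensor("targets", targets, (batch_size, 1), np.float32)
good_error = predictions - targets
print("checked shape:  ", good_error.shape)
```

Checks like these are cheap enough to leave in place while the model is still being brought up, and can be removed once the pipeline is stable.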
- The second step is to have the model overfit a single batch:
- Error goes up: Commonly this is due to a flipped sign somewhere in the loss function or gradient.
- Error explodes: This is usually a numerical issue, but can also be caused by a high learning rate.
- Error oscillates: You can lower the learning rate and inspect the data for shuffled labels or incorrect data augmentation.
- Error plateaus: You can increase the learning rate and get rid of regularization. Then you can inspect the loss function and the data pipeline for correctness.
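The single-batch overfitting test can be sketched as follows. This is a toy illustration, assuming a linear model trained with plain gradient descent in NumPy rather than a real network; the batch size, feature count, learning rate, and step count are arbitrary choices made so the example runs quickly.

```python
import numpy as np

rng = np.random.default_rng(0)

# One fixed batch: 4 examples, 32 features. With more parameters than examples,
# a linear model can fit even random targets, so the loss should approach 0.
inputs = rng.normal(size=(4, 32))
targets = rng.normal(size=(4, 1))

weights = np.zeros((32, 1))
learning_rate = 0.01  # assumed value; tune for your own model
losses = []

for step in range(500):
    residual = inputs @ weights - targets          # shape (4, 1)
    losses.append(float(np.mean(residual ** 2)))   # mean squared error on the batch
    gradient = 2.0 * inputs.T @ residual / len(inputs)
    weights -= learning_rate * gradient            # minus sign: step against the gradient

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.2e}")
```

If the loss on the fixed batch does not fall toward zero, the symptoms listed above are the place to start: changing `-=` to `+=` in the update reproduces the "error goes up" case, and a much larger `learning_rate` reproduces the "error explodes" case.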
- The third step is to compare the model to a known result:
- The most useful results come from an official model implementation evaluated on a similar dataset to yours.
- If you can’t find an official implementation on a similar dataset, you can compare your approach to results from an official model implementation evaluated on a benchmark dataset.
- If there is no official implementation of your approach, you can compare it to results from an unofficial model implementation.
- After that, you can compare to results from a paper with no code, results from your model on a benchmark dataset, and results from a similar model on a similar dataset.
- An underrated source of results is simple baselines, which help confirm that your model is learning anything at all.
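As one example of a simple baseline for classification, the sketch below computes the accuracy of always predicting the most frequent training label. The helper name and the labels are hypothetical and only illustrate the idea; other baselines (such as linear regression or averaging the outputs) may suit other tasks better.

```python
from collections import Counter

def majority_class_baseline(train_labels, test_labels):
    """Accuracy obtained by always predicting the most frequent training label."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    correct = sum(1 for label in test_labels if label == majority_label)
    return majority_label, correct / len(test_labels)

# Hypothetical labels for illustration only.
train_labels = ["cat", "dog", "cat", "cat", "bird", "cat"]
test_labels = ["cat", "dog", "cat", "bird"]

label, baseline_accuracy = majority_class_baseline(train_labels, test_labels)
print(f"always predicting {label!r} gives accuracy {baseline_accuracy:.2f}")
```

A trained model whose test accuracy does not clearly exceed this number has probably not learned anything useful from the inputs.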