Debug

How to implement and debug deep learning models?

Summary

  • The 5 most common bugs in deep learning models include the following (a sketch of runtime checks for several of them appears after the list):

    • Incorrect shapes for tensors.

    • Pre-processing inputs incorrectly.

    • Incorrect input to the loss function.

    • Forgetting to set the network to train mode correctly.

    • Numerical instability - inf/NaN.
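
A minimal PyTorch sketch of runtime checks that catch several of these bugs; the toy model, shapes, and names are illustrative assumptions rather than anything from the original notes:

```python
import torch
import torch.nn as nn

# Toy classifier; the architecture and shapes are placeholders.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
loss_fn = nn.CrossEntropyLoss()     # expects raw logits, not softmaxed outputs

x = torch.randn(16, 32)             # batch of 16 examples with 32 features
y = torch.randint(0, 10, (16,))     # integer class labels

model.train()                       # easy to forget: train vs. eval mode

logits = model(x)
assert logits.shape == (16, 10), f"unexpected logits shape {tuple(logits.shape)}"
assert logits.dtype == torch.float32, f"unexpected dtype {logits.dtype}"

loss = loss_fn(logits, y)           # correct loss inputs: logits plus integer labels
assert torch.isfinite(loss), "loss is inf/NaN, check for numerical instability"
```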

  • 3 pieces of general advice for implementing models:

    • Start with a lightweight implementation.

    • Use off-the-shelf components such as Keras where possible, since most of them work well out of the box (see the Keras sketch after this list).

    • Build complicated data pipelines later.
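
As an illustration of the "lightweight implementation with off-the-shelf components" advice, a first version might look like the sketch below; the placeholder data, layer sizes, and hyperparameters are assumptions, and default Keras settings are left as-is since they tend to work out of the box.

```python
import numpy as np
import tensorflow as tf

# Placeholder data; swap in a small slice of your real dataset first,
# and build the complicated data pipeline only once this works.
x_train = np.random.rand(256, 32).astype("float32")
y_train = np.random.randint(0, 10, size=(256,))

# Off-the-shelf Keras components with default settings.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=32)
```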

  • The first step is to get the model to run:

    • For shape mismatch and casting issues, step through model creation and inference in a debugger, checking that your tensors have the expected shapes and data types (see the shape-tracing sketch after this list).

    • For out-of-memory issues, you can scale back your memory-intensive operations one by one (for example, by reducing the batch size).

    • For other issues, search for the error message; Stack Overflow has an answer most of the time.
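
One way to do that inspection, sketched below for a PyTorch model (the architecture and dummy batch are placeholders), is to register forward hooks that print each layer's output shape and dtype; alternatively, call breakpoint() before the forward pass and step through in pdb.

```python
import torch
import torch.nn as nn

# Placeholder model; substitute your own.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 30 * 30, 10),
)

def report(module, inputs, output):
    # Print the output shape and dtype of every layer as data flows through.
    print(f"{module.__class__.__name__:10s} -> {tuple(output.shape)}, {output.dtype}")

for layer in model:
    layer.register_forward_hook(report)

model(torch.randn(4, 3, 32, 32))  # tiny dummy batch, just to trace shapes
```

If you hit out-of-memory errors instead, shrink the dummy batch or the layer widths one at a time until the forward pass fits.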

  • The second step is to have the model overfit a single batch (a minimal loop is sketched after the list):

    • Error goes up: Commonly this is due to a flipped sign somewhere in the loss function or the gradient.

    • Error explodes: This is usually a numerical issue, but can also be caused by too high a learning rate.

    • Error oscillates: You can lower the learning rate and inspect the data for shuffled labels or incorrect data augmentation.

    • Error plateaus: You can increase the learning rate and remove regularization, then inspect the loss function and the data pipeline for correctness.
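
A minimal sketch of the overfit-a-single-batch check, assuming a PyTorch setup with placeholder model, data, and learning rate: train repeatedly on one fixed batch and watch whether the loss falls toward zero or shows one of the failure modes above.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One fixed batch, reused every step; the model should be able to memorize it.
x = torch.randn(16, 32)
y = torch.randint(0, 10, (16,))

model.train()
for step in range(500):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    if step % 100 == 0:
        print(f"step {step:4d}  loss {loss.item():.4f}")
```

If the loss reaches roughly zero, the training loop and loss are wired correctly; if it rises, explodes, oscillates, or plateaus, apply the corresponding diagnosis above.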

  • The third step is to compare the model to a known result:

    • The most useful results come from an official model implementation evaluated on a similar dataset to yours.

    • If you can’t find an official implementation on a similar dataset, you can compare your approach to results from an official model implementation evaluated on a benchmark dataset.

    • If there is no official implementation of your approach, you can compare it to results from an unofficial model implementation.

    • Failing that, you can compare to results from a paper with no code, results from your model on a benchmark dataset, and results from a similar model on a similar dataset.

    • An under-rated source of results comes from simple baselines, which can help make sure that your model is learning anything at all (see the baseline sketch below).
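
As one way to get such simple baselines, the sketch below uses scikit-learn's DummyClassifier and a linear model on placeholder data; these stand in for whatever trivial baselines fit your task, and the numbers they produce are a floor any learning model should beat.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression

# Placeholder data; substitute your own features and labels.
rng = np.random.default_rng(0)
x_train, y_train = rng.normal(size=(800, 32)), rng.integers(0, 10, size=800)
x_val, y_val = rng.normal(size=(200, 32)), rng.integers(0, 10, size=200)

# Majority-class baseline: predicts the most frequent training label.
baseline = DummyClassifier(strategy="most_frequent").fit(x_train, y_train)
print("majority-class accuracy:", baseline.score(x_val, y_val))

# Slightly stronger baseline: a linear model on the raw features.
linear = LogisticRegression(max_iter=1000).fit(x_train, y_train)
print("linear-model accuracy:  ", linear.score(x_val, y_val))

# Compare both numbers against your deep model's validation accuracy.
```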
