Web Deployment

How do you deploy your models to the web?

Summary

  • For web deployment, you need to be familiar with the concept of a REST API.

    • You can deploy the code to Virtual Machines, and then scale by adding instances.

    • You can deploy the code as containers, and then scale via orchestration.

    • You can deploy the code as a “serverless function.”

    • You can deploy the code via a model serving solution.

  • If you are doing CPU inference, you can get away with scaling by launching more servers (e.g., Docker containers) or going serverless (e.g., AWS Lambda).

  • If you are doing GPU inference, serving frameworks such as TF Serving and Ray Serve become useful, with features such as adaptive (dynamic) batching.
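To make the REST API idea concrete, here is a minimal sketch of a prediction endpoint using only the Python standard library. The `predict` function is a hypothetical stand-in for a real loaded model; in practice you would use a framework such as Flask or FastAPI, but the request/response shape is the same: the client POSTs JSON features, the server returns a JSON prediction.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    # Dummy "model": sums the features. Stands in for model.predict().
    return sum(features)


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length))
        payload = json.dumps({"prediction": predict(body["features"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # silence per-request logging


def make_server(host="127.0.0.1", port=8000):
    return HTTPServer((host, port), PredictHandler)

# To serve: make_server().serve_forever()
```

A client would then call it with, e.g., `curl -X POST http://127.0.0.1:8000/predict -d '{"features": [1, 2, 3]}'`. This stateless request/response contract is what lets you scale by simply adding more identical instances behind a load balancer.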
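For the serverless option, the deployable unit is just a handler function rather than a running server. The sketch below follows AWS Lambda's Python handler convention (`handler(event, context)` returning a response dict); the body parsing and the dummy model are illustrative assumptions.

```python
import json


def lambda_handler(event, context):
    # `event` carries the incoming HTTP request (body as a JSON string when
    # invoked via API Gateway); the returned dict becomes the HTTP response.
    features = json.loads(event["body"])["features"]
    prediction = sum(features)  # dummy model, stands in for model.predict()
    return {
        "statusCode": 200,
        "body": json.dumps({"prediction": prediction}),
    }
```

The platform handles scaling for you: each concurrent request can spin up another copy of the function, which is why this route works well for lightweight CPU inference but poorly for GPU workloads.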
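The reason batching matters for GPU inference is that one forward pass over a batch of N inputs is far cheaper than N separate passes. Serving frameworks therefore hold incoming requests briefly and group them. A simplified sketch of that idea (not TF Serving's or Ray Serve's actual implementation; `batched_predict` is a hypothetical stand-in for one batched model call):

```python
import queue
import threading
import time


def batched_predict(batch):
    # Dummy batched model call: doubles each input.
    # Stands in for a single GPU forward pass over the whole batch.
    return [x * 2 for x in batch]


class Batcher:
    """Groups concurrent requests so the model runs once per batch,
    not once per request."""

    def __init__(self, max_batch_size=8, max_wait_s=0.01):
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.requests = queue.Queue()  # holds (input, result_slot) pairs
        threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, x):
        slot = queue.Queue(maxsize=1)
        self.requests.put((x, slot))
        return slot.get()  # block until the batch containing x is processed

    def _worker(self):
        while True:
            # Wait for the first request, then collect more until the batch
            # is full or the wait deadline passes.
            batch = [self.requests.get()]
            deadline = time.monotonic() + self.max_wait_s
            while len(batch) < self.max_batch_size:
                timeout = deadline - time.monotonic()
                if timeout <= 0:
                    break
                try:
                    batch.append(self.requests.get(timeout=timeout))
                except queue.Empty:
                    break
            outputs = batched_predict([x for x, _ in batch])
            for (_, slot), y in zip(batch, outputs):
                slot.put(y)
```

The `max_wait_s` knob is the latency/throughput trade-off: waiting longer fills larger batches and raises GPU utilization, at the cost of added per-request latency. "Adaptive" batching means the framework tunes this grouping for you instead of using fixed values.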
