# Web Deployment

{% embed url="<https://youtu.be/1BegV7gjqDo>" %}
Web Deployment - Testing and Deployment
{% endembed %}

## Summary

* For web deployment, you need to be familiar with the concept of **REST API.**
  * You can deploy the code to Virtual Machines, and then scale by adding instances.
  * You can deploy the code as containers, and then scale via orchestration.
  * You can deploy the code as a “server-less function.”
  * You can deploy the code via a model serving solution.
* If you are making **CPU inference**, you can get away with scaling by launching more servers (Docker), or going serverless (AWS Lambda).
* If you are using **GPU inference**, things like TF Serving and [Ray Serve](https://docs.ray.io/en/latest/serve/index.html#rayserve) become useful with features such as adaptive batching.
