# Web Deployment

{% embed url="<https://youtu.be/1BegV7gjqDo>" %}
Web Deployment - Testing and Deployment
{% endembed %}

## Summary

* For web deployment, you need to be familiar with the concept of **REST API.**
  * You can deploy the code to Virtual Machines, and then scale by adding instances.
  * You can deploy the code as containers, and then scale via orchestration.
  * You can deploy the code as a “server-less function.”
  * You can deploy the code via a model serving solution.
* If you are making **CPU inference**, you can get away with scaling by launching more servers (Docker), or going serverless (AWS Lambda).
* If you are using **GPU inference**, things like TF Serving and [Ray Serve](https://docs.ray.io/en/latest/serve/index.html#rayserve) become useful with features such as adaptive batching.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://fall2019.fullstackdeeplearning.com/course-content/testing-and-deployment/web-deployment.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
