Async and Distributed Python Server with Uvicorn, Starlette and Ray

Python dominates data science at the moment, and because of that many companies are building server applications with it. Async Python with Uvicorn, Starlette and Ray is a great fit for creating scalable, distributed server apps.
Intro
As a system architect I often worry whether what we do is “scalable”. But what does that mean? Scalability has many aspects and it is highly dependent on the app context. For instance, the back-end scalability for a mobile app has very different requirements compared to serving a static website. The kind of applications that concern me
- are normally web-based;
- have a few users;
- and often crunch data.
Back ends for data processing are often written in Python rather than in the faster Java.
Three years ago I went to PyCon in London. I was shocked to discover that many organisations, including banks, run server applications written in Python. Fair enough, the banks were not using them for nanosecond high-frequency trading, but I always assumed these guys worked only with Java or something even faster like C++.
Nowadays, we also offer customers back ends written in Python. This language is invaluable when it comes to crunching data and data scientists love it. Anyone who works with data scientists understands the pain of transforming their Python scripts into something that can be run in production. Imagine having to translate every script into a different language and the mess is complete. Every time I do it I swear that next time I will just run the script as it is!
Python is not the fastest option out there, but when it comes to crunching data, users are used to waiting a little longer than they would for, say, loading a webpage. So at least for now, this language will keep dominating data processing, wrangling, AI, etc.
Instagram, for instance, has invested heavily over the years in Python servers. For example, they published their own performance-oriented version of CPython 3.8 in an effort to improve the performance of their servers:
Cinder is Instagram’s internal performance-oriented production version of CPython 3.8. (source)
Instagram Server is a several-million-line Python monolith, and it moves quickly. (source)
Ok, but what about serving a moderate number of requests from a Python server? Isn't Python effectively single-threaded, given the GIL? Yes, it is, and this is a problem. Fortunately this problem has a few possible solutions.
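To make the point concrete, here is a tiny sketch of my own (not server code from this article) showing that throwing threads at CPU-bound Python work barely helps, because the GIL lets only one thread execute Python bytecode at a time:

import time
from concurrent.futures import ThreadPoolExecutor

def count(n: int) -> int:
    # CPU-bound busy work
    total = 0
    for i in range(n):
        total += i
    return total

start = time.perf_counter()
count(10_000_000)
count(10_000_000)
print(f"sequential: {time.perf_counter() - start:.2f}s")

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=2) as pool:
    # Two threads, but the GIL serialises bytecode execution,
    # so the wall-clock time is roughly the same as the sequential run.
    list(pool.map(count, [10_000_000, 10_000_000]))
print(f"two threads: {time.perf_counter() - start:.2f}s")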
Uvicorn + Starlette + async for processing
Asynchronous servers are easy to reason about and they make it simple to squeeze more out of one CPU. While one request handler is waiting on I/O, for example for an outbound HTTP call to return, the server can work on other requests. It is as simple as that. Here is a simple async Uvicorn server example:
from starlette.applications import Starlette
from starlette.requests import Request
from starlette.responses import JSONResponse, PlainTextResponse
from starlette.routing import Route


async def homepage(req: Request):
    return PlainTextResponse("ok: service is running")


async def echo(req: Request):
    try:
        res = await req.json()
    except Exception as err:
        res = {"message": "failed JSON message"}
    return JSONResponse(res)


app = Starlette(
    debug=True,
    routes=[
        Route("/", homepage, methods=["GET"]),
        Route("/echo", echo, methods=["POST"]),
    ],
)
To start this server all you need to do is run this command:
uvicorn main:app --workers 2
assuming the script above is in a file called main.py.
Congrats! You just launched an HTTP server with three Python processes: a main process and two workers/children. Using workers helps a lot because they are independent Python processes, and if one worker is busy another one can process the incoming requests. Uvicorn does not seem to have a limit on the number of workers one can launch, but each worker is a separate process which takes up resources regardless of the current server load. Each process executes requests asynchronously as well.
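If you want to see the workers with your own eyes, a small extra route can report which process handled a request. This is just an illustrative sketch of mine; the /whoami route and its handler are not part of the example above:

import os
from starlette.applications import Starlette
from starlette.requests import Request
from starlette.responses import JSONResponse
from starlette.routing import Route

async def whoami(req: Request):
    # Every Uvicorn worker is a separate OS process, so under load
    # repeated calls will come back with different PIDs.
    return JSONResponse({"pid": os.getpid()})

app = Starlette(routes=[Route("/whoami", whoami, methods=["GET"])])

Add the route to the app above (or run this as its own main.py), start it with the same uvicorn main:app --workers 2 command, and hit /whoami a few times.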
A problem arises with these workers when the server needs to do some heavy lifting, e.g. process a large batch of data, run an ETL, etc. The workers are meant for processing requests, so anything long-running like an ETL is a bad idea. By design they have a timeout for responding to a request on the order of minutes, which makes sense for requests but not for longer computations. Enter ray.io!
Ray.io for distributed computing
Ray is a relatively new addition to the Python ecosystem; it was first publicly announced in 2018, I believe. Ray fills an important gap, namely the ability to orchestrate multiple Python processes in a cluster. It was designed specifically for crunching through lots of data for ML training. Ray allows you to start an actor process with its own resources and wait patiently for the results to become available.
A reasonable scenario is sending a request to the server to run an ETL process. The request is answered immediately with “ok, the ETL is running” while the actual data crunching is off-loaded to Ray and the user can come back later to check the result. Ray itself scales seamlessly up and down depending on demand.
This is a simple example of how to create an Uvicorn server which uses Ray actors.
import ray
from starlette.applications import Starlette
from starlette.responses import JSONResponse, PlainTextResponse
from starlette.routing import Route
from starlette.requests import Request


@ray.remote
class EchoActor(object):
    def __init__(self):
        pass

    def response(self, jsonMessage: dict):
        return JSONResponse(jsonMessage)


ray.init()
ray.nodes()


async def echo(req: Request):
    echoBot = EchoActor.remote()
    try:
        res = await req.json()
    except Exception as err:
        res = {"message": "failed JSON message"}
    remoteJob = await echoBot.response.remote(res)
    return remoteJob


app = Starlette(
    debug=True,
    routes=[
        Route("/echo", echo, methods=["POST"]),
    ],
)
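The echo actor above still waits for the actor's result before answering. For the ETL scenario described earlier, where the server replies immediately and the user checks back later, the wiring could look roughly like the sketch below. The run_etl task, the jobs registry and the /etl routes are all my own assumptions, one possible layout rather than code from this article:

import ray
from starlette.applications import Starlette
from starlette.requests import Request
from starlette.responses import JSONResponse
from starlette.routing import Route

ray.init()


@ray.remote
def run_etl(params: dict) -> dict:
    # Placeholder for a long-running ETL job executed in a separate Ray worker process.
    return {"rows_processed": 1000, "params": params}


jobs = {}  # in-memory registry of submitted jobs: job id -> Ray ObjectRef


async def start_etl(req: Request):
    try:
        params = await req.json()
    except Exception:
        params = {}
    ref = run_etl.remote(params)  # off-load the work to Ray, do not wait for it
    jobs[ref.hex()] = ref
    return JSONResponse({"status": "ok, the ETL is running", "job_id": ref.hex()})


async def etl_status(req: Request):
    job_id = req.path_params["job_id"]
    ref = jobs.get(job_id)
    if ref is None:
        return JSONResponse({"status": "unknown job"}, status_code=404)
    ready, _ = ray.wait([ref], timeout=0)  # non-blocking check on the job
    if not ready:
        return JSONResponse({"status": "still running"})
    return JSONResponse({"status": "done", "result": ray.get(ref)})


app = Starlette(
    routes=[
        Route("/etl", start_etl, methods=["POST"]),
        Route("/etl/status/{job_id}", etl_status, methods=["GET"]),
    ],
)

Note that with multiple Uvicorn workers each process keeps its own jobs dictionary, so a real deployment would put the job ids in some shared store instead.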
Conclusion
The actor model for computation and its use for distributed/parallel computing is not new; Scala + Akka actors have been around for more than 10 years. It is great to see similar concepts finally becoming available within the Python environment. One major difference is the overhead of Ray actors in comparison to Akka ones. In the case of Ray, actors are separate Python processes, each with its own memory and CPU requirements. Initialising a new Ray actor takes a bit of time, or at least much longer than the equivalent step in Akka. Hence, spinning Ray actors up and down needs to be done with some care, but that is still much easier (and less buggy) than trying to create the same functionality from scratch. For comparison, Akka actors are very light and have a minimal footprint; so much so that you can easily have 10,000+ of them running in memory.
For completeness, Uvicorn workers are quite similar to Ray actors, but they are really meant for processing server requests and do not scale up or down based on demand.
In short, Ray is not perfect, but it adds vital functionality to the Python ecosystem, and combined with async server frameworks like Uvicorn, it goes a very long way!
My name is Nik Vaklev and I am the founder of Techccino Ltd. If you are interested in building your own web apps get in touch.