BorisBurkov.net

Asyncio ecosystem

March 29, 2019 4 min read

I have had a very bad developer experience with Asyncio. It is such a messy and overcomplicated system that I have had to study it from scratch at least 3 times by now. I figured it's time to cut my losses and write a post about it!

History of generators and coroutines in python in 5 minutes

I don't want to dig too deep into the evolution of generators in python. Basically, the history of generators in python is as follows:

  1. PEP-0255 (2001, python 2.2): yield statement introduced
  2. PEP-0342 (2005, python 2.5): yield expression: two-way communication - you can pass data to coroutine, not only get data from it
  3. PEP-0380 (2009, python 3.3): yield from as a way to delegate to a sub-generator
  4. PEP-0492 (2015, python 3.5): async/await syntax for native coroutines

1. Initially, the yield statement was created as an alternative to return, just to make generator functions that lazily cook values on the fly and hand them out one at a time, much like xrange() does.
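
To make the laziness concrete, here is a minimal sketch. The name lazy_range is made up for illustration and is not a real library function. Nothing is computed until the caller asks for the next value.

```python
def lazy_range(stop):
    """Hypothetical xrange()-like generator: produces values only on demand."""
    current = 0
    while current < stop:
        yield current          # pause here and hand one value to the caller
        current += 1


numbers = lazy_range(3)        # no body code has run yet
first = next(numbers)          # runs the body up to the first yield
rest = list(numbers)           # drains the remaining values
```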

2. Later on, they decided that communication between a generator function and its caller should be two-way, so that a generator can not only produce values but also receive them. So the yield statement was promoted to an expression. This created an opportunity for communicating coroutines, which allowed the famous David Beazley to build a whole operating system in python.
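
A small sketch of that two-way communication, assuming a made-up running_average generator: the caller pushes numbers in with send() and gets the current mean back from the same yield.

```python
def running_average():
    """Coroutine-style generator: receives numbers via send(), yields the mean so far."""
    total = 0.0
    count = 0
    average = None
    while True:
        # yield is an expression here: it hands `average` to the caller
        # and evaluates to whatever the caller passes to send().
        value = yield average
        total += value
        count += 1
        average = total / count


averager = running_average()
next(averager)                 # prime the generator: run it up to the first yield
first = averager.send(10)      # mean of [10]
second = averager.send(20)     # mean of [10, 20]
```

The initial next() call is needed because a freshly created generator has not reached a yield yet, so there is nothing to receive a sent value.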

The main point of generator-based coroutines was to replace OS-controlled threads with interpreter-controlled analogues of threads, traditionally called fibers (there was also another attempt to make interpreter-controlled threads, gevent, where they are called greenlets or green threads, but it has nothing to do with asyncio). The fiber approach is also called cooperative multitasking, in contrast with the preemptive multitasking implemented with threads.

When you create POSIX threads in python, it's up to the operating system's kernel to decide when to switch between them. In the case of python, which has a Global Interpreter Lock, the OS kernel does exactly the opposite of what you'd want, as David Beazley showed in his seminal talk.

3. Then the yield from construct was introduced to make delegating to sub-coroutines easy. Basically, this was a failed attempt to promote the use of generators.
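
Roughly, delegation with yield from looks like this. The names inner and outer are made up for the sketch: values sent to the outer generator are forwarded to the inner one, and the inner return value becomes the value of the yield from expression.

```python
def inner():
    received = yield "inner ready"
    return f"inner got {received!r}"


def outer():
    # yield from forwards next()/send() calls to inner() and
    # evaluates to inner()'s return value once it finishes.
    result = yield from inner()
    yield f"outer saw: {result}"


gen = outer()
first = next(gen)          # comes straight from inner()
second = gen.send("hi")    # "hi" goes to inner(), the reply comes from outer()
```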

4. The Asyncio module provided a stock implementation of an asynchronous event loop (again, as if Twisted and Tornado didn't exist), and python borrowed the async/await syntax from C# or Javascript.

The async/await syntax is just an alternative to the yield syntax, so we can run the new native coroutines without the asyncio main loop, just by calling foo().send(None) on them, as described here for instance.
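
Here is a minimal sketch of driving a native coroutine by hand, with no event loop involved. The Pause class is a hypothetical awaitable written for this example. It suspends once, so we can see both the suspension and the final result.

```python
class Pause:
    """Hypothetical awaitable: suspends once, resumes with whatever is sent in."""

    def __await__(self):
        received = yield "paused"   # this value surfaces from coro.send(None)
        return received             # this becomes the value of `await Pause()`


async def foo():
    value = await Pause()
    return value * 2


coro = foo()
signal = coro.send(None)            # run foo() until Pause suspends it
result = None
try:
    coro.send(21)                   # resume; foo() runs to completion
except StopIteration as stop:
    result = stop.value             # the return value travels in StopIteration
```

This is essentially the job an event loop does for us: it calls send() on coroutines and decides when to resume them.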

The Asyncio main loop provides the main fiber of execution, responsible for monitoring the sockets, while coroutines are supposed to delegate all their blocking calls to the main loop. I'll mostly speak of Asyncio in this post.

Asyncio

Asyncio is now part of the python 3 standard library.

Performance-critical parts of it are written in plain C, while the rest is in python.

The central idea of asyncio is: let's create an event loop that runs in the main fiber of execution. When we hit an await somewhere in a coroutine and the awaited operation is not ready yet, control returns to the main loop, which polls all of its sockets/file descriptors/timeouts etc. at once.

Typically, we await on some blocking call, e.g. a database query or an http request. This is the same approach that Nginx and Node.js have been using since the early 2010s (even earlier, probably) to run multiple event handlers concurrently.
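
A minimal sketch of this behaviour, assuming python 3.7+ for asyncio.run(). The fake_query coroutine is made up, and asyncio.sleep() stands in for a real database query or http request. While one coroutine is waiting, the loop runs the other one.

```python
import asyncio

log = []


async def fake_query(name, delay):
    log.append(f"{name} start")
    # await hands control back to the event loop until the timer fires.
    await asyncio.sleep(delay)
    log.append(f"{name} done")
    return name


async def main():
    # Both coroutines are scheduled on the same loop and wait concurrently.
    return await asyncio.gather(fake_query("slow", 0.05), fake_query("fast", 0.01))


results = asyncio.run(main())
```

Both queries start before either one finishes, and the fast one completes first, although gather() still returns the results in the order the coroutines were passed in.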

Awaitables, coroutines (generator-based and native), coroutine functions, futures/tasks etc.

Asyncio introduced many new concepts, see the Glossary.

  • Awaitables: objects that can be used in an await expression, i.e. objects with an __await__() method. Usually these are coroutines, futures or tasks.
  • Coroutines:
    • Coroutine functions - functions declared with async def foo(): or def foo(): yield .... The first kind are called native coroutines, the second kind generator-based coroutines (legacy, decorated with @asyncio.coroutine).
    • Coroutine objects - objects created by calling a coroutine function, e.g. foo(). Calling the function does not run anything yet; you await them, as in await foo(), to start running the coroutine.
  • Futures: the same thing as a promise in Javascript and other languages - something that will contain the result of an async operation as soon as it's done and that can be observed.
  • Tasks: coroutines wrapped by the main loop in a future, so that they can be scheduled and cancelled, their exceptions can be processed etc.
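
The terms above can be poked at in a short sketch, assuming python 3.7+. The compute coroutine is made up for the example.

```python
import asyncio
import inspect


async def compute():                      # a coroutine function
    await asyncio.sleep(0)
    return 42


async def main():
    coro = compute()                      # a coroutine object; nothing has run yet
    facts = {
        "coroutine function": inspect.iscoroutinefunction(compute),
        "coroutine object": inspect.iscoroutine(coro),
        "awaitable": inspect.isawaitable(coro),
    }

    task = asyncio.create_task(coro)      # wrap the coroutine into a Task
    facts["task is a future"] = isinstance(task, asyncio.Future)

    loop = asyncio.get_running_loop()
    future = loop.create_future()         # a bare Future with no coroutine behind it
    loop.call_soon(future.set_result, "done")   # someone else fills in the result

    return facts, await task, await future


facts, task_result, future_result = asyncio.run(main())
```

Note that a Task is itself a Future, which is why both can be awaited in the same way.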

The most important part of the Asyncio main loop is its _run_once() method, where all the magic of socket polling happens.


Boris Burkov

Written by Boris Burkov, who lives in Moscow, Russia and Cambridge, UK, loves to take part in building future technologies, thinks about the world we're living in at present and admires the giants of the past. You can follow me on Telegram