Asyncio ecosystem
March 29, 2019 3 min read
I have had a very bad developer experience with Asyncio. It is such a messy and overcomplicated system that I have had to study it from scratch at least 3 times now. I figured it's time to cut my losses and write a post about it!
History of generators and coroutines in python in 5 minutes
I don't want to dig too deep into the evolution of generators in python. Basically, the history of generators in python is as follows:
- PEP-0255 (2001, python 2.2): yield statement introduced
- PEP-0342 (2005, python 2.5): yield expression: two-way communication, you can pass data to a coroutine, not only get data from it
- PEP-0380 (2009, python 3.3): yield from as a way to delegate to a sub-coroutine
- PEP-0492 (2015, python 3.5): async/await syntax
1. Initially, the yield statement was created as an alternative to return, just to create generator functions that lazily cook values on the fly and hand them out one at a time, in the spirit of the xrange() function.
2. Later on they decided that communication between a generator function and its caller should be two-way, so that a generator can not only produce values, but also receive them. So the yield statement was promoted to an expression. This created an opportunity for communicating coroutines, which allowed the famous David Beazley to create a whole operating system in python.
The main point of generators was to replace OS-controlled threads with interpreter-controlled analogues of threads, traditionally called fibers (there was also another attempt to make interpreter-controlled threads, gevent, where they were called greenlets or green threads, but this has nothing to do with asyncio). The fiber approach is also called cooperative multitasking, in contrast with preemptive multitasking, implemented with threads.
When you create POSIX threads in python, it's up to the operating system's kernel to decide when to switch between them. In the case of python, which implements a Global Interpreter Lock, the OS kernel does exactly the opposite of what you'd want to achieve, as was shown by David Beazley in his seminal talk.
3. Then the yield from construct was introduced for delegating tasks to sub-coroutines easily. Basically this was a failed attempt to promote the use of generators.
4. The Asyncio module provided a stock implementation of an asynchronous event loop (again, as if Twisted and Tornado didn't exist) and borrowed the async/await keywords from C# or Javascript. The async/await syntax is just an alternative to the yield syntax, so we can run new native coroutines just by calling foo().send(None) on them, without the asyncio main loop, as described here for instance.
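The four steps above can be sketched in a few lines. This is only an illustration: the function names (countdown, running_total, delegating, native, drive) are made up for this post and are not part of any library.

```python
def countdown(n):
    """Step 1: yield as a statement produces values lazily."""
    while n > 0:
        yield n
        n -= 1


def running_total():
    """Step 2: yield as an expression, the caller pushes values in with send()."""
    total = 0
    while True:
        received = yield total  # hand out the total, then wait for send()
        total += received


def delegating():
    """Step 3: yield from delegates to a sub-generator and captures its return value."""
    def inner():
        yield "inner-1"
        yield "inner-2"
        return "inner-result"

    result = yield from inner()
    yield result


async def native():
    """Step 4: a native coroutine; this one never actually suspends."""
    return 42


def drive(coro):
    """Run a native coroutine by hand with send(None), without any event loop."""
    try:
        coro.send(None)
    except StopIteration as stop:
        return stop.value  # the coroutine's return value travels in StopIteration
    raise RuntimeError("coroutine suspended; a real loop would resume it later")
```

A generator has to be primed with next() before the first send(), and a native coroutine that finishes reports its result through StopIteration, exactly like a generator does.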
The Asyncio main loop provides a main fiber of execution, responsible for monitoring the sockets, while coroutines are supposed to delegate all the blocking calls to the main loop to execute. I'll mostly speak of Asyncio in this post.
Asyncio
Asyncio is now a part of the core python 3 codebase.
Performance-critical parts of it are written in plain C, while the rest is in python.
The central idea of asyncio is: let's create an event loop that runs in the main fiber. When an await somewhere in an async coroutine actually has to wait, it returns control to the main loop, and the main loop polls all of its sockets/file descriptors/timeouts etc. at once. Typically, we await on some blocking call, e.g. while doing a query to a database or an http request. This is the same approach that Nginx (first released in 2004) and Node.js (2009) have been rocking to interleave execution of multiple event handlers.
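Here is a minimal sketch of that idea, assuming Python 3.7+ for asyncio.run(). The fake_query coroutine is a hypothetical stand-in for a database query or an http request; asyncio.sleep() plays the role of the blocking call.

```python
import asyncio


async def fake_query(name, delay, log):
    """Hypothetical stand-in for a slow network call."""
    log.append(f"{name} started")
    await asyncio.sleep(delay)  # control goes back to the event loop here
    log.append(f"{name} finished")
    return name


async def main():
    log = []
    # Both coroutines are scheduled on the same loop and wait concurrently.
    results = await asyncio.gather(
        fake_query("slow", 0.2, log),
        fake_query("fast", 0.1, log),
    )
    return results, log


if __name__ == "__main__":
    results, log = asyncio.run(main())
    print(results)
    print(log)
```

While "slow" is parked on its await, the loop is free to start and finish "fast", so the log shows the two queries interleaved even though everything runs in a single thread.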
Awaitables, coroutines (generator-based and native), coroutine functions, futures/tasks etc.
Asyncio introduced many new concepts, see the Glossary.
- Awaitables: objects that can be awaited, i.e. objects with an __await__() method. Usually these are coroutines, futures or tasks.
- Coroutines:
  - Coroutine functions: functions declared with async def foo(): or def foo(): yield ... (decorated with @asyncio.coroutine). The first ones are called native coroutines, the second generator-based.
  - Coroutine objects: objects created by calling a coroutine function, e.g. foo(). You'll await on those: await foo() to start running a coroutine.
- Futures: the same thing as a promise in Javascript and other languages, something that will contain the result of an async operation as soon as it's done and can be observed.
- Tasks: tasks are coroutines, wrapped by the main loop so that they can be cancelled, their exceptions can be processed etc.
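The concepts from the list above can be poked at directly. This is a small sketch, assuming Python 3.7+; compute, never_finishes and demo are names invented for the example.

```python
import asyncio
import inspect


async def compute():
    await asyncio.sleep(0)
    return "done"


async def never_finishes():
    await asyncio.sleep(3600)


async def demo():
    facts = {}

    # Calling a coroutine function only creates a coroutine object.
    coro = compute()
    facts["is_coroutine_function"] = inspect.iscoroutinefunction(compute)
    facts["is_coroutine_object"] = inspect.iscoroutine(coro)
    facts["is_awaitable"] = inspect.isawaitable(coro)
    facts["coroutine_result"] = await coro  # awaiting actually runs it

    # A future is a placeholder that somebody else fills in later.
    loop = asyncio.get_running_loop()
    future = loop.create_future()
    loop.call_soon(future.set_result, "from future")
    facts["future_result"] = await future

    # A task wraps a coroutine, schedules it on the loop and can be cancelled.
    task = asyncio.create_task(never_finishes())
    await asyncio.sleep(0)  # give the task a chance to start
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    facts["task_cancelled"] = task.cancelled()
    facts["task_is_future"] = isinstance(task, asyncio.Future)
    return facts
```

Running asyncio.run(demo()) returns a dictionary of observations. Note that a task is itself a future, which is why it can be awaited and observed in the same way.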
The most important part of the Asyncio main loop is the _run_once() method of the base event loop, where all the magic of socket polling happens.
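To give a feeling for what one iteration of such a loop does, here is a toy version built on generators and the standard selectors module. It is a hypothetical, heavily simplified sketch and not asyncio's actual code: the real loop also handles timers, callbacks and write readiness. ToyLoop, reader, writer and demo are names invented for this example.

```python
import selectors
import socket
from collections import deque


class ToyLoop:
    """Hypothetical toy loop for illustration; not asyncio's actual code."""

    def __init__(self):
        self.ready = deque()  # tasks that can run right now
        self.parked = 0       # tasks waiting for a socket to become readable
        self.selector = selectors.DefaultSelector()

    def spawn(self, task):
        self.ready.append(task)

    def run_once(self):
        # Step 1: poll the sockets. Block only when nothing else is runnable.
        if self.parked:
            timeout = 0 if self.ready else None
            for key, _events in self.selector.select(timeout):
                self.selector.unregister(key.fileobj)
                self.parked -= 1
                self.ready.append(key.data)  # key.data holds the parked task
        # Step 2: give each currently ready task exactly one step.
        for _ in range(len(self.ready)):
            task = self.ready.popleft()
            try:
                request = next(task)
            except StopIteration:
                continue  # task finished
            if request is None:
                self.ready.append(task)  # plain cooperative yield
            else:
                # The task yielded a socket: park it until data arrives.
                self.selector.register(request, selectors.EVENT_READ, task)
                self.parked += 1

    def run_until_idle(self):
        while self.ready or self.parked:
            self.run_once()
        self.selector.close()


def reader(sock, log):
    yield sock  # "await" until the socket is readable
    log.append(("received", sock.recv(1024)))


def writer(sock, payload, log):
    yield None  # let the reader park itself first
    sock.sendall(payload)
    log.append(("sent", payload))


def demo():
    left, right = socket.socketpair()
    log = []
    loop = ToyLoop()
    loop.spawn(reader(left, log))
    loop.spawn(writer(right, b"ping", log))
    loop.run_until_idle()
    left.close()
    right.close()
    return log
```

Calling demo() runs a reader and a writer over a socket pair. The reader parks itself on the selector, the writer sends its payload, and only then does the poll in run_once() wake the reader up.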
Written by Boris Burkov who lives in Moscow, Russia, loves to take part in development of cutting-edge technologies, reflects on how the world works and admires the giants of the past. You can follow me in Telegram