tychoish, a wiki

Python Concurrency Rant

What and Why

Concurrency is the term we use to think about operations that can happen at the same time. A computer program is just a list of operations that have some sort of ordering. If you run those operations in the order the programmer wrote them in, and the programmer wrote the right code, everything should work fine: if operation B depends on the outcome of operation A, then you just have to make sure that operation A happens before B.

There are two kinds of problems that this linear model doesn't address well:

  1. Programs that respond to user or environmental interactions.

    Many programs don't actually have a linear procedure, and depend on user input (e.g. a word processor or text editor must wait for you to type text).

    Let's call these "event driven" programs.

  2. Programs that consist of operations that don't depend on each other.

    Consider a program that has operations A, B, C, and D. Sometimes you have to run D after C, C after B, and B after A; but sometimes B, C, and D are totally (or mostly) independent of each other and can run in any order.

    Let's call these "potentially parallel workloads."
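A minimal sketch of a potentially parallel workload, using the standard library's `concurrent.futures`. The functions `op_a` through `op_d` are hypothetical stand-ins for operations A through D: A runs first because everything else depends on it, and then B, C, and D are submitted together because they don't depend on each other.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for operations A, B, C, and D.
def op_a():
    return 10


def op_b(x):
    return x + 1


def op_c(x):
    return x * 2


def op_d(x):
    return x - 3


def run_workload(max_workers=3):
    # A must happen first: B, C, and D all depend on its result.
    a = op_a()
    # B, C, and D are independent, so they can be submitted at once
    # and may finish in any order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(op, a) for op in (op_b, op_c, op_d)]
        # Collect results in submission order, regardless of finish order.
        return [f.result() for f in futures]


if __name__ == "__main__":
    print(run_workload())  # [11, 20, 7]
```

The ordering constraint (A before everything else) lives in ordinary sequential code; only the independent part is handed to the pool.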

While both of these models help us think about how software can reflect and respond to operations happening at the same time, concurrency is more subtle than "parallelism." Concurrency is about modeling the dependencies and relationships between operations; parallelism is really just an implementation detail.

Parallelism requires concurrency, but you can execute a concurrent design in parallel or not, as needed or desired.
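One way to see concurrency without parallelism is Python's `asyncio`: the sketch below runs three independent tasks concurrently, but every one of them executes on the same thread, taking turns whenever one is waiting. The task names and delays are illustrative only; `asyncio.sleep` stands in for some real wait, such as user input or network I/O.

```python
import asyncio
import threading


async def step(name, delay, log):
    # Record which thread this coroutine runs on, then wait.
    # While one coroutine awaits, the event loop runs the others.
    log.append((name, "start", threading.get_ident()))
    await asyncio.sleep(delay)
    log.append((name, "end", threading.get_ident()))
    return name


async def main():
    log = []
    # The three steps are independent, so they are modeled concurrently.
    results = await asyncio.gather(
        step("B", 0.03, log),
        step("C", 0.01, log),
        step("D", 0.02, log),
    )
    return results, log


if __name__ == "__main__":
    results, log = asyncio.run(main())
    print(results)  # ['B', 'C', 'D'] (gather preserves argument order)
    # Every entry was recorded on one thread: concurrent, not parallel.
    print(len({tid for *_, tid in log}))  # 1
```

The same design (three independent steps) could later be moved onto a thread or process pool without changing how the dependencies are expressed.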

There are caveats both during development and at runtime:

Reference: http://blog.golang.org/concurrency-is-not-parallelism

Implementation Details (Python)

In CPython, a single process does not support parallel execution: the global interpreter lock (GIL) allows only one thread to run Python code at a time. Sort of. The rule is more complex:

The second two operations can run in parallel with each other and with the execution of Python code (i.e. these operations are "yielding"). These yielding operations typically take the most time. The downside is that yielding operations make up only a small fraction of the operations in a Python program.
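Assuming blocking I/O is among the "yielding" operations, a small sketch shows the effect. Here `time.sleep` is a stand-in for a blocking call (in CPython it releases the GIL while it waits), and `blocking_io` and `run_threads` are hypothetical names for illustration. Four threads that each wait 0.2 seconds finish in roughly 0.2 seconds total, because the waits overlap.

```python
import threading
import time


def blocking_io(seconds):
    # Hypothetical stand-in for a blocking I/O call. In CPython,
    # time.sleep releases the GIL, so other threads run meanwhile.
    time.sleep(seconds)


def run_threads(n, seconds):
    threads = [threading.Thread(target=blocking_io, args=(seconds,)) for _ in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = run_threads(4, 0.2)
    # The waits overlap, so this is usually close to 0.2 s, not 0.8 s.
    print(f"{elapsed:.2f} s")
```

Replacing `time.sleep` with a pure-Python, CPU-bound loop would not show the same overlap on a standard CPython build, since that work holds the GIL.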

Python provides two (native) parallelism metaphors: threads and processes.

The best part is that the interfaces for working with threads and processes are nearly the same (the `multiprocessing` module deliberately mirrors the `threading` API), which makes testing easier.
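As a sketch of that shared interface, `concurrent.futures` offers `ThreadPoolExecutor` and `ProcessPoolExecutor`, which implement the same `Executor` methods. The helper names `square` and `run_with` are illustrative, not part of any library: the calling code stays identical and only the executor class changes, so you can test with threads and deploy with processes (or vice versa).

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def square(x):
    # Defined at module level so a process pool can pickle it by reference.
    return x * x


def run_with(executor_cls, values):
    # Identical calling code for threads and processes; only the
    # executor class passed in differs.
    with executor_cls(max_workers=2) as pool:
        return list(pool.map(square, values))


if __name__ == "__main__":
    # The __main__ guard matters for processes started with the "spawn"
    # or "forkserver" methods, which re-import this module.
    print(run_with(ThreadPoolExecutor, range(5)))   # [0, 1, 4, 9, 16]
    print(run_with(ProcessPoolExecutor, range(5)))  # [0, 1, 4, 9, 16]
```

The interfaces match, but the semantics don't entirely: processes need picklable functions and arguments, and they don't share memory the way threads do.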

The Rant

The problem isn't that Python doesn't have concurrency tools; it's that no one started writing Python with the idea that parallelism and concurrency would be a defining element of most systems that people would need or want to write.

The result is that while it's theoretically possible to modify Python itself to be more concurrent, one of two things happens:

The work on this is ongoing, of course, and I suspect there will eventually be some solution, but the change is unlikely to be revolutionary. In the meantime, it's awkward and sometimes awful:

What would make this better?