Global Interpreter Lock, or how to kill it

People that listened to my (Armin Rigo) lightning talk at EuroPython know that
suddenly, we have a plan to remove the Global Interpreter Lock — the
infamous GIL, the thing in CPython that prevents multiple threads from
actually running in your Python code in parallel.

That’s not actually new, because Jython has been doing it all along.
Jython works by very carefully adding locks to
all the mutable built-in types, and by relying on the underlying Java
platform to be efficient about them (so that the result is faster than,
say, very carefully adding similar locks in CPython). By “very
carefully”, I mean really really carefully; for example,
‘dict1.update(dict2)’ needs to lock both dict1 and dict2, but if you do
it naively, then a parallel ‘dict2.update(dict1)’ might cause a

All of PyPy, CPython and IronPython have a GIL. But for PyPy we are considering
a quite different approach than Jython’s, based on Software
Transactional Memory
. This is a recent development in computer
science, and it gives a nicer solution than locking. Here is a short
introduction to it.

Say you want to atomically pop an item from ‘list1’ and append it to

def f(list1, list2):
    x = list1.pop()

This is not safe in multithreaded cases (even with the GIL). Say that
you call f(l1, l2) in thread 1 and f(l2, l1) in thread 2. What
you want is that it has no effect at all (x is moved from one list to
the other, then back). But what can occur is that instead the top of
the two lists are swapped, depending on timing issues.

One way to fix it is with a global lock:

def f(list1, list2):
    x = list1.pop()

A finer way to fix it is with locks that come with the lists:

def f(list1, list2):
    acquire_all_locks(list1.lock, list2.lock)
    x = list1.pop()
    release_all_locks(list1.lock, list2.lock)

The second solution is a model for Jython’s, while the first is a model
for CPython’s. Indeed, in CPython’s interpreter, we acquire the GIL,
then we do one bytecode (or actually a number of them, like 100), then
we release the GIL; and then we proceed to the next bunch of 100.

Software Transactional Memory (STM) gives a third solution:

def f(list1, list2):
    while True:
        t = transaction()
        x = list1.pop(t)
        list2.append(t, x)
        if t.commit():

In this solution, we make a transaction object and use it in all
reads and writes we do to the lists. There are actually several
different models, but let’s focus on one of them. During a transaction,
we don’t actually change the global memory at all. Instead, we use the
thread-local transaction object. We store in it which objects we
read from, which objects we write to, and what values we write. It is
only when the transaction reaches its end that we attempt to “commit”
it. Committing might fail if other commits have occurred in between,
creating inconsistencies; in that case, the transaction aborts and
must restart from the beginning.

In the same way as the previous two solutions are models for CPython and
Jython, the STM solution looks like it could be a model for PyPy in the
future. In such a PyPy, the interpreter would start a transaction, do
one or several bytecodes, and then end the transaction; and repeat.
This is very similar to what is going on in CPython with the GIL. In
particular, it means that it gives programmers all the same guarantees
as the GIL does. The only difference is that it can actually run
multiple threads in parallel, as long as their code does not interfere
with each other. (In particular, if you need not just the GIL but actual
locks in your existing multi-threaded program, then this will not
magically remove the need for them. You might get an additional built-in
module that exposes STM to your Python programs, if you prefer it over
locks, but that’s another question.)

Why not apply that idea to CPython? Because we would need to change
everything everywhere. In the example above, you may have noted that I
no longer call ‘list1.pop()’, but ‘list1.pop(t)’; this is a way to tell
that the implementation of all the methods needs to be changed, in order
to do their work “transactionally”. This means that instead of really
changing the global memory in which the list is stored, it must instead
record the change in the transation object. If our interpreter is
written in C, as CPython is, then we need to write it explicitly
everywhere. If it is written instead in a higher-level language, as
PyPy is, then we can add this behavior as as set of translation rules, and
apply them automatically wherever it is necessary. Moreover, it can be
a translation-time option: you can either get the current “pypy” with a
GIL, or a version with STM, which would be slower due to the extra
bookkeeping. (How much slower? I have no clue, but as a wild guess,
maybe between 2 and 5 times slower. That is fine if you have enough
cores, as long as it scales nicely 🙂

A final note: as STM research is very recent (it started around 2003),
there are a number of variants around, and it’s not clear yet which one
is better in which cases. As far as I can tell, the approach described
in “A Comprehensive Strategy for Contention Management in Software
Transactional Memory” seems to be one possible state-of-the-art; it also
seems to be “good enough for all cases”.

So, when will it be done? I cannot say yet. It is still at the idea
stage, but I think that it can work. How long would it take us to
write it? Again no clue, but we are looking at many months rather
than many days. This is the sort of thing that I would
like to be able to work on full time after the Eurostars funding
runs out on September 1. We are currently looking at ways to use
crowdfunding to raise money so that I can do exactly that. Expect
a blog post about that very soon. But this looks like a perfect
candidate for crowdfunding — there are at least thousands of you who
would be willing to pay 10s of Euros to Kill the GIL. Now we only
have to make this happen.

Comments are closed.