I have been playing around with Python for around 8 years now, and Python’s Global Interpreter Lock (or GIL) amazes me each time I think about it. We’ve all heard of, and experienced, the GIL hampering our ability to multithread. But why, and more importantly, how does this happen? What is the impact of the GIL on multithreading? These are some of the questions that are always left unanswered.
I’d like to help demystify the GIL for you. Here is what you will take away from this article:
- Understand Python threading and GIL management in Python v2.7
- The impact of the GIL on multicore systems, and priority inversion
- The GIL in Python v3.2 and the Convoy Effect
Setting the context
Let’s start with a very naive-looking, number-crunching operation. In this example, we calculate the factorial of a huge number in a separate thread:
```python
from __future__ import print_function  # same print() syntax on Python 2.7 and 3.x

__author__ = 'Chetan'

from datetime import datetime
import threading


def factorial(number):
    fact = 1
    for n in range(1, number + 1):
        fact *= n
    return fact


number = 100000
thread = threading.Thread(target=factorial, args=(number,))

startTime = datetime.now()
thread.start()
thread.join()  # wait for the thread to finish before stopping the clock
endTime = datetime.now()

print("Time for execution:", endTime - startTime)
```
When I run this program on my machine, it takes around 3.4 seconds. But if I spawn a second thread that does the exact same operation and start both threads, it takes around 6.2 seconds:
- 1 operation, 1 thread => 3.4 secs
- 2 operations, 2 threads => 6.2 secs
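To reproduce both measurements, the opening example can be generalised to start any number of identical threads. This is a sketch, not the author’s exact benchmark script; `time_threads` is a hypothetical helper name, and the timings you get will depend on your machine and interpreter version.

```python
from __future__ import print_function  # keeps print() working on 2.7 as well

import threading
from datetime import datetime


def factorial(number):
    fact = 1
    for n in range(1, number + 1):
        fact *= n
    return fact


def time_threads(num_threads, number=100000):
    """Start num_threads threads, each computing factorial(number); return the elapsed time."""
    threads = [threading.Thread(target=factorial, args=(number,))
               for _ in range(num_threads)]
    start = datetime.now()
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every thread before stopping the clock
    return datetime.now() - start
```

Calling `time_threads(1)` and then `time_threads(2)` corresponds to the two measurements above.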
Hold on a sec! We’re running two threads, so shouldn’t they run in parallel and take only ~3 secs? You’d be right to expect that. But you don’t get that parallelism with Python multithreading, because of the Global Interpreter Lock. Let’s understand this lock and go one level deeper.
What is wrong with Python threads
On POSIX systems, Python threads are plain old POSIX threads (pthreads). In other words, the threads created by Python programs are normal OS threads. As we all understand, the Python interpreter is a virtual machine (much like Java’s Virtual Machine). Unlike Java, however, it does no thread management of its own: there are no thread priorities and no pre-emption. Since the threads created by Python are normal OS threads, the OS is responsible for supervising them and takes care of scheduling. What does the interpreter do? Well, it does the bookkeeping: which thread is running, when a context switch happens, and so on.
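A quick way to see that Python threads are normal OS threads is to ask each thread for its operating-system thread id. This sketch assumes Python 3.8+, where `threading.get_native_id()` is available; the helper name `native_id_of_new_thread` is hypothetical.

```python
import threading


def native_id_of_new_thread():
    """Start a thread and return the OS-level thread id it reports for itself."""
    result = []
    t = threading.Thread(target=lambda: result.append(threading.get_native_id()))
    t.start()
    t.join()
    return result[0]


main_id = threading.get_native_id()        # id the OS assigned to the main thread
worker_id = native_id_of_new_thread()      # id the OS assigned to the new thread
print("main thread OS id:  ", main_id)
print("worker thread OS id:", worker_id)
```

The two ids differ because the OS created a separate kernel-level thread for the worker.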
All this is OK, but where does the GIL come in?
This is where the problem starts. Even though the Python interpreter starts the threads and the OS manages them, each running thread needs exclusive access to the interpreter’s internal data structures while it executes Python code. Exclusive access is needed because CPython’s memory management (for example, the reference count kept on every object) is not thread-safe. Exclusivity calls for a synchronisation mechanism, hence the Global Interpreter Lock, or GIL. Since only the thread holding the GIL may touch Python data structures, the GIL allows only one thread to run Python code at a time.
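Reference counting is the clearest example of that unsafe memory management. In CPython every object carries a count that is incremented and decremented as references come and go; each update is a plain read-modify-write, so two threads doing it at the same moment without a lock could lose an update. This CPython-specific sketch only shows the count moving when a new reference is created (`sys.getrefcount()` includes the temporary reference held by its own argument).

```python
import sys

data = [1, 2, 3]
before = sys.getrefcount(data)  # includes the temporary reference passed to getrefcount
alias = data                    # one more reference to the same list object
after = sys.getrefcount(data)   # the interpreter has incremented the count
print(before, after)
```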
You can relate this to our opening code example: even though the OS scheduled two threads, only one could run at a time while the other waited in a runnable state. Only when a runnable thread acquires the lock does it start running and performing operations. That’s why both threads effectively ran sequentially and took about 6.2 secs to finish the complete operation.
GIL in Python source
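- Creating a thread in Python boils down to a simple pthread_create() call (on POSIX systems).
- Py_Initialize() sets up the interpreter and its thread-state bookkeeping, while PyEval_InitThreads() creates the GIL (in Python 2.7 this happens lazily, when the first extra thread is started).
- As you’d have imagined by now, the GIL is a mutex (a lock), implemented to synchronise threads.
- In Python/ceval.c (2.7): static PyThread_type_lock interpreter_lock = 0; /* This is the GIL */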
What happens when Python threads are created? Let’s zoom in further.
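- The Python interpreter implements a periodic ‘check’. The check is driven by a counter of ticks, where a tick roughly corresponds to one Python VM bytecode instruction; by default a check happens every 100 ticks (see sys.setcheckinterval()).
- The check interval therefore dictates the slice a thread gets to execute bytecode. Note that it is measured in instructions, not in CPU time.
- As soon as the check interval has elapsed, the currently running thread releases the GIL.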
- It then signals all the ‘runnable’ threads that the GIL is free.
- All the threads, including the one that just released the GIL, battle to acquire it.
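To make the steps above concrete, here is a toy simulation of the 2.7 scheme. It is a sketch, not CPython’s implementation: `toy_gil` is an ordinary `threading.Lock` standing in for the GIL, one loop iteration stands in for one tick, `CHECK_INTERVAL` mirrors the 2.7 default of 100, and the "signal all runnable threads" step is collapsed into the lock’s own wake-up. The names `worker`, `run`, `toy_gil` and `schedule` are hypothetical.

```python
import threading

CHECK_INTERVAL = 100        # ticks between forced releases (2.7 default check interval)
toy_gil = threading.Lock()  # hypothetical stand-in for the real GIL
schedule = []               # (thread name, ticks executed) for every slice


def worker(name, total_ticks):
    done = 0
    while done < total_ticks:
        with toy_gil:  # every thread, including the last holder, competes here
            batch = min(CHECK_INTERVAL, total_ticks - done)
            for _ in range(batch):  # one iteration stands in for one bytecode tick
                done += 1
            schedule.append((name, batch))
        # leaving the with-block releases the toy GIL once the check interval is used up


def run(total_ticks=1000):
    """Run two simulated threads and return the order in which they held the toy GIL."""
    del schedule[:]
    threads = [threading.Thread(target=worker, args=(n, total_ticks)) for n in ("A", "B")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return list(schedule)
```

Inspecting `run()` shows each thread getting slices of at most `CHECK_INTERVAL` ticks; which thread wins each battle is up to the OS.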
Impact of GIL
Because of the GIL, considerable time is lost to communication (thread signalling), the GIL battle, thread wake-up and GIL acquisition. This results in:
- significant signalling overhead and communication time
- long wait times for a thread whenever another thread holds the GIL
- threads running sequentially instead of in parallel
GIL on multi core systems
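The above issues are more pronounced on multicore systems. As we know, Python threads are normal OS threads, so on a multicore system the OS can schedule both CPU-bound threads on different cores at the same time. The goals of the OS scheduler and the Python interpreter conflict with each other: while the OS wants both threads to be running, Python allows only one thread to run. This intensifies the GIL battle, because a thread woken up on another core often finds that the GIL has already been re-acquired by the thread that just released it, and has to go back to waiting.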
Now let’s put an I/O-bound thread into the mix. I/O-bound threads are typically used for operations such as network access, database queries and communication with external devices. Let’s look at what happens in this case.
As we now know, runnable threads compete with each other to acquire the GIL. When an I/O-bound thread and a CPU-bound thread compete, it can happen that the CPU-bound thread keeps winning. The I/O thread may then starve, never getting a chance to acquire the GIL and run. This is a case of priority inversion: operating systems normally favour I/O-bound threads so that they stay responsive, but here the CPU-bound thread effectively gets precedence, so the priorities are inverted. Hence Python threads are prone to priority inversion when I/O-bound and CPU-bound work are mixed.
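One rough way to observe how a CPU-bound thread affects a thread that mostly waits is to measure how late the waiting thread wakes up. In this sketch, `time.sleep()` stands in for a blocking I/O call (it releases the GIL while sleeping), and the extra delay after each sleep includes the time spent getting the GIL back as well as ordinary OS scheduling delay. The functions `cpu_bound` and `io_latencies` are hypothetical names; results depend heavily on interpreter version, OS and hardware, so compare the two runs rather than reading the absolute numbers.

```python
from __future__ import print_function

import threading
import time

# perf_counter is only available from Python 3.3; fall back to time.time on older versions
clock = getattr(time, "perf_counter", time.time)


def cpu_bound(stop):
    """Spin in pure-Python bytecode until asked to stop."""
    n = 0
    while not stop.is_set():
        n += 1


def io_latencies(samples=20, sleep_s=0.001, with_cpu_thread=True):
    """Return how much longer than sleep_s each simulated I/O wait actually took."""
    stop = threading.Event()
    spinner = threading.Thread(target=cpu_bound, args=(stop,))
    if with_cpu_thread:
        spinner.start()
    delays = []
    try:
        for _ in range(samples):
            t0 = clock()
            time.sleep(sleep_s)  # releases the GIL, like a blocking I/O call
            delays.append(clock() - t0 - sleep_s)  # extra wake-up and GIL re-acquisition time
    finally:
        stop.set()
        if with_cpu_thread:
            spinner.join()
    return delays
```

Comparing `max(io_latencies(with_cpu_thread=False))` with `max(io_latencies(with_cpu_thread=True))` gives a feel for how much the busy thread delays the waiting one.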
GIL in Python 3.2 and the Convoy Effect
So, what’s changed in Python 3.2 when compared to 2.7? Is it better or worse? Let’s take a look:
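- The tick-based ‘check’ has been dropped in Python 3.2.
- Instead, there is a fixed switch interval, 5 ms by default. A thread that wants the GIL waits up to that long; if the GIL still hasn’t been released, it sets a request that forces the running thread to give up the GIL.
- Once the GIL is released, the releasing thread signals the waiting threads that it is free.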
- A sleeping thread wakes up, acquires the GIL and signals the previous owner.
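In Python 3.2 and later, the 5 ms interval is exposed through `sys.getswitchinterval()` and `sys.setswitchinterval()`, which replace the 2.7 `sys.getcheckinterval()`/`sys.setcheckinterval()` pair. A minimal sketch of reading the interval and temporarily changing it:

```python
import sys

# Python 3.2+ only: the switch interval (in seconds) replaced the tick-based check interval
default_interval = sys.getswitchinterval()
print("default switch interval (s):", default_interval)

sys.setswitchinterval(0.001)  # waiting threads now request the GIL after about 1 ms
print("new switch interval (s):", sys.getswitchinterval())

sys.setswitchinterval(default_interval)  # restore the previous value
```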
This results in:
- Better GIL arbitration
- The GIL battle is largely eliminated
- Less context switching and signalling
- Threads are more responsive
On my machine, I see 3.1 secs for one thread and 6.2 secs for two CPU threads for the example code we started with. Compared with Python 2.7, that is roughly a 9% improvement for the single-thread run (3.4 → 3.1 secs), while the two-thread run takes about the same time. So the GIL handling was improved in the Python 3 release, although two CPU-bound threads still effectively run one after the other.
Python 3.2 has another interesting problem, though. It assumes that I/O operations will block, so a thread always releases the GIL before it performs I/O, even when the operation would complete immediately. Python’s I/O layer applies this extensively in file handling, socket operations and other I/O. Now apply this to a program that mixes CPU-bound and I/O-bound threads: as soon as the I/O-bound thread releases the GIL, the CPU-bound thread acquires it, and the I/O thread may have to wait a full switch interval (5 ms) to get it back. This repeats on every I/O operation, so the CPU thread effectively always holds the GIL and I/O performance suffers badly. This is the Convoy Effect, and it is visible in Python 3.2.
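A sketch for observing the Convoy Effect, assuming Python 3.3+ (for `time.perf_counter()`) and `socket.socketpair()` (available on Unix, and on Windows from Python 3.5). Each tiny `sendall()`/`recv()` on a local socket pair releases the GIL even though the data is already there, so with a busy CPU-bound thread running, every call may have to wait for the spinning thread to hand the GIL back. The names `spin` and `echo_round_trips` are hypothetical.

```python
import socket
import threading
import time


def spin(stop):
    """CPU-bound thread: busy-loop until asked to stop."""
    while not stop.is_set():
        pass


def echo_round_trips(count, with_cpu_thread):
    """Time count one-byte send/recv round trips over a local socket pair."""
    a, b = socket.socketpair()
    stop = threading.Event()
    spinner = threading.Thread(target=spin, args=(stop,))
    if with_cpu_thread:
        spinner.start()
    try:
        start = time.perf_counter()
        for _ in range(count):
            a.sendall(b"x")  # releases the GIL around the system call
            b.recv(1)        # so does this, even though the byte is already waiting
        return time.perf_counter() - start
    finally:
        stop.set()
        if with_cpu_thread:
            spinner.join()
        a.close()
        b.close()
```

Comparing `echo_round_trips(200, with_cpu_thread=False)` with `echo_round_trips(200, with_cpu_thread=True)` shows how much the busy thread slows down the I/O loop; the size of the gap varies by machine and CPython version.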
We learnt about the Python GIL: how it works and its impact on processing capabilities. We also learnt how priority inversion and the Convoy Effect hurt performance. Even though Python 3.2 does a better job of handling the GIL, there’s still a lot to be done.
Solutions such as Python’s multiprocessing module, which runs work in separate processes that each have their own interpreter and GIL, or Jython (Python implemented in Java, which has no GIL) have been suggested as ways to get better results. But I think the right choice truly depends on your application; that is what decides which solution will work for you.
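For CPU-bound work like the factorial example, multiprocessing sidesteps the GIL by running each computation in a separate process with its own interpreter and its own GIL. A minimal sketch, reusing the same `factorial()` as the opening example; `run_in_processes` is a hypothetical helper name. The `if __name__ == "__main__"` guard is required on platforms that start worker processes by re-importing the main module (such as Windows).

```python
from multiprocessing import Pool


def factorial(number):
    fact = 1
    for n in range(1, number + 1):
        fact *= n
    return fact


def run_in_processes(numbers, processes=2):
    """Compute factorial(n) for each n, spreading the work across worker processes."""
    pool = Pool(processes=processes)
    try:
        return pool.map(factorial, numbers)  # each call runs under its own process's GIL
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    results = run_in_processes([100000, 100000])
    print("computed", len(results), "factorials in separate processes")
```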
What’s a typical load your website encounters in today’s world and what role does GIL play here? What about Asynchronous I/O? What modules can help?