Good man (or woman), do you have a lot to process? Are you using an interpreted language, like Python or Ruby? Did you decide to use parallelism to speed things up? Did you implement it using threads? Did you make it worse, slower? Are you having a hard time trying to figure out what you did wrong?
Chances are you did (mostly) everything right

That's right! Maybe you did everything right with your threads and your code is still slower than your single-threaded version. If that's the case, you might have stumbled across the GIL, which stands for Global Interpreter Lock. Fancy name, huh? Basically, the GIL is a mutual exclusion lock that ensures that only one thread runs in the interpreter at a time. In practice this means that full parallel execution is not allowed. Python being an interpreted language, the GIL is part of its implementation. Let's see this with an example:
    LIMIT = 50000000

    def cycle(n):
        while n < LIMIT:
            n += 1

    cycle(0)
A simple example, but enough to illustrate the point. If we run this and time it:
time python single_thread.py
Let’s try and make it faster. Let’s divide the work between two threads, each of them doing half the work. We would expect a performance boost.
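    from threading import Thread

    LIMIT = 50000000

    def cycle(n):
        while n < LIMIT:
            n += 1

    # Each thread counts from LIMIT // 2 up to LIMIT, i.e. half the iterations.
    # Integer division (//) keeps the counter an int on Python 3 as well.
    t1 = Thread(target=cycle, args=(LIMIT // 2,))
    t2 = Thread(target=cycle, args=(LIMIT // 2,))
    t1.start()
    t2.start()
    # Wait for both threads to finish.
    t1.join()
    t2.join()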
And again, if we run it:
time python threaded.py
As we can see, we didn't improve it but actually made it worse. And that's because of what was said earlier: the GIL prevents multiple threads from being run by the interpreter simultaneously. Instead, the threads take turns, and that switching is controlled by the GIL. Let's simply visualize it like this:
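A rough way to observe the switching from inside Python, assuming CPython with the GIL enabled: two threads each append their own name to a shared list, and consecutive entries are then collapsed into runs. The names `STEPS`, `trace`, `worker` and `runs` are made up for this sketch, and the number and length of the runs will differ from machine to machine and from run to run.

```python
import threading
from itertools import groupby

STEPS = 200_000  # iterations per thread (arbitrary value for illustration)
trace = []       # shared log: one entry per step, naming the thread that ran it

def worker(name):
    for _ in range(STEPS):
        trace.append(name)  # list.append is thread-safe in CPython

threads = [threading.Thread(target=worker, args=(name,)) for name in ("A", "B")]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Collapse consecutive entries into (thread name, run length) pairs.
runs = [(name, sum(1 for _ in group)) for name, group in groupby(trace)]
print(f"{len(runs)} runs; first few: {runs[:6]}")
```

Each pair in `runs` is a stretch during which one thread kept appending before the other got its turn. Under the GIL you would typically expect a modest number of long runs rather than a fine-grained interleaving.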
What advantages does this kind of implementation give? Well, some, actually:
- easier implementation, easier memory management;
- increased speed for single-threaded executions;
- easier integration with libraries.
Of course it has some drawbacks, and it can even behave strangely in multi-core environments (check Inside the Python GIL). We can easily see that code that needs to do CPU-intensive processing will have a problem. So, how can we get around this? I'll leave you with a few tips:
- Use Python's multiprocessing package, which offers concurrency by using subprocesses instead of threads;
- Use another Python implementation. "Python" is used loosely to refer to the default Python implementation, which is written in C (CPython). There are other implementations (for example Jython, written in Java) that do not suffer from the GIL's effects;
- Do not use Python at all: you should use the right tool for the job, and Python might not be the one.
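As a minimal sketch of the first tip, here is roughly how the two-thread example could be moved to the multiprocessing package. The helper `run_in_processes` and the extra `limit` parameter on `cycle` are additions made for this illustration and are not part of the earlier scripts. No timings are claimed here, since they depend on the machine and the Python version.

```python
from multiprocessing import Process

LIMIT = 50_000_000

def cycle(n, limit=LIMIT):
    # Same busy loop as in the earlier examples; returns the final
    # counter so the function is easy to check in isolation.
    while n < limit:
        n += 1
    return n

def run_in_processes(limit=LIMIT):
    # Hypothetical helper: each Process runs in its own interpreter,
    # with its own GIL, so the two halves can use separate cores.
    procs = [Process(target=cycle, args=(limit // 2, limit)) for _ in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return [p.exitcode for p in procs]

if __name__ == "__main__":
    # The guard matters on platforms that start workers by launching
    # a fresh interpreter and re-importing this module.
    print(run_in_processes())
```

Saved as a script, this can be timed the same way as the other two versions. Keep in mind that processes do not share memory the way threads do, so anything more involved than this loop needs its data passed between processes explicitly.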
P.S. I want to thank Vitor Torres, Ricardo Sousa and Nuno Silva for the reviews.