Are GIL and Thread-Safety synonyms?

After writing GIL – Global Interpreter Lock, a common question people ask me is:

So the GIL ensures that my code is thread-safe, doesn’t it?

Before I go into the nuts and bolts of it, let’s start the other way around and check it with a practical example. Let’s create a simple class that adds 1 to a variable:

class Adding:
    def __init__(self):
        self.added = False
        self.value = 0

    def has_added(self):
        return self.added

    def add(self):
        print "Adding...\n"
        self.value += 1
        self.added = True

    def get_value(self):
        return self.value

Now, let’s create a few threads to run an object of that class and see what happens:

from threading import Thread

def do_add(adding):
    if not adding.has_added():
        adding.add()

adding = Adding()
threads = [Thread(target=do_add, args=(adding,)) for _ in xrange(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print "Value: %d" % adding.get_value()

Let’s run this code a few times:

$ python thread_safety.py
Adding…

$ python thread_safety.py
Adding…

Adding…

Ooops! We got it twice. Let’s double-check and verify that the value really is more than 1:

$ python thread_safety.py
Adding…

Value: 1
$ python thread_safety.py
Adding…

Adding…

Value: 2

So, the answer to the question is no. What’s happening, then? If the GIL prevents two threads from running simultaneously, how can this be?

The problem with our code is that we introduced a race condition. A race condition happens when two or more threads access some shared data and try to change it at the same time. The GIL does not make compound operations atomic: it interleaves the threads, so a thread can be suspended between the check in has_added() and the update in add(), letting another thread slip in and pass the same check. The usual fix is to serialize access to the shared state, for example with a lock.
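One simple fix, sketched here (the add_once method name is my own, not from the original code), is to guard the check-and-update with a threading.Lock so that the test of added and the increment happen as one atomic step:

```python
import threading

class Adding:
    def __init__(self):
        self.lock = threading.Lock()
        self.added = False
        self.value = 0

    def add_once(self):
        # The check and the update run under the same lock, so no
        # second thread can pass the check before added is set.
        with self.lock:
            if not self.added:
                self.value += 1
                self.added = True

adding = Adding()
threads = [threading.Thread(target=adding.add_once) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(adding.value)  # always 1, no matter how the threads interleave
```

Folding the check into the class itself also removes the temptation to do an unsynchronized check-then-act from the outside, which was the root of the race.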

P.S. I want to thank Vitor Torres, Ricardo Sousa and Nuno Silva for the reviews.


GIL – Global Interpreter Lock

Good man (or woman), do you have a lot to process? Are you using an interpreted language, like Python or Ruby? Did you decide to use parallelism to speed things up? Did you implement it using threads? Did you make it worse, slower? Are you having a hard time trying to figure out what you did wrong?

Chances are you did (mostly) everything right. That’s right! Maybe you did everything right with your threads and your code is still slower than your single-threaded version. If that’s the case, you might have stumbled across the GIL, which stands for Global Interpreter Lock. Fancy name, uh? Basically, the GIL is a mutual exclusion lock that ensures only one thread runs in the interpreter at a time. In practice this means that full parallel execution is not allowed. Python being an interpreted language, the GIL is part of its reference implementation. Let’s see this with an example:

LIMIT = 50000000

def cycle(n):
    while n < LIMIT:
        n += 1

cycle(0)

A simple example, but enough to illustrate the point. If we run this and time it:

time python single_thread.py

real 0m2.608s
user 0m2.440s
sys 0m0.004s

Let’s try and make it faster. Let’s divide the work between two threads, each of them doing half the work. We would expect a performance boost.

from threading import Thread

LIMIT = 50000000

def cycle(n):
    while n < LIMIT:
        n += 1

t1 = Thread(target=cycle, args=(LIMIT / 2,))
t2 = Thread(target=cycle, args=(LIMIT / 2,))
t1.start()
t2.start()
t1.join()
t2.join()

And again, if we run it:

time python threaded.py

real 0m5.256s
user 0m6.416s
sys 0m1.092s

As we can see, we didn’t improve things; we actually made them worse. And that’s because of what I said previously: the GIL prevents multiple threads from running in the interpreter simultaneously. Instead, the threads take turns, and that switching is controlled by the GIL. We can visualize it like this:

(Figure: threads taking turns running under the GIL)

What advantages does this kind of implementation give? Well, some, actually:

  • easier implementation, easier memory management;
  • increased speed for single-threaded executions;
  • easier integration with libraries.

Of course it has some drawbacks, and it can even behave strangely in multi-core environments (check Inside the Python GIL). It’s easy to see that code doing CPU-intensive processing will have a problem. So, how can we get around this? I’ll leave you with a few tips:

  • Use Python’s multiprocessing package. It offers concurrency by using subprocesses instead of threads;
  • “Python” is used loosely to refer to the default Python implementation, which is written in C (CPython). There are other implementations (for example Jython, written in Java) that do not suffer from the GIL’s effects;
  • Do not use Python at all: you should use the right tool for the job, and Python might not be the one.

P.S. I want to thank Vitor Torres, Ricardo Sousa and Nuno Silva for the reviews.

Tail recursion: what is that?

Let’s start this one with a formal definition which we’ll borrow from Wikipedia:

In computer science, a tail call is a subroutine call that happens inside another procedure as its final action; it may produce a return value which is then immediately returned by the calling procedure. The call site is then said to be in tail position, i.e. at the end of the calling procedure. If any call that a subroutine performs, such that it might eventually lead to this same subroutine being called again down the call chain, is in tail position, such a subroutine is said to be tail-recursive, which is a special case of recursion. Tail calls need not be recursive – the call can be to another function – but tail recursion is particularly useful, and often easier to handle in implementations.

Putting it in simpler terms: when you call a procedure as the final action of your own procedure (function, method, etc.), you have a tail call. If that call can eventually lead to the same procedure being called again down the call chain, it is tail-recursive.

This is all very interesting, you might say, but why is it important? Well, it’s important because these types of calls can be implemented without adding new stack frames to the call stack. That means we don’t need to fill up the call stack and eat up a lot of memory in the process. This is particularly relevant for functional languages (for example Erlang and Scala) where for and while loops don’t exist and so recursion is mandatory.

As I am more of a hands-on guy, let’s illustrate the concept with some examples. Let’s use a function that is recursive by nature: factorial. Straight from its definition, we can write an Erlang function to calculate it (we’ll borrow it from Learn You Some Erlang for Great Good!):

fac(0) -> 1;
fac(N) when N > 0 -> N*fac(N-1).

Let’s do a small trace for this function with 4 as an input:

fac(4) 	= 4 * fac(4 - 1)
	= 4 * 3 * fac(3 - 1)
	= 4 * 3 * 2 * fac(2 - 1)
	= 4 * 3 * 2 * 1 * fac(1 - 1)
	= 4 * 3 * 2 * 1 * 1
	= 4 * 3 * 2 * 1
	= 4 * 3 * 2
	= 4 * 6
	= 24 

While this works fine for a small number like 4, as you can see there are quite a lot of fac calls. Imagine that for big numbers: we would be in trouble. Because the last action of this function is a multiplication between a number and a recursive call to fac, a new stack frame has to be placed on the call stack so that at the end we can go back and multiply the results.

Let’s try and change our function to be tail recursive. What that means is that we will implement it in a way that lets the same stack frame be reused. And that can be achieved if the last call of our function is only, you guessed it, a function call. Let’s see how it goes:

tail_fac(N) -> tail_fac(N,1).

tail_fac(0,Acc) -> Acc;
tail_fac(N,Acc) when N > 0 -> tail_fac(N-1,N*Acc).

Again, let’s do a small trace and check what’s happening:

tail_fac(4)    = tail_fac(4,1)
tail_fac(4,1)  = tail_fac(4-1, 4*1)
tail_fac(3,4)  = tail_fac(3-1, 3*4)
tail_fac(2,12) = tail_fac(2-1, 2*12)
tail_fac(1,24) = tail_fac(1-1, 1*24)
tail_fac(0,24) = 24

Well, well, well! It seems we don’t need to fill up the call stack with new frames after all. This is all very pretty with functional languages, but what about imperative ones? Let’s use a Python example.

def fac(n):
    if n == 0:
        return 1
    else:
        return n * fac(n-1)

Again, straight from the definition, and fairly simple. If we try to run it with numbers bigger than 998, we get:

RuntimeError: maximum recursion depth exceeded

This is because Python, like other languages, has a limit (Python’s default is 1000) on the number of recursive calls you can make. Faced with this, you can try to solve it in three ways:

  • increase the limit (you would still be bounded by the maximum you could raise it to, and you would also greatly increase the call stack size and memory consumption)
  • write an imperative alternative
  • write a tail recursive alternative
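For reference, the second option (an imperative alternative) is just a plain loop; a minimal sketch would look like this:

```python
def fac(n):
    # Iterative factorial: an ordinary loop, so Python's recursion
    # limit never comes into play, even for very large n.
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

print(fac(4))  # 24
```

With this version, fac(10000) computes without complaint, since no recursive calls are made at all.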

Because we’re illustrating tail recursion, let’s go with the third approach:

def fac(n):
    return tail_fac(n, 1)

def tail_fac(n, acc):
    if n == 0:
        return acc
    else:
        return tail_fac(n - 1, acc * n)

If we try to run it with numbers bigger than 998, the result will be:

RuntimeError: maximum recursion depth exceeded

Wait, what?!? How’s this possible? I’m using a tail recursive solution; this shouldn’t be happening. Have I been talking nonsense up until now? No, I haven’t, and I didn’t choose Python by chance.

The fact is, although you might implement your function in a tail recursive fashion, it all comes down to the implementation of the language. Whether the same stack frame is reused for consecutive recursive calls depends on the language implementation, and Python does not implement tail call elimination. You can check the links below, from Guido himself, explaining why:

 
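Since CPython will not reuse the frame for us, a common workaround is to do the tail call elimination by hand: the tail-recursive tail_fac maps directly onto a loop that rebinds n and acc on each pass, reusing a single stack frame. A sketch:

```python
def tail_fac(n, acc=1):
    # Each loop iteration plays the role of one tail call,
    # rebinding (n, acc) exactly as tail_fac(n - 1, acc * n) would.
    while n > 0:
        n, acc = n - 1, acc * n
    return acc

print(tail_fac(4))  # 24
```

Because the accumulator pattern carries all the state in the arguments, this mechanical rewrite from tail recursion to a while loop always works, and tail_fac(10000) now runs without hitting any recursion limit.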

P.S. I want to thank Ricardo Sousa and Nuno Silva for the reviews.