Quicksort: how it works!

Quicksort is a popular sorting algorithm. That being the case makes sense that we see how to implement it right? Let’s go then.

This algorithm basis itself on divide-and-conquer. For our purposes what that means is that, at each step we will use a pivot and divide our list into two sublists: one with lower values and one with equal or bigger values. Then we sort each sublist. The stopping condition of our algorithm will be when we reach the empty list. When that happens we start merging the sublists into our final and sorted list. Confused? Let’s see an example.

def quicksort(lst):
    if lst == []:
        return []

    # Partitioning the list. 
    # For the sake of simplicity, we'll use the first element as the pivot.
    lower = quick_sort([x for x in lst[1:] if x < lst[0]]) 	
    bigger_or_equal = quick_sort([x for x in lst[1:] if x >= lst[0]])

    return lower + [lst[0]] + bigger_or_equal

This example helps clarify the algorithm. As we can see, we start by checking if the list is empty. We don’t want to sort an empty list, now do we? If that’s the case, we return. Good!

Next we divide our list into two parts: lower and bigger or equal using the first list element as the pivot. At the same time we’re recursively applying the quicksort function to each sublist (that will eventually end when the lists are empty). When everything is sorted we merge the sublists and return.

Still we me so far? Excellent! If you’ve read some of my previous posts, you’ve have noticed that I’m a fan of testing. I don’t want you to call me a liar and as such let’s write some tests and check that the algorithm works.

import unittest
from quicksort import quicksort

class TestQuicksort(unittest.TestCase):
	def setUp(self):
		self.list_to_sort = [3,2,5,48,34,23,12]

	def test_quicksort(self):				
		sorted_list = quicksort(self.list_to_sort)
		self.assertEqual(sorted_list, [2, 3, 5, 12, 23, 34, 48])

$ python quicksort_tests.py
.
———————————————————————-
Ran 1 test in 0.000s

OK

It works. Phew! Nice to have some tests to make sure everything works and also if we decided to change something we can run them and check that we haven’t broken anything.

This is all nice but I never said that this algorithm is exclusive for integers, did I? No! Although the idea is all there, it would be nice if our algorithm worked with other types. That being said we will do a small change to our implementation so that we can sort list with other elements types.

Our approach will be to pass a compare function to the algorithm and use it instead of our integers comparisons. Because in Python functions are also objects, we can pass them as arguments to other functions.

But this time we’ll do it the other way around: we will write our tests first and then our code. TDD style!

import unittest
from quicksort import quicksort_gen

class TestQuickSortGen(unittest.TestCase):
	def setUp(self):
		self.list_integers = [3,2,5,48,34,23,12]
		self.list_letters  = ['a', 'r', 'd', 'b']

	def comp_int(self, a, b):
		if a < b:
			return -1
		else:
			return 1

	def comp_letters(self, char1, char2):
		if ord(char1) < ord(char2):
			return -1
		else:
			return 1	

	def test_quicksort_int(self):
		sorted_list = quicksort_gen(self.list_integers, self.comp_int)
		self.assertEqual(sorted_list, [2, 3, 5, 12, 23, 34, 48])

	def test_quicksort_letters(self):
		sorted_list = quicksort_gen(self.list_letters, self.comp_letters)
		self.assertEqual(sorted_list, ['a', 'b', 'd', 'r'])

if __name__ == "__main__":
	unittest.main()

As a convention, the compare function will return -1 if the first element is lower and 1 otherwise.

$ python quicksort_tests.py
FF
======================================================================
FAIL: test_quicksort_int (__main__.TestQuickSortGen)
———————————————————————-
Traceback (most recent call last):
File “quicksort_tests.py”, line 31, in test_quicksort_int
self.assertEqual(sorted_list, [2, 3, 5, 12, 23, 34, 48])
AssertionError: None != [2, 3, 5, 12, 23, 34, 48]

======================================================================
FAIL: test_quicksort_letters (__main__.TestQuickSortGen)
———————————————————————-
Traceback (most recent call last):
File “quicksort_tests.py”, line 35, in test_quicksort_letters
self.assertEqual(sorted_list, [‘a’, ‘b’, ‘d’, ‘r’])
AssertionError: None != [‘a’, ‘b’, ‘d’, ‘r’]

———————————————————————-
Ran 2 tests in 0.000s

FAILED (failures=2)

With our tests in place (and failing!) we can go ahead, write some code, and make them pass.

def quicksort_gen(lst, comp):
	if lst == [] :
		return []

	# Partitioning the list. 
	# For the sake of simplicity, we'll use the first element as the pivot.
	lower           = quicksort_gen([x for x in lst[1:] if comp(x, lst[0]) == -1], comp)
	bigger_or_equal = quicksort_gen([x for x in lst[1:] if comp(x, lst[0]) == 1], comp)

	return lower + [lst[0]] + bigger_or_equal

The changes to the original code are very small. As previously said, we only replaced our integer comparison function by the function passed as an argument.

Running the tests we no have (drum roll…):

$ python quicksort_tests.py
..
———————————————————————-
Ran 2 tests in 0.000s

OK

Tests passed. Great! Easy, wasn’t it?

Test your code with unit tests

You have a piece of code and you want/need to test it. Let’s get on with it! We will use a previous post as the code to test (you can find it here: Design Patterns: Factory Method) and will make use of Python’s unit testing framework (unittest) to build our tests.

Let’s not get ahead of ourselves. First, a common testing practice: to test their code many developers write some function that makes use of it and check if it works properly. Let’s check a possible example for our code.

def test():
    name_factory = NameFactory()

    name1 = name_factory.get_name("Castro,Ricardo")
    name2 = name_factory.get_name("Pedro Teixeira")

    print name1.get_first(), name1.get_last()
    print name2.get_first(), name2.get_last()

While this might be fine for our small example, that’s not what we really need to properly test our code. Let’s see a simple implementation for some tests and analyse it.

import unittest
from namer import NameFactory

class TestNameFactory(unittest.TestCase):
	def setUp(self):
		self.name_factory = NameFactory()

	def test_firstfirst(self):
		name = self.name_factory.get_name("Pedro Teixeira")

		self.assertEqual(name.get_first(), "Pedro")
		self.assertEqual(name.get_last(), "Teixeira")

	def test_lastfirst(self):
		name = self.name_factory.get_name("Castro,Ricardo")

		self.assertEqual(name.get_first(), "Ricardo")
		self.assertEqual(name.get_last(), "Castro")

if __name__ == '__main__':
	unittest.main()

So, the main idea is to use unit tests to properly test our code. Like the name points out, unit tests aim at testing units of code. How can we do that?

Python provides us a unit testing framework that we can use. We can then create a class that extends unittest.TestCase and, as we previous wrote a function to test our code, we can create as many functions as we want to test. By giving a quick look at the code, two points to notice here:

  • the methods representing test cases have to start with test;
  • there’s a method called setUp. That method will be run before each test. An equivalent tearDown method will be run (if defined) after each test.

As we can see, we’re testing the output of the functions by using the assertEqual function but there’s a lot more you can do. Check the unittest for more information about the framework.

Personally, I think testing your code is extremely important. There are many advantages to testing your code properly, for example:

  • you can be sure that your code is working properly;
  • you have a way to show others  that your code is working properly;
  • you can make changes to your code, and have a set of tests to make sure you haven’t broke anything. I find this particularly useful when you have to change something and you run into the fear of breaking anything inadvertently. If you have a good test suite, by running your tests you can be sure you didn’t break anything in the process.

I can think of many other reasons, but I will leave you to find them out for yourself.

Although the presented code here was written in Python, your favourite programming language should have some unit testing framework. Just search for it, read the docs and start using it. The underlying idea of this post is to write tests for your code. Like Nike: Just Do It.

Design Patterns: Factory Method

Design Patterns is an important subject to any developer and as so you can expect a post about them every once in a while.

The Factory Method is an object-oriented creational design pattern which deals with the problem of creating objects without having to specify the exact class of object that that should be created. Let’s build a simple example to illustrate the concept.

We’ll create a class that will deal with names. That class will store the first and last names and will provide methods to get them.

class Name:
    def __init__(self):
        self.first = ""
        self.last  = ""

    def get_first(self):
        return self.first

    def get_last(self):
        return self.last

Now, when we create a class to deal with names, we will pass it a string and that string can come in many formatss. For the sake of simplicity of this example we will be limited to two formats:

  • “FirstName LastName”, which will be handled by the FirstFirst class;
  • “LastName,FirstName”, which will be handled by the LastFirst class.

The code below implements those classes and they will extend our Name class.

class FirstFirst(Name):
    def __init__(self, s):
        i = s.rfind(" ")

        if i > -1:
            self.first = s[0:i]
            self.last  = s[i+1:len(s)]
        else:
            self.first = ""
            self.last  = s

class LastFirst(Name):
    def __init__(self, s):
        i = s.rfind(",")

        if i > -1:
            self.first = s[i+1:len(s)]            
            self.last  = s[0:i]
        else:            
            self.first = ""
            self.last  = s

The code itself is pretty simple and by now we’re only missing our factory. So let’s see how we will go about that.

class NameFactory:
    def get_name(self, s):
        i = s.find(",")

        if i > -1:
            return LastFirst(s)
        else:
            return FirstFirst(s)

So, what’s happening here? We’re simply identifying, based on the provided string, which of the actual classes should be used to create the object. The code using this factory doesn’t really care which class it’s being used. Let’s make it more clear with an example.

def test():
    name_factory = NameFactory()

    name1 = name_factory.get_name("Castro,Ricardo")
    name2 = name_factory.get_name("Pedro Teixeira")

    print name1.get_first(), name1.get_last()
    print name2.get_first(), name2.get_last()

We’re creating a test function that creates a factory and uses that same factory to create two Name objects. If we run it:

$ python test.py
Ricardo Castro
Pedro Teixeira

This test code has no idea which class is dealing with each name string. And it doesn’t have to! The only thing that this code needs is a factory that gives back a Name object that it can use.

Note: The code presented here is based on the Java code available in Design Patterns Java Companion, by James W. Cooper. For a more extensive analysis of this concept you can read that or any book about Design Patterns that, of course, details the Factory Method design pattern.