Playing with loggers

Shall we spend some time exploring a little bit about loggers? We shall! Let’s do it.

Visit the docs for more detailed information about the logging module. Let’s use a simple example, from the documentation, to illustrate the basic usage:

import logging

def simple_example():
    # create logger
    logger = logging.getLogger('StreamHandler')
    logger.setLevel(logging.DEBUG)

    # create console handler and set level to debug
    ch = logging.StreamHandler()
    ch.setLevel(logging.DEBUG)

    # create formatter
    formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')

    # add formatter to ch
    ch.setFormatter(formatter)

    # add ch to logger
    logger.addHandler(ch)

    # 'application' code
    logger.debug('debug message')
    logger.info('info message')
    logger.warning('warn message')
    logger.error('error message')
    logger.critical('critical message')

First we create a new logger and set its level to DEBUG, so messages at DEBUG and above are processed (you can check the log levels here). Next we create a handler, which determines where the output goes; in this case StreamHandler logs to the console (stderr by default). Then we set up a formatter for our output and add it to our handler. Our handler is ready, so we add it to our logger. Finally, we take our logger for a test run.
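To see the level filtering in action, here is a minimal sketch that logs into an in-memory buffer instead of the console. The logger name LevelDemo and the choice of INFO as the threshold are assumptions made for this illustration only.

```python
import io
import logging

# Hypothetical logger name, chosen only for this sketch.
logger = logging.getLogger('LevelDemo')
logger.setLevel(logging.INFO)      # the logger drops anything below INFO
logger.propagate = False           # keep the demo output away from the root logger

# Log into an in-memory buffer so the result can be inspected.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter('%(name)s - %(levelname)s - %(message)s'))
logger.addHandler(handler)

logger.debug('debug message')      # below INFO: filtered out
logger.info('info message')        # INFO and above: emitted
logger.error('error message')

lines = buffer.getvalue().splitlines()
print(lines)
# ['LevelDemo - INFO - info message', 'LevelDemo - ERROR - error message']
```

Only the INFO and ERROR records reach the handler; the DEBUG record is discarded by the logger before any handler sees it.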

What if we want to log to a file? Couldn’t be easier:

def with_file_handler():
    # create logger
    logger = logging.getLogger('FileHandler')
    logger.setLevel(logging.DEBUG)

    # create file handler
    fh = logging.FileHandler('with_file_handler.log')

    # create formatter
    formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')

    # add formatter to fh
    fh.setFormatter(formatter)

    # add fh to logger
    logger.addHandler(fh)

    # 'application' code
    logger.debug('debug message')
    logger.info('info message')
    logger.warning('warn message')
    logger.error('error message')
    logger.critical('critical message')

Almost exactly the same: we only change the handler to a FileHandler and specify the log file name.

What if we want to log to both the console and a file? You can either use two loggers or add two handlers to the same logger. Let’s see how to accomplish the latter:

def with_both():
    # create logger
    logger = logging.getLogger('Both')
    logger.setLevel(logging.DEBUG)

    # create console handler
    ch = logging.StreamHandler()

    # create file handler
    fh = logging.FileHandler('with_both.log')

    # create formatter
    formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')

    # add formatter to both handlers
    ch.setFormatter(formatter)
    fh.setFormatter(formatter)

    # add both handlers to logger
    logger.addHandler(ch)
    logger.addHandler(fh)

    # 'application' code
    logger.debug('debug message')
    logger.info('info message')
    logger.warning('warn message')
    logger.error('error message')
    logger.critical('critical message')

It’s just a combination of the two previous examples. You can even go a little bit further and use the root logger:

def with_root_logger():

    # create console handler
    ch = logging.StreamHandler()

    # create file handler
    fh = logging.FileHandler('with_root_logger.log')

    # create formatter
    formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')

    # add formatter to both handlers
    ch.setFormatter(formatter)
    fh.setFormatter(formatter)

    # add both handlers to the root logger and set its level
    root = logging.getLogger()
    root.addHandler(ch)
    root.addHandler(fh)
    root.setLevel(logging.DEBUG)

    # 'application' code
    logging.debug('debug message')
    logging.info('info message')
    logging.warning('warn message')
    logging.error('error message')
    logging.critical('critical message')

Easy, isn’t it? I hope this gives you a quick intro to Python’s logging module. Don’t forget to visit the docs.
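One practical reason to configure the root logger: named loggers without handlers of their own pass their records up to it. The sketch below illustrates that under a couple of assumptions: the logger name app.db is hypothetical, and an in-memory buffer stands in for the console.

```python
import io
import logging

buffer = io.StringIO()
root_handler = logging.StreamHandler(buffer)
root_handler.setFormatter(logging.Formatter('%(name)s - %(levelname)s - %(message)s'))

root = logging.getLogger()
previous_level = root.level
root.addHandler(root_handler)
root.setLevel(logging.DEBUG)

# 'app.db' is a hypothetical logger with no handlers of its own;
# its records propagate up to the root logger's handlers.
child = logging.getLogger('app.db')
child.warning('connection is slow')

captured = buffer.getvalue()

# Clean up so the sketch leaves the root logger as it found it.
root.removeHandler(root_handler)
root.setLevel(previous_level)

print(captured)  # app.db - WARNING - connection is slow
```

The handler was attached once, to the root, yet the record still carries the name of the logger that produced it.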


The bash bug

If you’ve been following the news, here for example, you’re aware that there is a new bug out there. You can easily find information about it and how to fix it.

We’ve been patching servers, and although the most recent ones are managed with Chef, some legacy ones are not. As a good practice I script everything, so this time wasn’t an exception.

The script, provided here as a gist, will help you check for the bug and patch it.

Because our servers are mostly Ubuntu servers, the script only accounts for that. But you can easily change it to suit your system.

Just a quick rundown of what it does.

  • it SSHes into your servers one by one and runs a test;
  • if the output of the test contains ‘vulnerable’ well, it’s vulnerable;
  • it then updates the repository and updates your bash.
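The check itself can be sketched in a few lines. This is not the gist’s code: it assumes the bug in question is Shellshock (CVE-2014-6271), uses the widely published one-line test for it, and runs against the local bash with subprocess rather than over SSH with Fabric. The names TEST_COMMAND, is_vulnerable and check_local_bash are made up for this illustration.

```python
import subprocess

# Widely published Shellshock test (assumed here): a patched bash prints only
# "this is a test"; a vulnerable one also prints "vulnerable".
TEST_COMMAND = "env x='() { :;}; echo vulnerable' bash -c 'echo this is a test'"


def is_vulnerable(output):
    """Return True if the test output contains the word 'vulnerable'."""
    return 'vulnerable' in output


def check_local_bash():
    """Run the test against the local bash and interpret its output."""
    result = subprocess.run(TEST_COMMAND, shell=True,
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                            universal_newlines=True)
    return is_vulnerable(result.stdout)


if __name__ == '__main__':
    print('vulnerable' if check_local_bash() else 'not vulnerable')
```

Keeping the output parsing in its own function makes it easy to reuse with whatever remote execution tool runs the command.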

For this script, I’m using Fabric. You can install it on your system, or you can create a virtualenv for the purpose. You can do:

$ virtualenv /path/to/env/folder
$ source /path/to/env/folder/bin/activate
$ pip install fabric

After that, get the code into any folder you desire (remember to name the file fabfile.py) and run:

$ fab check_bug

I hope it helps.

P.S. Of course, don’t forget to update the hosts, user and key_filename to your own. Also, a check_bug.log is created in the folder the file is run from. You can use that log to troubleshoot any problem that might arise.

 

Why not try Python 3?

So, you’ve been using Python 2 since forever, right? Well, Python 2 is still strong but you will, eventually, have to move on. There will be no Python 2.8.

Python 3 is currently on version 3.4.1 and all of us should at least try it out. Or maybe you want to try some other “Python flavor”, like PyPy for example. Virtualenv will help us.

Sure, you might be working professionally with Python 2 and you still want that to be your default. No worries. If you’re working with Python and not using virtualenv, well… You should use it! Even if you always use the same Python version, you should use it (I will not get tired of saying this). But let’s leave the discussion about using virtualenv for some other time and just accept, for now, that you should use it.

First thing, head to the downloads section and download the latest Python version. I’m writing this on a Mac, so I’ll get the OS X version. After the installation, check on the command line that Python 3.4 is available:

$ python3.4 -c "import sys; print(sys.version)"

3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 00:54:21)

[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]

Now we will create a virtual environment that will have Python 3.4 as its interpreter. We can achieve that by using the “-p” parameter. But first, let’s locate the path to your “new” Python:

$ which python3.4

/Library/Frameworks/Python.framework/Versions/3.4/bin/python3.4

Now that we know the path to Python 3.4, we can create our environment:

$ virtualenv -p /Library/Frameworks/Python.framework/Versions/3.4/bin/python3.4 /path/to/the/env

Running virtualenv with interpreter /Library/Frameworks/Python.framework/Versions/3.4/bin/python3.4

Using base prefix ‘/Library/Frameworks/Python.framework/Versions/3.4’

New python executable in /Users/rcastro/.envs/test_python3.4/bin/python3.4

Also creating executable in /Users/rcastro/.envs/test_python3.4/bin/python

Installing setuptools, pip…done.

Let’s activate our new environment and check that Python 3.4 is our default:

$ source ~/.envs/test_python3.4/bin/activate

(test_python3.4)$ python -c "import sys; print(sys.version)"

3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 00:54:21)

[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]

Excellent. This way you can even work with different versions on different projects. Cool, uh?
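If you’d rather check from inside Python than from the shell, something like the following sketch works. The function name interpreter_summary is made up for this example, and the virtual environment detection is a heuristic that may not cover every tool.

```python
import sys


def interpreter_summary():
    """Describe the running interpreter and whether it looks like a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; the stdlib venv module and
    # newer virtualenv releases make sys.base_prefix differ from sys.prefix.
    in_virtualenv = hasattr(sys, 'real_prefix') or (
        getattr(sys, 'base_prefix', sys.prefix) != sys.prefix)
    return {
        'version': '{}.{}.{}'.format(*sys.version_info[:3]),
        'executable': sys.executable,
        'in_virtualenv': in_virtualenv,
    }


summary = interpreter_summary()
print(summary)
```

Run it with the environment activated and again without it; the executable path and the in_virtualenv flag should differ between the two.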

Merge Sort: let’s sort!

Sorting has always been a popular subject in Computer Science. Back in 1945 Mr. John von Neumann came up with Merge Sort. It’s an efficient divide and conquer algorithm, and we’ll dive right into it.

The general flow of the algorithm is as follows: (1) divide the list into n lists of size 1 (n being the size of the original list), (2) recursively merge them back together to produce one sorted list.

As usual, I always get things better with an example. Let’s transform this (kind of) abstract explanation into a concrete example. We’ll use a simple, yet descriptive, example for sorting lists of integers. Let’s start with (1).

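def merge_sort(lst):
    # a list with zero or one element is already sorted
    if len(lst) <= 1:
        return lst

    # integer division so the midpoint is a valid index on Python 2 and 3
    mid = len(lst) // 2
    return merge(merge_sort(lst[:mid]), merge_sort(lst[mid:]))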

Here we have part 1. At each step we check whether our list already has size 1 (or is empty), and if it does we return it. If not, we split it in half, call merge_sort on each half and call a function merge with both sorted halves. How many levels of splitting will there be? About log n, because we’re splitting the list in half each time.

Next, phase number (2). We need to merge everything back together.

def merge(l1, l2):
    result = []
    i, j = 0, 0

    while i < len(l1) and j < len(l2):         
        if l1[i] > l2[j]:
            result.append(l2[j])
            j += 1
        else:
            result.append(l1[i])
            i += 1

    result += l1[i:]
    result += l2[j:]

    return result

So, what’s going on here? We know that we start with lists of size 1. That means that at each step, each of the 2 lists will be sorted on its own. We just need to stitch them together. That means that we go through the lists (until we reach the end of at least one) and we take the smallest element at each step. When one of them ends, we just need to add the remaining elements of the other to the result.
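A quick worked run may help. The block repeats the merge function so it can be executed on its own; the input lists are arbitrary examples.

```python
def merge(l1, l2):
    """Merge two already sorted lists into one sorted list."""
    result = []
    i, j = 0, 0

    while i < len(l1) and j < len(l2):
        if l1[i] > l2[j]:
            result.append(l2[j])
            j += 1
        else:
            # on a tie the element from l1 goes first
            result.append(l1[i])
            i += 1

    # one list is exhausted; whatever remains in the other is already sorted
    result += l1[i:]
    result += l2[j:]
    return result


merged = merge([2, 5, 9], [1, 5, 6, 12])
print(merged)          # [1, 2, 5, 5, 6, 9, 12]
print(merge([3], []))  # [3]
```

Note that the loop stops once the first list runs out at 9, and the leftover 12 from the second list is appended by the slice.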

We already know that there are about log n levels of merging. At each level the merge calls together do O(n) comparisons, because they need to figure out where all the elements fit together. So Merge Sort is an O(n log n) comparison sorting algorithm.
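To make the bound a little more tangible, here is a sketch that counts comparisons while sorting. It is a variation written for this illustration (merge_sort_counting is not part of the code above), and the sample data is arbitrary.

```python
import math


def merge_sort_counting(items):
    """Return (sorted_list, number_of_comparisons) for a list of comparable items."""
    if len(items) <= 1:
        return list(items), 0

    mid = len(items) // 2
    left, left_count = merge_sort_counting(items[:mid])
    right, right_count = merge_sort_counting(items[mid:])

    merged, comparisons = [], 0
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1  # one comparison per loop iteration
        if left[i] > right[j]:
            merged.append(right[j])
            j += 1
        else:
            merged.append(left[i])
            i += 1
    merged += left[i:]
    merged += right[j:]
    return merged, left_count + right_count + comparisons


data = [38, 27, 43, 3, 9, 82, 10, 5]
result, count = merge_sort_counting(data)
bound = len(data) * math.log2(len(data))  # n log n = 8 * 3 = 24
print(result, count, bound)
```

For this input the number of comparisons stays below n log n, which is what the analysis predicts.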

Concurrent vs Parallel

In a world where we hear and talk a lot about making code run concurrently or in parallel, there’s sometimes a little bit of confusion between the two. It happens that many times we use one term when referring to the other or even use them interchangeably. Let’s shed some light on the matter.

When we say that we have concurrency in our code, that means that we have tasks running in periods of time that overlap. That doesn’t mean that they run at the exact same time. When we have parallel tasks, that means that they run at the same time.

In a multi-core world it might seem that concurrency doesn’t make sense but, as with everything, we should choose the right approach for the job at hand. Imagine for example a very simple web application where one thread handles requests and another one handles database queries: they can run concurrently. Parallelism has become very useful in recent times in the Big Data era, where we need to process huge amounts of data.
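Concurrency pays off when tasks spend most of their time waiting. In the sketch below, time.sleep stands in for a blocking call such as a database query; the function name fake_io and the 0.2 second delay are assumptions made for the example.

```python
import time
from threading import Thread


def fake_io(seconds):
    """Stand-in for a blocking call such as a database query (hypothetical)."""
    time.sleep(seconds)


start = time.perf_counter()
workers = [Thread(target=fake_io, args=(0.2,)) for _ in range(2)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
elapsed = time.perf_counter() - start

# The two waits overlap, so the total is close to 0.2s rather than 0.4s.
print('elapsed: {:.2f}s'.format(elapsed))
```

Neither thread needs the CPU while it sleeps, so the waits overlap even on a single core.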

Let’s see an example of each, run them and compare the run times.

Concurrent:

from threading import Thread

LIMIT = 50000000

def cycle(n):
    while n < LIMIT:
        n += 1

# integer division keeps the starting value an int on Python 3
t1 = Thread(target=cycle, args=(LIMIT // 2,))
t2 = Thread(target=cycle, args=(LIMIT // 2,))
t1.start()
t2.start()
t1.join()
t2.join()

Parallel:

from multiprocessing import Process

LIMIT = 50000000

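def cycle(n):
    while n < LIMIT:
        n += 1

# the guard is required on platforms that spawn (rather than fork) new processes
if __name__ == '__main__':
    p1 = Process(target=cycle, args=(LIMIT // 2,))
    p2 = Process(target=cycle, args=(LIMIT // 2,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()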

Now, the times to run:

$ time python concurrent.py

real    0m4.174s
user    0m3.729s
sys     0m2.272s

$ time python parallel.py

real    0m1.764s
user    0m3.422s
sys     0m0.027s

As we can see, the parallel code runs much faster than the concurrent one. Given what was said previously, that makes sense, doesn’t it? In this example, we can only gain time if the tasks run simultaneously.

Your programming language of choice will give you the tools needed to implement both approaches. Analyze your problem, devise a strategy and start coding!

P.S. Please note that a plain sequential implementation would run faster than the concurrent one, due to Python’s GIL.
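For comparison, a sequential baseline could look like the sketch below. LIMIT is deliberately much smaller than in the examples above so that it finishes quickly, and no timing figures are claimed here: run all three variants on your own machine to compare them.

```python
import time

LIMIT = 2000000  # smaller than the 50000000 used above, to keep the run short


def cycle(n):
    """Count from n up to LIMIT: pure CPU work, no I/O."""
    while n < LIMIT:
        n += 1
    return n


start = time.perf_counter()
first = cycle(LIMIT // 2)   # the same two half-ranges, one after the other
second = cycle(LIMIT // 2)
elapsed = time.perf_counter() - start

print('sequential: {:.3f}s'.format(elapsed))
```

With CPU-bound work like this, two threads have to take turns holding the GIL, so they add switching overhead without doing the work any sooner.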

Django and Jenkins

If you’ve read (and followed) two of my previous posts, A small help to get you into Continuous Integration and Let’s link Jenkins and Github together, by now you have a Jenkins server linked to a Github repository. While those two posts were a little bit more generic, this one will focus on building Django projects. Let’s call it Part 3 of this series.

Building a Django project in a CI environment involves several steps: installing all dependencies (virtualenv is a must), rebuilding your database (you should always be in a position where you can deploy from scratch, and that involves, of course, rebuilding your database), running tests, generating reports, etc. Please read this excellent article about Continuous Integration from Martin Fowler. It’s worth your time!

As you can see there are a lot of steps involved, so it would be best if we script it all once and use it many times, wouldn’t it? Well, with Django that’s even simpler because django-jenkins allows “Plug and play continuous integration with Django and Jenkins”. Sweet! Let’s add the following packages to our requirements file:

  • django-jenkins
  • coverage – code coverage measurement for Python
  • pylint – Python code static checker

Let’s update our settings file with the following settings:

INSTALLED_APPS += (
    'django_jenkins',
)

JENKINS_TASKS = (
    'django_jenkins.tasks.run_pylint',
    'django_jenkins.tasks.with_coverage',
)

PROJECT_APPS=(
    'demo_app',
)

Armed with these tools, django-jenkins “knows” what to do. It knows how to run tests and how to generate reports. PROJECT_APPS tells django-jenkins to build reports only for our apps, excluding Django’s own code. What we need now is to tell Jenkins what to do. Let’s do that.

The first thing we need to do is install the required plugins: Violations for parsing the pylint reports and Cobertura to get the code coverage reports. As we’ve seen in the previous posts, that’s done via Manage Jenkins -> Manage plugins -> Available.

The next steps involve polling the Github repository and adding a build step. Click Configure and, under Poll SCM, let’s make it poll every ten minutes using cron syntax (for example */10 * * * *). In the Build section, select Execute shell; there we will add a shell script to automate the process.


Next step: build script. Add this script to the text area:

#!/usr/bin/env bash

virtualenv ve
source ./ve/bin/activate
pip install -r requirements.txt
python manage.py syncdb
python manage.py jenkins

Let’s break down this script into steps:

  • first, we create the environment to install all our dependencies;
  • next we install all dependencies from our requirements file;
  • following, we build our database. In this example we simply sync our models;
  • finally, we run django-jenkins.

This last step will generate the reports. We now need to tell Jenkins where they live so that they can be parsed: test results, test coverage reports and pylint reports. Again in Configure, go to Add post-build action and select:

  • Publish JUnit test result report
  • Report Violations
  • Publish Cobertura coverage report

When django-jenkins runs, it creates a reports folder where the reports are generated. We just need to tell Jenkins to find the required reports there.


Now, every 10 minutes Jenkins will poll Github and if there are changes, it will build and generate reports.


The evolution in the graphs is the result of several builds. Please note that if your app has no tests the build will always fail.
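So make sure there is at least one test to run. In a real project this would typically live in demo_app/tests.py and subclass django.test.TestCase; the sketch below uses plain unittest so it can run on its own, and normalise_title is a hypothetical helper standing in for your app’s code.

```python
import unittest


def normalise_title(title):
    """Hypothetical helper standing in for real demo_app code."""
    return ' '.join(title.split()).title()


class NormaliseTitleTest(unittest.TestCase):
    def test_collapses_whitespace_and_capitalises(self):
        self.assertEqual(normalise_title('  hello   jenkins '), 'Hello Jenkins')


# Run the test case programmatically so the outcome can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormaliseTitleTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
print(outcome.wasSuccessful())
```

Django’s TestCase builds on unittest.TestCase, so the same assertion style carries over once you move the test into your app.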

Now you’re ready to go. CI world is at your feet. Conquer it!

Stacks and Queues: containers for all!

Stacks and Queues are two types of containers and as the name says, they’re used to store content. Predictable, uh? So what’s the difference between them, you might ask. Well Sir (or Madam), it’s the way data is retrieved.

Stacks support what we call LIFO (Last In, First Out). Elements are inserted at the top/end of the container (usually called push) and retrieved from the same position (usually called pop).

stack

Let’s see how we could implement this in Python:

class Stack:
    """ Simple stack implementation. """
    def __init__(self):
        self.stack = []

    def push(self, elem):
        """ Add an element to the stack. """
        self.stack.append(elem)

    def pop(self):
        """ Remove and return the top element of the stack. """
        return self.stack.pop()

    def get_stack(self):
        """ Get current stack. """
        return self.stack

This generic implementation is simple but serves as an example of how we could implement a Stack.
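Here is a short usage sketch showing the LIFO order. The class is repeated in compact form so the block runs on its own, with pop returning the removed element; the pushed values are arbitrary.

```python
class Stack:
    """Compact copy of the stack above; pop returns the removed element."""
    def __init__(self):
        self.stack = []

    def push(self, elem):
        self.stack.append(elem)

    def pop(self):
        return self.stack.pop()


s = Stack()
for elem in (1, 'two', 3.0):
    s.push(elem)

# Elements come back in the reverse of the order they went in.
popped = [s.pop(), s.pop(), s.pop()]
print(popped)  # [3.0, 'two', 1]
```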

Then we have Queues, which are similar but support what we call FIFO (First In, First Out). Elements are inserted at the end of the container (usually called enqueue) and retrieved from the first position (usually called dequeue).

queue

Let’s see how we could implement this. Yes, that’s right, in Python:

class Queue:
    """ Simple queue implementation. """
    def __init__(self):
        self.queue = []

    def enqueue(self, elem):
        """ Add an element to the queue. """
        self.queue.append(elem)

    def dequeue(self):
        """ Remove element from queue. """
        return self.queue.pop(0)

    def get_queue(self):
        """ Get current queue. """
        return self.queue

Similar implementations but, as expected, a different way of retrieving the data. As we can see, both containers can be implemented simply using lists/arrays, although list.pop(0) has to shift every remaining element, so dequeue takes O(n) time on long queues. Also, we made use of Python’s append and pop methods for lists in order to insert and retrieve elements. Why reinvent the wheel?

Please note that these data structures accept any kind of valid data simultaneously (integers, floats, arrays, strings, etc).
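If the queue may grow large, one alternative worth considering is collections.deque from the standard library, which removes from the left in constant time. The class name DequeQueue is made up for this sketch; it mirrors the interface of the Queue class above.

```python
from collections import deque


class DequeQueue:
    """Queue with the same interface as above, backed by collections.deque."""
    def __init__(self):
        self.queue = deque()

    def enqueue(self, elem):
        """ Add an element to the end of the queue. """
        self.queue.append(elem)

    def dequeue(self):
        """ Remove and return the element at the front of the queue. """
        return self.queue.popleft()

    def get_queue(self):
        """ Get the current queue contents as a list. """
        return list(self.queue)


q = DequeQueue()
for elem in ('a', 'b', 'c'):
    q.enqueue(elem)

first = q.dequeue()
print(first, q.get_queue())  # a ['b', 'c']
```

For small queues the list-based version is perfectly fine; the difference only shows up as the number of elements grows.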