
The other day I looked into asynchronous programming in Java and I ran into this article: Asynchronous processing in Java applications – leveraging those multi-cores

It's an awesome article for learning about parallel programming in Java.

But of course, my pet language being Python, I looked for solutions for parallel programming in Python.

After a couple of searches on the net, I found parallelpython (the pp module), a handy little library that makes parallel programming in Python easy.

So let's start again: what do we want to achieve with parallel programming? Well, the basic idea is that we want to do several tasks at the same time, in parallel.

Let's build a basic script (Python 2, like the rest of this post) that calls a function twice; the function waits 5 seconds before telling us it is done:

#!/usr/bin/python
 
"""
Non-asynchronous python example
"""
 
import time
 
def background_stuff(num):
    time.sleep(5)  # wait 5 seconds
    return "%s I'm done" % num

if __name__ == "__main__":
    print "Start at:", time.asctime(time.localtime())
 
    print "Start doing something"
    print background_stuff(1)
    print "Do something..."
    print " ... do something else..."
    print background_stuff(2)
    print "I'm done"
 
    print "End at:", time.asctime(time.localtime())

What we would like to see is something like:

Start at: <date and time>
Start doing something
Do something...
 ... do something else...
1 I'm done
2 I'm done
End at: <date and time>

However, what we actually see is:

Start at: Fri Aug 19 13:35:15 2011
Start doing something
1 I'm done
Do something...
 ... do something else...
2 I'm done
End at: Fri Aug 19 13:35:25 2011

So Python waits for the first function to finish before moving on to the rest of the code. This is the normal behaviour: everything runs sequentially, so the two 5-second calls add up and the program takes 10 seconds.
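
To check the arithmetic (2 × 5 s = 10 s) without waiting that long, here is a small Python 3 sketch that times two sequential calls with time.perf_counter. The delay parameter and the run_sequentially helper are hypothetical additions so the sketch runs quickly; with delay=5 it matches the script above.

```python
import time

def background_stuff(num, delay=0.2):
    # `delay` is hypothetical; the original function always sleeps 5 seconds
    time.sleep(delay)
    return "%s I'm done" % num

def run_sequentially(delay=0.2):
    """Call background_stuff twice, one after the other, and time the pair."""
    start = time.perf_counter()
    results = [background_stuff(1, delay), background_stuff(2, delay)]
    elapsed = time.perf_counter() - start
    return results, elapsed

if __name__ == "__main__":
    results, elapsed = run_sequentially()
    print(results)
    print("Elapsed: %.2f s" % elapsed)  # roughly 2 * delay: the waits add up
```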

But we can do better than that by rewriting the program to use parallel programming.

First, install the Parallel Python library (here with yum):

yum install python-pp

Here is our new script:

#!/usr/bin/python
 
"""
Asynchronous python example
"""
 
import pp
import time
 
def background_stuff(num):
    time.sleep(5)
    return "%s I'm done" % num

if __name__ == "__main__":
    print "Start at:", time.asctime(time.localtime())
    job_server = pp.Server()

    print "Start doing something"

    # submit() returns at once; each task runs in a pp worker process
    f1 = job_server.submit(background_stuff, (1,), modules=('time',))
    f2 = job_server.submit(background_stuff, (2,), modules=('time',))
    print "Do something..."
    print " ... do something else..."

    # Calling a job waits for its task to finish and returns its result
    print f1()
    print f2()

    print "End at:", time.asctime(time.localtime())
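
For comparison, here is a rough Python 3 sketch of the same idea using the standard library's concurrent.futures module instead of pp. It uses a ThreadPoolExecutor, which is enough here because time.sleep releases the GIL; for CPU-bound work you would reach for ProcessPoolExecutor, which, like pp, runs the function in separate processes. The delay parameter and run_in_parallel are hypothetical names chosen for this sketch.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def background_stuff(num, delay=0.2):
    # `delay` is hypothetical; the article's version always sleeps 5 seconds
    time.sleep(delay)
    return "%s I'm done" % num

def run_in_parallel(delay=0.2):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=2) as executor:
        # submit() returns a Future immediately, much like pp's submit()
        f1 = executor.submit(background_stuff, 1, delay)
        f2 = executor.submit(background_stuff, 2, delay)
        print("Do something...")
        print(" ... do something else...")
        # result() blocks until the corresponding call has finished
        results = [f1.result(), f2.result()]
    elapsed = time.perf_counter() - start
    return results, elapsed

if __name__ == "__main__":
    results, elapsed = run_in_parallel()
    for line in results:
        print(line)
    print("Elapsed: %.2f s" % elapsed)  # roughly one delay, not two
```

Unlike pp's submit(), executor.submit() takes the function's arguments directly rather than packed in a tuple.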

The important lines are:

import pp

Loads the Parallel Python library.

job_server = pp.Server()

Creates a job server, which runs the submitted tasks in a pool of worker processes (by default, one per detected CPU).
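
The size of the worker pool bounds how many tasks really run at the same time; pp.Server() accepts an ncpus argument if you want a number other than the default. The sketch below illustrates the effect with the standard library's concurrent.futures rather than pp: with one worker the two sleeps run back to back, and with two they overlap. timed_run and the delay values are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def background_stuff(num, delay=0.2):
    # `delay` is hypothetical; the article's version always sleeps 5 seconds
    time.sleep(delay)
    return "%s I'm done" % num

def timed_run(workers, delay=0.2, jobs=2):
    """Submit `jobs` tasks to a pool of `workers` threads and time them."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as executor:
        futures = [executor.submit(background_stuff, n, delay)
                   for n in range(1, jobs + 1)]
        results = [f.result() for f in futures]
    return results, time.perf_counter() - start

if __name__ == "__main__":
    for workers in (1, 2):
        _, elapsed = timed_run(workers)
        print("%d worker(s): %.2f s" % (workers, elapsed))
```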

f1 = job_server.submit(background_stuff, (1,), modules=('time',))

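Submits a task to the server. submit() returns immediately with a job object; calling that object, as in f1(), waits for the task to finish and returns its result.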

The first argument is the function itself, the second (which has to be a tuple) holds the arguments to pass to it, and the modules keyword argument is a tuple of the names of the modules to import before the function runs. This is needed because pp runs the function in a separate worker process, which does not see the imports of our script. Note that if background_stuff did its own "import time", the modules argument would not be necessary.
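
Two details of this calling convention are worth spelling out: (1) is just the integer 1 because parentheses alone only group, so a one-element tuple needs the trailing comma (1,); and the job handle is something you call to get the result. The sketch below is a hypothetical imitation of that style on top of concurrent.futures, not pp's actual implementation: submit_job and demo are made-up names, submit_job checks for a tuple, unpacks it into positional arguments, and returns the Future's bound result method so that calling it blocks until the task is done.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def background_stuff(num, delay=0.2):
    # `delay` is hypothetical; the article's version always sleeps 5 seconds
    time.sleep(delay)
    return "%s I'm done" % num

def submit_job(executor, func, args=()):
    """Hypothetical pp-style helper: `args` must be a tuple of arguments."""
    if not isinstance(args, tuple):
        raise TypeError("args must be a tuple, e.g. (1,) rather than (1)")
    # The tuple is unpacked into positional arguments: func(*args)
    future = executor.submit(func, *args)
    # Return the bound method, so calling job() waits for and returns the result
    return future.result

def demo(delay=0.2):
    with ThreadPoolExecutor(max_workers=2) as executor:
        f1 = submit_job(executor, background_stuff, (1, delay))
        f2 = submit_job(executor, background_stuff, (2, delay))
        print("Do something in the meantime...")
        return [f1(), f2()]

if __name__ == "__main__":
    print(type((1,)).__name__)  # tuple
    print(type((1)).__name__)   # int: the parentheses only group
    for line in demo():
        print(line)
```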

And now the output is:

Start at: Fri Aug 19 13:43:46 2011
Start doing something
Do something...
 ... do something else...
1 I'm done
2 I'm done
End at: Fri Aug 19 13:43:51 2011

As you can see, this time we were able to call background_stuff twice, do something else in the meantime, and get back the output of both calls, and the whole program ran in 5 seconds instead of 10!