Python's multiprocessing module abstracts the details of forking and managing child processes behind an interface inspired by the threading module. The benefit of using processes over threads is that each process runs its own interpreter, so CPU-bound work is not serialized by the GIL (Global Interpreter Lock).

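I wanted to share my experience with sharing static data between the parent and the forked children. The solution I ultimately went with is simple to implement and works well. It takes advantage of the fact that a forked child starts with a copy of the parent's memory, including every module the parent has already imported and whatever state those modules hold. If you house your data in a shared module and set it before the pool is created, it's readable in both places. Two caveats: each child gets its own copy, so changes made after the fork are not seen by the other processes, and the approach relies on the fork start method. Under spawn (the only option on Windows) children re-import modules from scratch and will not see values assigned at runtime.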
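
To see the mechanism in isolation, without a database, here is a minimal sketch. It assumes Python 3.4 or later on a Unix system where the fork start method is available. The names set_config, _report, and read_config_in_child are made up for this illustration, and the module-level config variable stands in for the shared module's state.

```python
import multiprocessing

# Stand-in for the shared module's state (myglobals.config in this post).
config = None


def set_config(value):
    """Assign the module-level global in the parent, before any fork."""
    global config
    config = value


def _report(conn):
    # Runs in the child: send back whatever the child sees in the global.
    conn.send(config)
    conn.close()


def read_config_in_child():
    """Fork one child and return the value of `config` as the child sees it."""
    # Assumption: the 'fork' start method is available (Unix, Python 3.4+).
    ctx = multiprocessing.get_context('fork')
    parent_end, child_end = ctx.Pipe()
    proc = ctx.Process(target=_report, args=(child_end,))
    proc.start()
    seen = parent_end.recv()
    proc.join()
    return seen


if __name__ == '__main__':
    set_config({'db': {'host': 'db1'}})
    print(read_config_in_child())
```

The child never receives the dictionary as an argument. It reads it from the global it inherited at fork time. With a start method that re-imports the module in the child, the child would most likely see the initial None instead.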

The directory structure looks like this:

mypackage/
    __init__.py
    mp.py
    myglobals.py
myscript.py

Here's my light wrapper around the multiprocessing module, mp.py:

import multiprocessing

import MySQLdb

from mypackage import myglobals

# handles each unit of work, in this case a SQL query
def worker_do(sql):
    myglobals.cursor.execute(sql)

# called once upon worker initialization
def worker_init():
    myglobals.conn = MySQLdb.connect(**myglobals.config['db'])
    myglobals.cursor = myglobals.conn.cursor()
    myglobals.cursor.execute('SET AUTOCOMMIT=1')

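# wrapper for multiprocessing module; queue is any iterable of SQL strings
def do_work(queue, num_processes):
    pool = multiprocessing.Pool(num_processes, initializer=worker_init)
    # chunksize of 1: each worker is handed one query at a time
    pool.map(worker_do, queue, chunksize=1)
    pool.close()
    pool.join()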
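
The myglobals.py module itself isn't shown in this post. A minimal sketch, assuming it only needs to declare the three names the other files assign to (config, conn, and cursor), could be as small as this:

```python
# myglobals.py (hypothetical contents; the original file is not shown)

# Set by the parent in myscript.py before the pool is created.
config = None

# Set once per child process by worker_init() in mp.py.
conn = None
cursor = None
```

Strictly speaking the declarations are optional, since Python lets you assign new attributes on a module at runtime, but listing them documents what the module is expected to hold.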

And here's my example script, myscript.py:

import sys

from mypackage import mp, myglobals

def main():
    # anything set in the myglobals module before the pool is created
    # will be readable by the child processes
    # we could programmatically retrieve this config info from a file
    # via ConfigParser
    #
    # for simplicity I hard-coded it here
    myglobals.config = {
        'db': {
            'host': 'db1',
            'user': 'dbuser',
            'passwd': 'dbpasswd',
            'db': 'dbase'
        }
    }

    # build a whole bunch of queries to perform via the workers
    # (build_queries() is not shown here; it should return a list of SQL strings)
    queries = build_queries()

    # perform the multiprocessing operation
    mp.do_work(queries, 4)

    return 0

if __name__ == '__main__':
    sys.exit(main())

In this example the benefit is that your database configuration code stays DRY: the settings are defined once in the parent and shared with the child processes.
