04 Mar
by: Matt in Development, Python
tags: multiprocessing, Python
Python's multiprocessing module is a great tool that abstracts the details of forking and managing child processes in an interface inspired by the threading module. The benefit to using processes over threads is that you effectively avoid the issues of the GIL (Global Interpreter Lock).
I wanted to share my experience with sharing static data between the parent and the forked children. The solution I ultimately went with is trivially implemented and works well. It takes advantage of the fact that forked children inherit the parent's already-imported modules, along with anything assigned to them before the fork. If you house your data in a shared module, it's accessible in both places.
The directory structure looks like this:
mypackage/
    __init__.py
    mp.py
    myglobals.py
    myscript.py
Here's my light wrapper around the multiprocessing module, mp.py:
import multiprocessing
import MySQLdb
import myglobals

# handles each unit of work, in this case a SQL query
def worker_do(sql):
    myglobals.cursor.execute(sql)

# called once upon worker initialization
def worker_init():
    myglobals.conn = MySQLdb.connect(**myglobals.config['db'])
    myglobals.cursor = myglobals.conn.cursor()
    myglobals.cursor.execute('SET AUTOCOMMIT=1')

# wrapper for multiprocessing module
def do_work(queue, num_processes):
    pool = multiprocessing.Pool(num_processes, initializer=worker_init)
    pool.map(worker_do, queue, 1)
    pool.close()
    pool.join()
And here's my example script, myscript.py:
import os
import sys
import mp
import myglobals

def main():
    # anything in the myglobals module will be accessible by the child processes
    # we could then programmatically retrieve this config info from a file
    # via ConfigParser
    #
    # for simplicity I hard-coded it here
    myglobals.config = {
        'db': {
            'host': 'db1',
            'user': 'dbuser',
            'passwd': 'dbpasswd',
            'db': 'dbase'
        }
    }
    # build a whole bunch of queries to perform via the workers
    # (build_queries() is not shown here; it is assumed to return a list of SQL strings)
    queries = build_queries()
    # perform the multiprocessing operation
    mp.do_work(queries, 4)
    return 0

if __name__ == '__main__':
    sys.exit(main())
In this example the benefit would be to keep your database configuration code DRY - and share that data with the child processes.
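As a variation on the same idea, here is a minimal sketch that loads the config from INI text and hands it to each worker through the pool's initargs instead of a shared module. It is written for Python 3 (configparser rather than ConfigParser), and load_config, the _state dict, and the string that worker_do returns are all stand-ins invented for illustration. No database is involved. Passing the config explicitly has one practical upside: it does not depend on the children being forked, so it should also behave on platforms where multiprocessing starts fresh interpreters.

```python
import configparser
import multiprocessing

# Per-process state; every worker fills this in for itself.
_state = {}

def load_config(text):
    """Parse INI text into a plain dict of {section: {key: value}}."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}

def worker_init(config):
    # Called once per worker process. A real worker would open its own
    # database connection here using config["db"].
    _state["db"] = config["db"]

def worker_do(sql):
    # Stand-in for cursor.execute(sql): report which host would run the query.
    return "%s -> %s" % (_state["db"]["host"], sql)

def do_work(queries, num_processes, config):
    # initargs is delivered to worker_init() in every child process.
    with multiprocessing.Pool(num_processes, initializer=worker_init,
                              initargs=(config,)) as pool:
        return pool.map(worker_do, queries, 1)
```

To try it, call do_work(queries, 4, load_config(ini_text)) from inside an if __name__ == '__main__': block, where ini_text holds a [db] section with a host entry.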
For one off scripts for a particular project:
#!/usr/bin/env python
from django.core.management import setup_environ
from myapp import settings
setup_environ(settings)
# do some stuff
02 Mar
by: Matt in Development, PHP
tags: concurrency, fork, parallel, pcntl, pcntl_fork, pcntl_wait, php, process, thread, unix
I find it interesting and challenging to bend PHP in ways it probably shouldn't be bent. Almost always I walk away pleasantly surprised at its ability to solve a variety of problems.
Consider this example. Let's say you want to take advantage of more than one core for a given process. Perhaps it performs many intensive computations and on a single core would take an hour to run. Since a PHP process is single-threaded, it can't make use of the other cores you may have available.
Fortunately, via the Process Control (PCNTL) extension, PHP provides a way to fork new child processes. Forking duplicates the running process: the child starts out as a copy of the parent and continues executing from the same point. pcntl_fork() is the function that does this.
The framework for using this extension is as follows:
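$maxChildren = 4;
$numChildren = 0;
foreach ($unitsOfWork as $unit) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        // the fork failed, no child was created
        die('could not fork');
    } elseif ($pid === 0) {
        // child: do the work, then kill ourselves so we never continue the loop
        doWork($unit);
        posix_kill(getmypid(), 9);
    } else {
        // parent: count the child and block once we hit the limit
        $numChildren++;
        if ($numChildren == $maxChildren) {
            pcntl_wait($status);
            $numChildren--;
        }
    }
}
// wait for the children that are still running
while ($numChildren > 0) {
    pcntl_wait($status);
    $numChildren--;
}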
When a new child is forked via pcntl_fork(), the return value tells each process who it is: the parent receives the child's pid, the child receives 0, and -1 means the fork failed. The if statement following the fork lets the child and parent split their flow of execution accordingly (i.e. the child does the work and kills itself; the parent checks whether it has hit the max number of children and waits, otherwise it creates another child). pcntl_wait() is called when we hit $maxChildren and blocks until a child exits.
Remember, if you want to use database connections in your children, each child needs to initialize its own connection. Resources such as database connections can't be safely shared across forked processes.
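For comparison, the same fork-and-throttle pattern can be sketched in Python with os.fork() and os.wait(). Treat it as an illustration rather than a port of the PHP above: run_throttled and its parameters are made-up names, it assumes a Unix-like system, and the child leaves via os._exit() instead of sending itself a signal.

```python
import os

def run_throttled(units, do_work, max_children=4):
    """Fork one child per unit, keeping at most max_children alive at once.

    Returns the number of children the parent waited for.
    """
    running = 0
    reaped = 0
    for unit in units:
        pid = os.fork()  # raises OSError if the fork fails
        if pid == 0:
            # Child: do the work, then exit immediately so it never
            # falls back into the parent's loop.
            try:
                do_work(unit)
                os._exit(0)
            except BaseException:
                os._exit(1)
        # Parent: count the child and block once the limit is reached.
        running += 1
        if running == max_children:
            os.wait()
            running -= 1
            reaped += 1
    # Wait for the children that are still running.
    while running > 0:
        os.wait()
        running -= 1
        reaped += 1
    return reaped
```

Calling run_throttled(units, handler, max_children=4) with any iterable and a one-argument function mirrors the loop above, including the final wait for the children still running when the loop ends.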
Just a quick note alerting everyone to the fact that jQuery has gotten EVEN EASIER AND FASTER.
Go check out the release notes.
08 Feb
by: Matt in Development, Django, Infrastructure, PHP, Python, Ruby, Ruby on Rails
tags: deployment, Django, mod_wsgi, mongrel, passenger, php, phusion passenger, Python, rails, ruby, subversion
I finally got around to setting up a more sophisticated deployment system for some of my apps. These apps include some built on a custom PHP framework and others that are Python / Django apps. I figured I'd share my experience...
Why is a high-level deployment infrastructure important? Deployment is something that should be simple, accessible, and repeatable. It should be as close to a "single click" as possible. Previously, for me, it was a bash script that exported some SVN branches. While this worked fine, as projects progress, you want some accountability, history, and the ability to roll back mission critical applications when something goes wrong with a deploy.
Capistrano is an open source, command line, deployment tool that provides all of these features. It's written in Ruby. You leverage a variety of built-in "recipes" (Capistrano's term for a deployment script) that execute certain procedures to deploy an app. Out-of-the-box it's ideally built to deploy a Rails app. However, after some minor tweaks it can deploy most anything and do it well. It can restart servers, update symlinks, change permissions - pretty much anything. It assumes you access your POSIX-compliant servers via SSH using the same password on each (or have SSH keys set up).
Webistrano is an open source web front-end for Capistrano. It's a convenience layer that abstracts the command line away and provides an interface to perform the same tasks. This interface shows history as well as providing a convenient GUI for creating new deployment projects, stages, and recipes. Highly recommended.
Let's get down to business. This post makes a few assumptions about things you've already installed and used previously.
Well, this is an easy one (you probably want to do this as root):
gem install capistrano
Also fairly easy, with a little splash of configuration.
# wget http://labs.peritor.com/webistrano/attachment/wiki/Download/webistrano-1.4.zip
# unzip webistrano-1.4.zip
# mv webistrano-1.4 /path/to/where/you/want/webistrano
Set up the database and create a new webistrano user (obviously be conscious of your security preferences for access to your database in the host and password portions):
# mysql
mysql> CREATE DATABASE `webistrano`;
mysql> CREATE USER 'webistrano'@'localhost' IDENTIFIED BY 'password';
mysql> GRANT ALL PRIVILEGES ON `webistrano`.* TO 'webistrano'@'localhost' WITH GRANT OPTION;
Now, in the directory where you placed webistrano, copy config/database.yml.sample to config/database.yml. Edit this file, in the production area, to match your database settings. By default the file expects to connect via a socket; you can change this by specifying host: and port: instead. (Keep in mind Webistrano is simply a Rails app.)
You should now be able to have Rails migrate the new database you created. In the webistrano directory:
# RAILS_ENV=production rake db:migrate
Finally, copy config/webistrano_config.rb.sample to config/webistrano_config.rb and edit according to your preferred mail settings.
We can now test to see if webistrano is working properly by serving it via mongrel:
# ruby script/server -d -e production -p 3000
This starts a single mongrel daemon, using the production environment, listening on port 3000. You should now be able to hit http://127.0.0.1:3000/ and get the Webistrano login prompt. If this is working, kill that mongrel instance.
For longer term serving I decided to go with Phusion Passenger (essentially mod_rails for Apache). It's a nearly zero configuration solution for serving a rails app and will feel at home to anyone with experience serving PHP apps via Apache and mod_php.
Again, as root:
# gem install passenger
# passenger-install-apache2-module
The second command invokes an installer that compiles Passenger and provides instructions on integrating it into your Apache config. Essentially, edit your httpd.conf as follows (these were specific to my install; make sure to use the ones provided by the installer for you):
LoadModule passenger_module /usr/lib/ruby/gems/1.8/gems/passenger-2.2.9/ext/apache2/mod_passenger.so
PassengerRoot /usr/lib/ruby/gems/1.8/gems/passenger-2.2.9
PassengerRuby /usr/bin/ruby
Now you can simply add VirtualHost entries to your httpd.conf for any of your Rails apps. Let's add one for Webistrano:
<VirtualHost *:80>
    ServerName webistrano.mydomain.com
    DocumentRoot /path/to/webistrano/public
</VirtualHost>
Yes, Passenger makes it that simple. Add configuration directives as needed for your environment.
Now Webistrano should be serving from the VirtualHost you specified, seamlessly, via Passenger.
Now the fun stuff.
Webistrano breaks things down into projects, stages, and recipes. Each app you want managed by Capistrano should be its own project. Each project should have a stage for at least production and optionally staging and development.
Hosts are added globally and form the targets of a deploy for any given project. Hosts can include web, app, and database servers.
Deployments in Capistrano are done to a child directory under "releases" named via the date and time of the deployment. By default 5 releases are kept and available to roll back to. Upon successful deployment a symlink (default is called "current" and can be modified via the current_path configuration variable) is updated to that release directory. It is this symlink that should be targeted by your webserver (your DocumentRoot in Apache).
Capistrano also creates a "shared" directory that is symlinked to in each release, useful for storing logs and other data that should be maintained through each deployment.
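To make that layout concrete, here is a small Python sketch that mimics it on the local filesystem: a timestamped directory under releases, a shared log directory linked into the release, a current symlink repointed at the newest release, and pruning down to the last 5. It illustrates the directory scheme only and is not how Capistrano itself is implemented. The deploy function, its parameters, and the "log" link name are assumptions made for the example.

```python
import os
import shutil
import time

def deploy(deploy_to, keep_releases=5, stamp=None):
    """Create a new release under deploy_to and point 'current' at it."""
    releases = os.path.join(deploy_to, "releases")
    shared_log = os.path.join(deploy_to, "shared", "log")
    os.makedirs(releases, exist_ok=True)
    os.makedirs(shared_log, exist_ok=True)

    # Each release lives in a directory named after the deploy time.
    stamp = stamp or time.strftime("%Y%m%d%H%M%S")
    release = os.path.join(releases, stamp)
    os.makedirs(release)
    # ... a real deploy would export the application code into `release` here ...

    # Link the shared log directory into the release so logs survive deploys.
    os.symlink(shared_log, os.path.join(release, "log"))

    # Repoint 'current': build a temporary symlink, then rename it into place.
    current = os.path.join(deploy_to, "current")
    tmp = current + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(release, tmp)
    os.replace(tmp, current)

    # Keep only the newest `keep_releases` release directories.
    for old in sorted(os.listdir(releases))[:-keep_releases]:
        shutil.rmtree(os.path.join(releases, old))
    return release
```

Rolling back in this scheme amounts to pointing current at one of the older directories that is still on disk.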
For non-Rails apps you'll use the "Pure File" project type when creating your new project. Upon project creation you can add configuration variables specific to your project. I recommend using :export instead of :checkout for deploy_via for production Subversion deployments, as this doesn't expose .svn directories. Use an SSH user that has enough permissions to create directories where your deploy will occur, or set use_sudo to true and create a new configuration variable admin_runner set to the same user as runner.
Add a stage to your new project for "production". In the "Manage Hosts" page add a new host for each of your application servers. Then add each host as a target of your "production" stage of your project.
At this point you should be able to execute the "Setup" task for your "production" stage. This is a one time task that simply creates the directories.
Assuming this went successfully, try doing a "Deploy" and see if that finishes without error. You might have to play around with permissions and other minor issues - post a comment if you have any specific questions.
For my PHP framework there are a couple of specific tasks I wanted to run in addition to the default Capistrano tasks. You do this by creating custom recipes in the "Manage Recipes" page in Webistrano. Recipes are simply procedures written in Ruby. Here's what my recipe looks like:
namespace :deploy do
  task :setup, :except => { :no_release => true } do
    dirs = [deploy_to, releases_path, shared_path]
    dirs += shared_children.map { |d| File.join(shared_path, d) }
    run "#{try_sudo} mkdir -p #{dirs.join(' ')} && #{try_sudo} chmod g+w #{dirs.join(' ')}"
    run "chmod 777 #{shared_path}/log"
  end

  task :finalize_update, :except => { :no_release => true } do
    run "mkdir -p #{latest_release}/app/tmp"
    run "chmod -R 777 #{latest_release}/app/tmp"
    run "rm -rf #{latest_release}/app/logs"
    run "ln -s #{shared_path}/log #{latest_release}/app/logs"
    run "cp #{latest_release}/public_html/.htaccess-production #{latest_release}/public_html/.htaccess"
    run "cp #{latest_release}/app/config/config-production.php #{latest_release}/app/config/config.php"
    run "cp #{latest_release}/app/config/db-default.php #{latest_release}/app/config/db.php"
    run "cp #{latest_release}/app/config/memcache-default.php #{latest_release}/app/config/memcache.php"
  end
end
If you're not familiar with Ruby - what this code is essentially doing is overriding two tasks in the :deploy namespace with my custom code.
The first, :setup, simply duplicates the base :setup functionality discussed above (creating the releases and shared directories) and chmods the shared log directory to be writable.
The second, :finalize_update, performs a variety of configuration tasks for a PHP app built with my framework. Also, you'll notice that I'm removing my app's logs directory and symlinking to the shared log directory. This way all releases will log to the same directory, consistently.
In my case all of these procedures are command line instructions. Alternatively, you can do a variety of things leveraging the full breadth of the Ruby language and any gem you'd like to introduce. Things such as accessing your CDN API to clear image, JS, or CSS caching, etc.
First off it's worth noting that I serve my Django apps via mod_wsgi. To make the deployment process easier here's what my app.wsgi script looks like:
import os
import sys

appdir = os.path.normpath(os.path.join(os.path.realpath(os.path.dirname(__file__)), '..'))
sys.path.insert(0, appdir)
os.environ['DJANGO_SETTINGS_MODULE'] = 'settings'
os.environ['PYTHON_EGG_CACHE'] = os.path.join(appdir, '.python-eggs')

import django.core.handlers.wsgi
application = django.core.handlers.wsgi.WSGIHandler()
This code allows us to avoid having to hardcode paths in the wsgi script (and thus avoid having to change them when we deploy). It assumes the following directory structure:
.python-eggs (egg cache)
apps (apps path is added to python system path in settings.py)
public (where your .wsgi script resides)
site_media
templates
settings.py
settings-production.py (used for deploy)
urls.py
...
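It can help to see what the path expression in the wsgi script evaluates to when the script is reached through a symlinked directory such as a "current" link. The sketch below wraps the same expression in a function. The name app_dir and the example paths in the usage note are made up for illustration.

```python
import os

def app_dir(wsgi_file):
    """Return the parent of the directory that holds the wsgi script.

    realpath() resolves symlinks first, so a script reached through a
    symlinked directory maps back to the real directory on disk.
    """
    public_dir = os.path.dirname(wsgi_file)
    return os.path.normpath(os.path.join(os.path.realpath(public_dir), ".."))
```

For instance, if /srv/app/current is a symlink to /srv/app/releases/20100208000000, then app_dir("/srv/app/current/public/app.wsgi") resolves to the release directory itself, which is what ends up on sys.path.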
If you follow this convention, the following Capistrano recipe works great:
namespace :deploy do
  task :setup, :except => { :no_release => true } do
    dirs = [deploy_to, releases_path, shared_path]
    dirs += shared_children.map { |d| File.join(shared_path, d) }
    run "#{try_sudo} mkdir -p #{dirs.join(' ')} && #{try_sudo} chmod g+w #{dirs.join(' ')}"
    run "chmod 777 #{shared_path}/log"
  end

  task :finalize_update, :except => { :no_release => true } do
    run "rm -rf #{latest_release}/logs"
    run "ln -s #{shared_path}/log #{latest_release}/logs"
    run "cp #{latest_release}/settings-production.py #{latest_release}/settings.py"
    run "mkdir -p #{latest_release}/.python-eggs"
    run "chmod 777 #{latest_release}/.python-eggs"
  end
end
This should give you a nice intro to leveraging Capistrano via Webistrano. Feel free to comment with questions, suggestions, or anything else!
Many (awesome) changes http://blog.jquery.com/2010/01/14/jquery-14-released/.
View the release notes here: http://jquery14.com/day-01/jquery-14
19 Dec
by: Matt in Development, JavaScript
tags: 1.4, JavaScript, jquery, jquery 1.4
Just wanted to mention that jQuery 1.4 Alpha 1 has been released.
Most of the changes seem to revolve around heavy optimization of some core functionality. Installing this alpha and testing in live applications will help get this release out!
30 Nov
by: Matt in Book Reviews, CSS, Clojure, Development, Django, Infrastructure, JavaScript, PHP, Python, Ruby, Ruby on Rails
tags: book review, clojure, Development, Django, php, programmer, Python, rails, ruby, Ruby on Rails
Send this to your significant other/parent/relative/friend so, instead of that sweater, you get one of these nuggets of awesome this Christmas.
Write better, cleaner, more maintainable code. Learn how to manage your projects and focus on shipping your product. With insight that covers the gamut of software development from low level to management, this one is a must have for anyone involved in this industry.
Highly recommended! Read my full review.
Another classic "software construction" book. Sharpen your saw with timeless information that can be applied to any project in any language. Fewer bugs, more productivity, more programmer happiness.
This one is different. Written as a set of interview transcripts with 15 legendary industry giants, this book is a fantastic insight into how some of the great minds think. It's inspiring to hear it from the source, must have!
A developer should learn at least one new language a year. This year that language should be Clojure. Clojure is a dynamic, general purpose language targeting the Java virtual machine and designed for multi-threaded use. Its growing popularity, ability to leverage the Java standard library, and multi-threaded nature make this a must have.
Another classic. Primarily discusses project management from the perspective of Fred Brooks and his experiences at IBM. Brooks' Law states that "adding manpower to a late software project makes it later".
Web developers should always keep in mind the user of the product they're creating. Usability becomes increasingly important as applications move to the web. The design and usability of your app can make or break its success. This classic is a must read.
This classic, known most commonly as the "gang of four" book, is the definitive reference on design patterns. It covers all of the most common cases and time and time again serves as an invaluable source of information.
15 Nov
by: Matt in Development, Django, Infrastructure, Python
tags: centos, Django, linux, mod_wsgi, mysql, mysql-python, mysqldb, Python, setuptools
This is an update to my previous how-to Setup Python 2.5, mod_wsgi, and Django 1.0 on CentOS 5 (cPanel).
The biggest reason why I chose to go with Python 2.5 at the time was that the MySQL Python (MySQLdb) package didn't support Python 2.6. The 1.2.3c1 release does, so that roadblock is lifted.
The instructions are identical - nothing has really changed in that regard. Just change the references from Python 2.5 to 2.6. Here are the links to the versions I'm using successfully:
Python 2.6.4: http://www.python.org/ftp/python/2.6.4/Python-2.6.4.tgz
setuptools 0.6c11: http://pypi.python.org/packages/2.6/s/setuptools/setuptools-0.6c11-py2.6.egg#md5=bfa92100bd772d5a213eedd356d64086
MySQLdb 1.2.3c1: http://sourceforge.net/projects/mysql-python/files/mysql-python-test/1.2.3c1/MySQL-python-1.2.3c1.tar.gz/download
mod_wsgi 2.6: http://modwsgi.googlecode.com/files/mod_wsgi-2.6.tar.gz
Django 1.1.1: http://www.djangoproject.com/download/1.1.1/tarball/