Just a quick note that the Tornado team announced the release of version 1.0 on July 22nd.
Here's the changelog.
Looks like some nice new features - I'm looking forward to upgrading.
01 Jul
by: Matt in Development, Infrastructure, Python
tags: API, asynchronous, non-blocking, Python, REST, tornado, web.py
I've been working with Python's Tornado for about 2 months now and I love it.
Tornado is a non-blocking web server written in Python. Its structure is similar to web.py, so users of that popular Python web framework will feel right at home. This structure lends itself really well to developing RESTful APIs, because the methods you write to handle incoming requests are named after the HTTP methods they respond to:
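import tornado.httpserver
import tornado.ioloop
import tornado.web

class PlaceHandler(tornado.web.RequestHandler):
    # id defaults to None so the handler also serves /place (no captured group)
    def get(self, id=None):
        # respond to a GET
        self.write('GETting something')

    def post(self, id=None):
        # respond to a POST
        self.write('POSTing something')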
You map URI paths to "handlers" (the controller, for you MVC folks) via a list of (regex, handler) tuples used to instantiate an "application".
application = tornado.web.Application([
    (r"/place", PlaceHandler),
    (r"/place/([0-9]+)", PlaceHandler),
])

if __name__ == "__main__":
    http_server = tornado.httpserver.HTTPServer(application)
    http_server.listen(8888)
    tornado.ioloop.IOLoop.instance().start()
As usual, any groups captured by the regex are passed, in order, as arguments to the handler method that receives the request.
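To make the two rules above concrete (methods named after HTTP verbs, captured groups passed in order), here's a toy model of the dispatch in plain Python. It's a simplified sketch under my own assumptions: `dispatch()` and the use of `re.fullmatch` are illustrative stand-ins, not Tornado's actual routing code, and no Tornado install is required.

```python
import re


class PlaceHandler:
    # Same shape as the Tornado handler: one method per HTTP verb.
    # The captured id is optional so the bare /place route also works.
    def get(self, place_id=None):
        return "GETting %s" % (place_id or "all places")

    def post(self, place_id=None):
        return "POSTing something"


# (regex, handler class) tuples, like the list handed to tornado.web.Application
ROUTES = [
    (r"/place", PlaceHandler),
    (r"/place/([0-9]+)", PlaceHandler),
]


def dispatch(method, path):
    """Call the handler method named after the HTTP verb on the first route
    whose regex matches the whole path, passing captured groups in order."""
    for pattern, handler_class in ROUTES:
        match = re.fullmatch(pattern, path)
        if match:
            handler = handler_class()
            return getattr(handler, method.lower())(*match.groups())
    raise LookupError("no route for %s" % path)


print(dispatch("GET", "/place/42"))  # GETting 42
print(dispatch("GET", "/place"))     # GETting all places
```

A request for `/place/42` matches only the second pattern, so `get()` receives `"42"` as its first argument; `/place` matches the first pattern, captures nothing, and falls back to the default.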
Because of its non-blocking nature, Tornado bundles an asynchronous HTTP client for use internally. Additional modules include a command line and config file convenience library, escaping, 3rd party authentication (Facebook, Twitter, etc.), a wrapper around MySQLdb, and templating. All in all this makes it a formidable web framework in its own right, especially if you're looking for something that's light and FAST.
In production, I'm running 4 Tornado instances per server behind nginx.
One thing not handled out of the box was daemonizing the Tornado instance. I added PID file management and the ability to daemonize as follows (the pid.py module is below):
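import os

import daemon  # the python-daemon package
import tornado.httpserver
import tornado.ioloop
from tornado.options import options  # assumes `port` was defined and parsed elsewhere

import application  # project module (assumed) exposing the Application as `app`
import pid          # the PID file helper module below
import settings     # project settings: log_path, PIDFILE_PATH

# capture stdout/err in logfile
log_file = 'tornado.%s.log' % options.port
log = open(os.path.join(settings.log_path, log_file), 'a+')

# check pidfile (kills any instance already running on this port)
pidfile_path = settings.PIDFILE_PATH % options.port
pid.check(pidfile_path)

# daemonize
daemon_context = daemon.DaemonContext(stdout=log, stderr=log, working_directory='.')
with daemon_context:
    # write the pidfile
    pid.write(pidfile_path)
    # initialize the application
    http_server = tornado.httpserver.HTTPServer(application.app)
    http_server.listen(options.port, '127.0.0.1')
    try:
        # enter the Tornado IO loop
        tornado.ioloop.IOLoop.instance().start()
    finally:
        # ensure we remove the pidfile
        pid.remove(pidfile_path)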
And now the pid.py module:
# pid.py - module to help manage PID files
import errno
import fcntl
import logging
import os
import signal


def check(path):
    """Kill any process whose PID is recorded in the pidfile at `path`."""
    pid = None
    # try to read the pid from the pidfile
    try:
        logging.info("Checking pidfile '%s'", path)
        with open(path) as f:
            pid = int(f.read().strip())
    except IOError as e:
        # re-raise if the error wasn't "No such file or directory"
        if e.errno != errno.ENOENT:
            raise
    # try to kill the process
    if pid is not None:
        try:
            logging.info("Killing PID %s", pid)
            os.kill(pid, signal.SIGKILL)
        except OSError as e:
            # re-raise if the error wasn't "No such process"
            if e.errno != errno.ESRCH:
                raise


def write(path):
    """Write the current process's PID to `path`."""
    pid = os.getpid()
    with open(path, 'w') as pidfile:
        # get a non-blocking exclusive lock (released when the file is closed)
        fcntl.flock(pidfile.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
        # clear out the file
        pidfile.seek(0)
        pidfile.truncate(0)
        # write the pid
        pidfile.write(str(pid))
        logging.info("Writing PID %s to '%s'", pid, path)


def remove(path):
    """Delete the pidfile, ignoring errors (e.g. if it's already gone)."""
    try:
        # make sure we delete our pidfile
        logging.info("Removing pidfile '%s'", path)
        os.unlink(path)
    except OSError:
        pass
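One caveat with check(): it sends SIGKILL to whatever PID the file contains. If the pidfile is stale and the OS has since reused that PID, an unrelated process gets killed. A common, gentler technique (not part of the module above) is to probe for the process first with signal 0, which performs the existence and permission checks without delivering anything. A minimal sketch, with `is_running` as a hypothetical helper name:

```python
import errno
import os


def is_running(pid):
    """Return True if some process with this PID exists.

    Signal 0 is never delivered; os.kill() only checks that the
    process exists and that we may signal it."""
    try:
        os.kill(pid, 0)
    except OSError as e:
        if e.errno == errno.ESRCH:   # no such process
            return False
        if e.errno == errno.EPERM:   # exists, but belongs to another user
            return True
        raise
    return True


print(is_running(os.getpid()))  # True
```

Note that this only tells you *a* process has that PID, not that it's your Tornado instance, so it narrows the risk rather than eliminating it.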
I'm going to follow up this post with another on how I added a simple concept of "models" and an easy way to perform MySQL transactions. Let me know if you have any specific questions!
06 May
by: Eric in Ruby on Rails
tags: plugins, Ruby on Rails
Over the last few days I released my first open-source plugins on GitHub.
I hope someone finds these useful. Any and all feedback is appreciated.
28 Apr
by: Eric in Random, Ruby on Rails
tags: FormStack, JSON, Ruby on Rails, XML
I recently had to use the FormStack API in the context of a Rails app. You need to make these calls over SSL, and the API returns either XML or JSON. I chose JSON because, in my opinion, it's much easier to work with, and I hate XML.
Below is a simple example. Check out the FormStack API documentation for all the other API calls.
Oftentimes you're tasked with solving a problem you haven't faced before, requiring technologies you haven't previously been exposed to. This is a great thing! These experiences are the stuff of legend - continuing deep into the night as your curiosity peaks.
When delivering the solution affects somebody's bottom line, you have to balance the opportunity to learn against your desire to deliver for the customer.
Consider this example. A client recently wanted a private chat system for internal company communications. The service they had been using wasn't meeting their needs, was littered with bugs, and sometimes didn't work at all. Beyond privacy, real-time chat, presence, and multi-user chat, the core requirement was that it had to be compliant (all communications stored).
The learner in me wanted to dive deep, dig into XMPP, build a server from scratch, and accompany that with a web and desktop client. I spent a few days investigating the technologies involved and even wrote a quick proof of concept (that didn't use XMPP) in PHP.
What I came to realize is that much of the chat landscape had already been "solved". There were rock-solid open-source servers that were full-featured, standards-compliant, extensible, performant, and scalable (I'm looking at you, ejabberd). In addition, because XMPP is such a universally accepted and supported protocol, there were open-source clients for every major OS and even an AJAX web client.
I really did want to write my own XMPP client and server. Perhaps I will some day, but only if it solves a problem the business is having that can't be solved through the use of existing tools. In my opinion this is a reminder to "keep your eye on the prize". If time and resources are infinite then by all means dig in. Since in business that's rarely (if ever) the case, it's a good lesson learned.
Ask yourself: "are we in the business of compliant real-time chat?" If the answer is no, take a solution off the shelf and solve the problem.
04 Mar
by: Matt in Development, Python
tags: multiprocessing, Python
Python's multiprocessing module is a great tool that abstracts the details of forking and managing child processes behind an interface inspired by the threading module. The benefit of using processes over threads is that you sidestep the GIL (Global Interpreter Lock): each process runs its own interpreter, so CPU-bound work can actually run on multiple cores in parallel.
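I wanted to share my experience with sharing static data between the parent and its forked children. The solution I ultimately went with is trivial to implement and works well. It takes advantage of the fact that forked children start with a copy of the parent's memory, including every module the parent has already imported. If you house your data in a shared module and set it before the pool is created, it's accessible in both places. (Each child gets its own copy, so changes made in a child aren't seen by the parent or its siblings.)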
The directory structure looks like this:
mypackage/
    __init__.py
    mp.py
    myglobals.py
    myscript.py
Here's my light wrapper around the multiprocessing module, mp.py:
import multiprocessing

import MySQLdb

import myglobals

# handles each unit of work, in this case a SQL query
def worker_do(sql):
    myglobals.cursor.execute(sql)

# called once upon worker initialization
def worker_init():
    myglobals.conn = MySQLdb.connect(**myglobals.config['db'])
    myglobals.cursor = myglobals.conn.cursor()
    myglobals.cursor.execute('SET AUTOCOMMIT=1')

# wrapper for multiprocessing module
def do_work(queue, num_processes):
    pool = multiprocessing.Pool(num_processes, initializer=worker_init)
    pool.map(worker_do, queue, 1)
    pool.close()
    pool.join()
And here's my example script, myscript.py:
import sys

import mp
import myglobals

def build_queries():
    # hypothetical placeholder: return the list of SQL statements to run
    return []

def main():
    # anything in the myglobals module will be accessible by the child processes,
    # as long as it's set before the pool forks them. We could retrieve this
    # config info programmatically from a file via ConfigParser;
    # for simplicity I hard-coded it here
    myglobals.config = {
        'db': {
            'host': 'db1',
            'user': 'dbuser',
            'passwd': 'dbpasswd',
            'db': 'dbase'
        }
    }

    # build a whole bunch of queries to perform via the workers
    queries = build_queries()

    # perform the multiprocessing operation
    mp.do_work(queries, 4)
    return 0

if __name__ == '__main__':
    sys.exit(main())
In this example, the benefit is that your database configuration is defined in one place (DRY) and shared with the child processes.
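Here's the same idea boiled down to one self-contained file you can run on a POSIX system. It's a sketch under a few assumptions: a module-level `CONFIG` dict stands in for myglobals, a string prefix stands in for the database settings, and the "fork" start method is requested explicitly, because the trick depends on children inheriting the parent's memory. With the "spawn" start method (the default on Windows), children re-import modules from scratch and never see values the parent set at runtime.

```python
import multiprocessing

# Stand-in for the myglobals module: state the parent sets before forking.
CONFIG = {}
PREFIX = None  # set per worker by worker_init()


def worker_init():
    # Runs once in each forked child; CONFIG was inherited from the parent.
    global PREFIX
    PREFIX = CONFIG["prefix"]


def worker_do(item):
    # Handles one unit of work using the per-worker state.
    return PREFIX + str(item)


def run(items, num_processes=2):
    # Set the shared data *before* the pool forks its children.
    CONFIG["prefix"] = "job-"
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(num_processes, initializer=worker_init) as pool:
        return pool.map(worker_do, items, 1)


if __name__ == "__main__":
    print(run([1, 2, 3]))  # ['job-1', 'job-2', 'job-3']
```

If you moved `CONFIG["prefix"] = "job-"` to after the pool is created, the children would already have their copies of an empty dict and `worker_init()` would raise a KeyError, which is a handy way to convince yourself the data really travels via fork.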
For one-off scripts for a particular project, set up the Django environment first:
#!/usr/bin/env python
from django.core.management import setup_environ
from myapp import settings
setup_environ(settings)
# do some stuff
02 Mar
by: Matt in Development, PHP
tags: concurrency, fork, parallel, pcntl, pcntl_fork, pcntl_wait, php, process, thread, unix
I find it interesting and challenging to bend PHP in ways it probably shouldn't be bent. Almost always I walk away pleasantly surprised at its ability to solve a variety of problems.
Consider this example. Let's say you want a given job to take advantage of more than one core. Perhaps it performs many intensive computations and would take an hour to run on a single core. Because a PHP process is single-threaded, it can only use one core at a time, leaving the rest of your multi-core hardware idle.
Fortunately, via the Process Control (PCNTL) extension, PHP provides a way to fork new child processes. Forking duplicates the running process: the new child gets a copy of the parent's memory and continues executing from the point of the fork. pcntl_fork() is the function that does this.
The framework for using this extension is as follows:
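$maxChildren = 4;
$numChildren = 0;
foreach ($unitsOfWork as $unit) {
    $pid = pcntl_fork();
    if ($pid == -1) {
        die("could not fork\n");
    } elseif ($pid == 0) {
        // child: do the work, then kill itself with SIGKILL, which exits
        // without running shutdown code (e.g. destructors) copied from the parent
        doWork($unit);
        posix_kill(getmypid(), 9);
    } else {
        // parent: count the child and block once we hit the max
        $numChildren++;
        if ($numChildren == $maxChildren) {
            pcntl_wait($status);
            $numChildren--;
        }
    }
}
// wait for any remaining children before the parent exits
while ($numChildren > 0) {
    pcntl_wait($status);
    $numChildren--;
}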
pcntl_fork() returns twice: in the parent it returns the new child's PID, and in the child it returns 0 (if the fork fails, the parent gets -1 and no child is created). The if statement following the fork lets the child and parent split their flow of execution based on who they are: the child does the work and kills itself, while the parent checks whether it has hit the maximum number of children and, if so, waits; otherwise it creates another child. pcntl_wait() is called when we hit $maxChildren and blocks until a child exits.
Remember, if you want to use database connections in your children, each child needs to open its own connection. A connection opened before the fork is inherited by every child, and processes talking over the same connection will interleave their traffic and corrupt it.
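For comparison, here's the same bounded-fork pattern sketched in Python with os.fork() and os.wait(). It's an illustration under assumptions: do_work() is a hypothetical stand-in that squares its input, and each child reports its result through its exit status, which only holds values 0-255. Unlike pcntl_fork(), os.fork() raises OSError on failure rather than returning -1.

```python
import os


def do_work(unit):
    # Hypothetical unit of work: square the number.
    return unit * unit


def run_forked(units, max_children=4):
    """Fork one child per unit, with at most max_children alive at once.
    Returns the children's exit statuses in the order they were reaped."""
    num_children = 0
    statuses = []

    def reap_one():
        nonlocal num_children
        _, status = os.wait()  # blocks until any child exits
        statuses.append(os.WEXITSTATUS(status))
        num_children -= 1

    for unit in units:
        pid = os.fork()
        if pid == 0:
            # child: do the work, then exit via os._exit so it never
            # returns into the parent's loop or runs its cleanup code
            code = 1
            try:
                code = do_work(unit) % 256
            finally:
                os._exit(code)
        # parent: count the child, and block once we hit the max
        num_children += 1
        if num_children == max_children:
            reap_one()

    # wait for the remaining children before returning
    while num_children > 0:
        reap_one()
    return statuses


print(sorted(run_forked([1, 2, 3, 4, 5], max_children=2)))  # [1, 4, 9, 16, 25]
```

os._exit() plays the same role as the self-inflicted SIGKILL in the PHP version: the child leaves immediately without flushing buffers or running exit handlers it inherited from the parent.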
Just a quick note alerting everyone to the fact that jQuery has gotten EVEN EASIER AND FASTER.
Go check out the release notes.
08 Feb
by: Matt in Development, Django, Infrastructure, PHP, Python, Ruby, Ruby on Rails
tags: deployment, Django, mod_wsgi, mongrel, passenger, php, phusion passenger, Python, rails, ruby, subversion
I finally got around to setting up a more sophisticated deployment system for some of my apps. These apps include some built on a custom PHP framework and others that are Python / Django apps. I figured I'd share my experience...
Why is a high-level deployment infrastructure important? Deployment is something that should be simple, accessible, and repeatable. It should be as close to a "single click" as possible. Previously, for me, it was a bash script that exported some SVN branches. While this worked fine, as projects progress, you want some accountability, history, and the ability to roll back mission critical applications when something goes wrong with a deploy.
Capistrano is an open-source, command-line deployment tool that provides all of these features. It's written in Ruby. You leverage a variety of built-in "recipes" (Capistrano's term for a deployment script) that execute certain procedures to deploy an app. Out of the box it's built to deploy a Rails app; however, after some minor tweaks it can deploy almost anything, and do it well. It can restart servers, update symlinks, change permissions - pretty much anything. It assumes your servers are POSIX-compliant and that you can reach each of them via SSH with the same password (or have SSH keys set up).
Webistrano is an open source web front-end for Capistrano. It's a convenience layer that abstracts the command line away and provides an interface to perform the same tasks. This interface shows history as well as providing a convenient GUI for creating new deployment projects, stages, and recipes. Highly recommended.
Let's get down to business. This post makes a few assumptions about things you've already installed and used previously.
Well, this is an easy one (you probably want to do this as root):
gem install capistrano
Also fairly easy, with a little splash of configuration.
# wget http://labs.peritor.com/webistrano/attachment/wiki/Download/webistrano-1.4.zip
# unzip webistrano-1.4.zip
# mv webistrano-1.4 /path/to/where/you/want/webistrano
Set up the database and create a new webistrano user (obviously, be mindful of your security preferences for database access in the host and password portions):
# mysql
mysql> CREATE DATABASE `webistrano`;
mysql> CREATE USER 'webistrano'@'localhost' IDENTIFIED BY 'password';
mysql> GRANT ALL PRIVILEGES ON `webistrano`.* TO 'webistrano'@'localhost' WITH GRANT OPTION;
Now, in the directory where you placed Webistrano, copy config/database.yml.sample to config/database.yml. Edit the production section of this file to match your database settings. By default the file expects to connect via a socket; you can change this by specifying host: and port:. (Keep in mind Webistrano is simply a Rails app.)
You should now be able to have Rails migrate the new database you created. In the webistrano directory:
# RAILS_ENV=production rake db:migrate
Finally, copy config/webistrano_config.rb.sample to config/webistrano_config.rb and edit according to your preferred mail settings.
We can now test to see if webistrano is working properly by serving it via mongrel:
# ruby script/server -d -e production -p 3000
This starts a single mongrel daemon, using the production environment, listening on port 3000. You should now be able to hit http://127.0.0.1:3000/ and get the Webistrano login prompt. If this is working, kill that mongrel instance.
For longer-term serving I decided to go with Phusion Passenger (essentially mod_rails for Apache). It's a nearly zero-configuration solution for serving a Rails app and will feel familiar to anyone with experience serving PHP apps via Apache and mod_php.
Again, as root:
# gem install passenger
# passenger-install-apache2-module
The second command invokes an installer that compiles Passenger and provides instructions for integrating it into your Apache config. Essentially, edit your httpd.conf as follows (these paths were specific to my install; make sure to use the ones the installer provides for you):
LoadModule passenger_module /usr/lib/ruby/gems/1.8/gems/passenger-2.2.9/ext/apache2/mod_passenger.so
PassengerRoot /usr/lib/ruby/gems/1.8/gems/passenger-2.2.9
PassengerRuby /usr/bin/ruby
Now you can simply add VirtualHost entries to your httpd.conf for any of your Rails apps. Let's add one for Webistrano:
<VirtualHost *:80>
ServerName webistrano.mydomain.com
DocumentRoot /path/to/webistrano/public
</VirtualHost>
Yes, Passenger makes it that simple. Add configuration directives as needed for your environment.
Now Webistrano should be serving from the VirtualHost you specified, seamlessly, via Passenger.
Now the fun stuff.
Webistrano breaks things down into projects, stages, and recipes. Each app you want managed by Capistrano should be its own project. Each project should have a stage for at least production and optionally staging and development.
Hosts are added globally and form the targets of a deploy for any given project. Hosts can include web, app, and database servers.
Capistrano deploys each release into its own subdirectory under "releases", named with the date and time of the deployment. By default 5 releases are kept and available to roll back to. Upon successful deployment a symlink (called "current" by default; this can be changed via the current_path configuration variable) is updated to point to that release directory. This symlink is what your web server should target (your DocumentRoot in Apache).
Capistrano also creates a "shared" directory, symlinked into each release, that's useful for storing logs and other data that should persist across deployments.
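The releases/current layout is easy to picture in code. Below is a rough Python model of it, not Capistrano's implementation: deploy() is a hypothetical function that copies a source tree into a timestamped release directory, swaps the current symlink by creating a temporary link and renaming it over the old one (rename replaces the target in one step, so the web server never sees a missing link), and prunes all but the newest keep_releases directories. The shared directory is left out for brevity.

```python
import os
import shutil
import time


def deploy(deploy_to, source_dir, release_name=None, keep_releases=5):
    """Copy source_dir into <deploy_to>/releases/<release_name> and point
    <deploy_to>/current at it. Returns the new release path."""
    releases_path = os.path.join(deploy_to, "releases")
    os.makedirs(releases_path, exist_ok=True)
    if release_name is None:
        # timestamped release name, e.g. 20100208153000 (UTC)
        release_name = time.strftime("%Y%m%d%H%M%S", time.gmtime())
    release_path = os.path.join(releases_path, release_name)
    shutil.copytree(source_dir, release_path)

    # Repoint `current` atomically: create a temporary symlink, then
    # rename it over the old one.
    current_path = os.path.join(deploy_to, "current")
    tmp_link = current_path + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(release_path, tmp_link)
    os.replace(tmp_link, current_path)

    # Prune old releases, keeping the newest `keep_releases` for rollback.
    releases = sorted(os.listdir(releases_path))
    for old in releases[:max(0, len(releases) - keep_releases)]:
        shutil.rmtree(os.path.join(releases_path, old))
    return release_path
```

Because release names are zero-padded timestamps, sorting them as strings also sorts them chronologically. Rolling back, in this model, amounts to pointing current at the previous release directory the same way.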
For non-Rails apps, use the "Pure File" project type when creating your new project. Upon project creation you can add configuration variables specific to your project. For production Subversion deployments I recommend setting deploy_via to :export instead of :checkout, as this doesn't expose .svn directories. Use an SSH user that has enough permissions to create directories where your deploy will occur, or set use_sudo to true and create a new configuration variable, admin_runner, set to the same user as runner.
Add a stage to your new project for "production". In the "Manage Hosts" page add a new host for each of your application servers. Then add each host as a target of your "production" stage of your project.
At this point you should be able to execute the "Setup" task for your "production" stage. This is a one-time task that simply creates the directories.
Assuming this went successfully, try doing a "Deploy" and see if that finishes without error. You might have to play around with permissions and other minor issues - post a comment if you have any specific questions.
For my PHP framework there are a couple of specific tasks I wanted to run in addition to the default Capistrano tasks. You do this by creating custom recipes on the "Manage Recipes" page in Webistrano. Recipes are simply procedures written in Ruby. Here's what my recipe looks like:
namespace :deploy do
  task :setup, :except => { :no_release => true } do
    dirs = [deploy_to, releases_path, shared_path]
    dirs += shared_children.map { |d| File.join(shared_path, d) }
    run "#{try_sudo} mkdir -p #{dirs.join(' ')} && #{try_sudo} chmod g+w #{dirs.join(' ')}"
    run "chmod 777 #{shared_path}/log"
  end

  task :finalize_update, :except => { :no_release => true } do
    run "mkdir -p #{latest_release}/app/tmp"
    run "chmod -R 777 #{latest_release}/app/tmp"
    run "rm -rf #{latest_release}/app/logs"
    run "ln -s #{shared_path}/log #{latest_release}/app/logs"
    run "cp #{latest_release}/public_html/.htaccess-production #{latest_release}/public_html/.htaccess"
    run "cp #{latest_release}/app/config/config-production.php #{latest_release}/app/config/config.php"
    run "cp #{latest_release}/app/config/db-default.php #{latest_release}/app/config/db.php"
    run "cp #{latest_release}/app/config/memcache-default.php #{latest_release}/app/config/memcache.php"
  end
end
If you're not familiar with Ruby, this code is essentially overriding two tasks in the :deploy namespace with my custom code.
The first, :setup, simply duplicates the base :setup functionality discussed above (creating the releases and shared directories) and chmods the shared log directory to be writable.
The second, :finalize_update, performs a variety of configuration tasks for a PHP app built with my framework. Also, you'll notice that I'm removing my app's logs directory and symlinking to the shared log directory. This way all releases will log to the same directory, consistently.
In my case all of these procedures are shell commands. Alternatively, you can leverage the full breadth of the Ruby language and any gem you'd like to introduce, for example calling your CDN's API to purge cached images, JS, or CSS.
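The log handling in :finalize_update (rm -rf the release's logs directory, then ln -s to the shared log directory) boils down to the following Python sketch. link_shared() is a hypothetical helper written for illustration, not part of Capistrano or Webistrano.

```python
import os
import shutil


def link_shared(release_path, shared_path, name, shared_name="log"):
    """Replace <release>/<name> with a symlink to <shared>/<shared_name>,
    so every release writes to the same persistent directory."""
    target = os.path.join(shared_path, shared_name)
    link = os.path.join(release_path, name)
    os.makedirs(target, exist_ok=True)
    # like `rm -rf <release>/<name>`
    if os.path.islink(link) or os.path.isfile(link):
        os.remove(link)
    elif os.path.isdir(link):
        shutil.rmtree(link)
    # like `ln -s <shared>/<shared_name> <release>/<name>`
    os.symlink(target, link)
    return link
```

The removal step matters: if the release still contained a real logs directory, `ln -s` would create the link *inside* it instead of replacing it.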
First off it's worth noting that I serve my Django apps via mod_wsgi. To make the deployment process easier here's what my app.wsgi script looks like:
import os
import sys

# the app root is one level above the public/ directory holding this script
appdir = os.path.normpath(os.path.join(os.path.realpath(os.path.dirname(__file__)), '..'))
sys.path.insert(0, appdir)

os.environ['DJANGO_SETTINGS_MODULE'] = 'settings'
os.environ['PYTHON_EGG_CACHE'] = os.path.join(appdir, '.python-eggs')

import django.core.handlers.wsgi
application = django.core.handlers.wsgi.WSGIHandler()
This code allows us to avoid having to hardcode paths in the wsgi script (and thus avoid having to change them when we deploy). It assumes the following directory structure:
.python-eggs (egg cache)
apps (apps path is added to python system path in settings.py)
public (where your .wsgi script resides)
site_media
templates
settings.py
settings-production.py (used for deploy)
urls.py
...
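The path logic in app.wsgi is worth a closer look. Because os.path.realpath() resolves symlinks, a script loaded through Capistrano's current symlink (assuming your WSGIScriptAlias points there) computes appdir as the concrete release directory rather than the current path. Here's that computation pulled out into a hypothetical app_dir() helper so it can be exercised on its own:

```python
import os


def app_dir(wsgi_file):
    """Mirror the app.wsgi logic: resolve symlinks in the script's
    directory, then step up one level from public/ to the app root."""
    return os.path.normpath(
        os.path.join(os.path.realpath(os.path.dirname(wsgi_file)), ".."))
```

So a path like current/public/app.wsgi yields releases/<timestamp>, which keeps sys.path and the egg cache pinned to the release that was actually loaded.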
If you follow this convention, the following Capistrano recipe works great:
namespace :deploy do
  task :setup, :except => { :no_release => true } do
    dirs = [deploy_to, releases_path, shared_path]
    dirs += shared_children.map { |d| File.join(shared_path, d) }
    run "#{try_sudo} mkdir -p #{dirs.join(' ')} && #{try_sudo} chmod g+w #{dirs.join(' ')}"
    run "chmod 777 #{shared_path}/log"
  end

  task :finalize_update, :except => { :no_release => true } do
    run "rm -rf #{latest_release}/logs"
    run "ln -s #{shared_path}/log #{latest_release}/logs"
    run "cp #{latest_release}/settings-production.py #{latest_release}/settings.py"
    run "mkdir -p #{latest_release}/.python-eggs"
    run "chmod 777 #{latest_release}/.python-eggs"
  end
end
This should give you a nice intro to leveraging Capistrano via Webistrano. Feel free to comment with questions, suggestions, or anything else!