Migrating a Django app to Heroku

Heroku supports Python! What are we waiting for?

As a Python and Django kind of guy, I had always been jealous of the Ruby on Rails folks. This has nothing at all to do with the framework itself. No, no, no… Django all the way. It was the Heroku cloud application platform that had me longing.

Yes, I could run my Django application on Google App Engine, but that requires all sorts of hackery and my app ended up an abomination of the original… too unnatural for my tastes.

I sensed a small shift in the Earth’s rotation on Sept 28, 2011. This is when Heroku added support for Python/Django on their platform. I needed to give it a test drive and I was amazed how simple it was. The following is an account of what I did to port one of my existing Django sites to Heroku…

Initial setup

Heroku already provides a decent quick start guide for Python. That’s a great place to begin. Check out the Prerequisites and Local Workstation Setup which will get you up and running quickly. It helps if you’re already familiar with Git, virtualenv and pip. If you’re not, then now is an excellent time to learn!

First things first. Assuming your Django project is already in Git, change directory to the project root. Then…

$ heroku login
... output omitted ...
$ heroku create --stack cedar
... output omitted ...

Believe it or not, we’re almost there!

Database config

Now, my existing app is running a MySQL database. I’m going to use the built-in Postgres DB on Heroku. So I need to update my Django settings…

# Database config
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
    }
}

Note that I didn’t supply any database info or credentials. Heroku automatically injects this info into your settings.py file. I also need to dump my current database so I can import it into my new app later (this could also be done with fixtures).

$ python myapp/manage.py dumpdata > db.sql
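
Back to that injection step for a moment. Roughly speaking, it could look like the sketch below. This is not Heroku’s actual code: it assumes the platform exposes the connection string in an environment variable called DATABASE_URL. The helper name database_from_url and the fallback URL are made up for illustration, and the imports are Python 3.

```python
import os
from urllib.parse import urlparse


def database_from_url(url):
    """Turn a postgres://user:pass@host:port/name URL into a Django DATABASES entry."""
    parts = urlparse(url)
    return {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': parts.path.lstrip('/'),
        'USER': parts.username or '',
        'PASSWORD': parts.password or '',
        'HOST': parts.hostname or '',
        'PORT': str(parts.port) if parts.port else '',
    }


# Hypothetical URL, used only when the environment variable is absent
example_url = 'postgres://melvin:secret@db.example.com:5432/myapp'
DATABASES = {
    'default': database_from_url(os.environ.get('DATABASE_URL', example_url)),
}
```

The point is simply that the credentials live in the environment rather than in the repository, which is why the settings file above can leave them out.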

I’m already storing my Python dependencies in a requirements.txt file. If you’re not, you’ll need to create this file at the project root.

$ pip freeze > requirements.txt
$ cat requirements.txt
Django==1.3
feedparser==5.1
gunicorn==0.12.2
lxml==2.3.3
psycopg2==2.4.2
python-dateutil==1.5
python-sunlightapi==1.1.0

Don’t forget to commit your changes! Then push to Heroku.

$ git commit -a -m 'Mods to run on Heroku.'
$ git push heroku master

That second line above is really something to be admired. Not only does it push the code to Heroku’s repository, but then it triggers a real deployment. Heroku automatically copies the files to the stack, installs all the dependencies (via requirements.txt), detects that this is a Django application, and runs the application with “runserver”. Poof! Done.

Well, not quite done. But really that’s the bulk of it, and my app is in fact running. You can confirm this by using the “heroku ps” command.

Running your app

Now I need to do all the normal Django setup stuff, like syncdb and loaddata…

$ heroku run python myapp/manage.py syncdb
$ heroku run python myapp/manage.py loaddata < db.sql

If all goes well, I should be able to hit my site with a web browser at the wacky hostname provided by Heroku when I created the stack. Heroku provides a shortcut…

$ heroku open

Final thoughts

Well, that was pretty darn easy. A few other things to note if you’re trying this yourself.

  • You’ll need to serve static media from somewhere. I used django.contrib.staticfiles in this example, but that’s probably not ideal for production. Though the output does get cached… so it’s also not too bad.
  • You don’t want to use the built-in Django runserver. I prefer gunicorn and it’s easy to configure that!
  • Enjoy yourself. This is cool stuff!

AWS Identity and Access Management (IAM) with Python

Flexible access control to AWS cloud services using Amazon IAM, Python, and boto

With all the AWS services that are now available, our opportunities in the cloud are virtually unlimited. But using any of these services requires access to your AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, and unfortunately these keys provide complete access to the kingdom. This may not be a problem for some, but for large enterprises, granular access control is a necessity.

Up until recently, we would have been out of luck. But fortunately Amazon released Identity and Access Management (IAM) which makes flexible access control possible. And boto makes it easy in Python.

Take this example

Say your company uses Amazon S3 to store its image assets in a variety of S3 buckets. If you needed to grant a third party the ability to upload new images to your S3 account, they would need a set of keys. You wouldn’t want to give them your main keys, since they would gain access not only to all of your S3 buckets, but also to your EC2 instances, RDS databases, etc. Not a good situation.

Here’s where IAM comes in. With IAM, you can create a user for this specific purpose, with its own unique AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY pair. Additionally, you can apply an IAM policy to restrict what this user can do… in this case, to a specific S3 bucket.

Let’s give it a try

In this example, we’ll create a user called melvins and grant it access to an S3 bucket called houdini.

import boto

# Connect to IAM with boto
iam = boto.connect_iam(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)

# Create user
user_response = iam.create_user('melvins')

# Limit access with IAM policy
policy_json = '''{
    "Statement":[{
        "Sid":"RandomStringIdentifier",
        "Action":"s3:*",
        "Effect":"Allow",
        "Resource":"arn:aws:s3:::houdini/*"
    }]
}'''
iam.put_user_policy('melvins', 'allow_access_houdini', policy_json)

# Generate new access key pair for 'melvins'
key_response = iam.create_access_key('melvins')

We now have a new set of AWS keys with access limited to a single S3 bucket. We could just as easily have limited it further to a specific action (e.g. s3:PutObject), or to other services (e.g. ec2:*).
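
To make the “specific action” idea concrete, here is a small sketch that builds the policy document with the json module instead of a hand-written string. The helper name build_s3_policy is hypothetical (it is not part of boto), and the upload-only action list is just one example of a narrower grant.

```python
import json


def build_s3_policy(bucket, actions, sid='RandomStringIdentifier'):
    """Return an IAM policy (JSON string) allowing `actions` on objects in `bucket`."""
    policy = {
        'Statement': [{
            'Sid': sid,
            'Action': actions,
            'Effect': 'Allow',
            'Resource': 'arn:aws:s3:::%s/*' % bucket,
        }]
    }
    return json.dumps(policy, indent=4)


# Upload-only access to objects in the houdini bucket
policy_json = build_s3_policy('houdini', ['s3:PutObject'])
print(policy_json)
```

The resulting string can be handed to iam.put_user_policy() exactly like the hand-written policy_json in the example above.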

The objects we created with boto provide us with all sorts of nifty info.

>>> import simplejson as json
>>>
>>> user = user_response.create_user_response.create_user_result.user
>>> print json.dumps(user, indent=4)
{
    "path": "/",
    "create_date": "2011-01-22T20:02:27.900Z",
    "user_id": "AIDAXXXXXXXXXXXXXKXBW",
    "arn": "arn:aws:iam::46XXXXXXXX90:user/melvins",
    "user_name": "melvins"
}
>>>
>>> key = key_response.create_access_key_response.create_access_key_result.access_key
>>> print json.dumps(key, indent=4)
{
    "status": "Active",
    "user_name": "melvins",
    "create_date": "2011-01-22T20:16:17.189Z",
    "secret_access_key": "vDskXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX2cob",
    "access_key_id": "AKIAXXXXXXXXXXXXQ5DA"
}
>>>
>>> # Provide new keys to user
>>> print 'AWS_ACCESS_KEY_ID', key.access_key_id
AWS_ACCESS_KEY_ID AKIAXXXXXXXXXXXXQ5DA
>>>
>>> print 'AWS_SECRET_ACCESS_KEY', key.secret_access_key
AWS_SECRET_ACCESS_KEY vDskXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX2cob

Using IAM groups

Suppose you have three trusted employees (Buzzo, Lorax and Dale) who require full access to your AWS account. You could give them the master key pair. But what if Lorax decides to leave the company? You would have to change your master key pair, redistribute it to the remaining employees, and hope nothing was depending on the old one. I would grow tired of this very quickly.

A better solution is to grant each employee his own key pair (as described above). But rather than managing their policies individually, you could add them to a group and apply a group policy.

import boto

# Connect to IAM with boto
iam = boto.connect_iam(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)

# Create group
iam.create_group('admins')

# Apply full access IAM policy
policy_json = '''{
   "Statement":[{
      "Effect":"Allow",
      "Action":"*",
      "Resource":"*"
      }
   ]
}
'''
iam.put_group_policy('admins', 'allow_all', policy_json)

# Create users
iam.create_user('buzzo')
iam.create_user('lorax')
iam.create_user('dale')

# Generate access keys
iam.create_access_key('buzzo')
iam.create_access_key('lorax')
iam.create_access_key('dale')

# Add users to group
iam.add_user_to_group('admins', 'buzzo')
iam.add_user_to_group('admins', 'lorax')
iam.add_user_to_group('admins', 'dale')

So when Lorax takes off, it’s just a simple matter.

iam.remove_user_from_group('admins', 'lorax')
iam.delete_user('lorax')
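
One caveat worth sketching: as far as I know, IAM refuses to delete a user who still owns access keys, so the keys generated earlier have to go first. The helper below is a hedged sketch, not a boto feature. The name offboard_user is made up, `iam` is assumed to be the connection returned by boto.connect_iam(), and the nested response keys are my assumption about how boto lays out the key listing.

```python
def offboard_user(iam, user_name, group_names):
    """Delete a user's access keys and group memberships, then the user itself."""
    # Assumption: boto nests the key list under these response keys
    response = iam.get_all_access_keys(user_name)
    result = response['list_access_keys_response']['list_access_keys_result']
    for key in result['access_key_metadata']:
        iam.delete_access_key(key['access_key_id'], user_name)

    # Drop group memberships before removing the user
    for group_name in group_names:
        iam.remove_user_from_group(group_name, user_name)

    iam.delete_user(user_name)
```

With that in place, saying goodbye to Lorax would be a single call: offboard_user(iam, 'lorax', ['admins']).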

The moral of the story is: Quit distributing your master access keys. Use IAM. I doubt you want to learn this lesson the hard way!

Version note: boto added IAM support in 2.0b3.

pyawschart – v0.2 released

Amazon CloudWatch data visualization

I have just pushed the source code for pyawschart – v0.2 out to GitHub. This project was created a few months back and I have been using it for personal (Proboscis) and professional (PBS) projects since then. I’ve just decided to open source it for the betterment of the AWS and Python communities.

The goal was quite simple… Use boto to pipe Amazon CloudWatch data into Python Google Chart and hope for some really cool data visualizations.

The results are pretty neat:
[Image: pyawschart-rds-writeops-example-1, an example RDS write operations chart]
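
For a feel of the plumbing, here is a rough sketch of the “CloudWatch data in, chart URL out” idea. It is not pyawschart’s actual implementation. It assumes datapoints shaped like the dicts boto returns from get_metric_statistics() (a 'Timestamp' plus a statistic such as 'Average'), expects a non-empty list, and builds the URL by hand in the old Google Image Charts format instead of using a charting library. The function name chart_url is made up, and the import is Python 3.

```python
from urllib.parse import urlencode


def chart_url(datapoints, stat='Average', size='600x200'):
    """Build a Google Image Charts line-chart URL from CloudWatch-style datapoints."""
    # Sort by time so the line is drawn left to right
    ordered = sorted(datapoints, key=lambda d: d['Timestamp'])
    values = [d[stat] for d in ordered]
    # The text encoding expects values between 0 and 100, so scale to the peak
    peak = max(values) or 1
    scaled = ['%.1f' % (100.0 * v / peak) for v in values]
    params = {'cht': 'lc', 'chs': size, 'chd': 't:' + ','.join(scaled)}
    return 'https://chart.googleapis.com/chart?' + urlencode(params)
```

Scaling to the peak keeps the sketch short, at the cost of losing the real units on the y-axis.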

Features

pyawschart – v0.2 supports all of the metrics that Amazon makes available for Elastic Compute Cloud (EC2), Relational Database Service (RDS), Elastic Block Store (EBS), and Elastic Load Balancing (ELB).

Example code

Get EC2 CPU utilization chart for the past hour.

import boto
from pyawschart import RANGES
from pyawschart.ec2 import CPUUtilizationChart

# Connect with boto
conn = boto.connect_cloudwatch(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
chart = CPUUtilizationChart(conn, 'i-6d3efXXX', RANGES['hour'])

# Display chart url
print chart.get_url()

# Download chart image
chart.download('/tmp/i-6d3efXXX-cpu-hour.png')

Get RDS database connections chart for the past day.

import boto
from pyawschart import RANGES
from pyawschart.rds import DatabaseConnectionsChart

# Connect with boto
conn = boto.connect_cloudwatch(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
chart = DatabaseConnectionsChart(conn, 'my-rds-instance', RANGES['day'])

# Display chart url
print chart.get_url()

# Download chart image
chart.download('/tmp/my-rds-instance-connections-day.png')

Feedback

While the library works well for most of the use cases I require, I would welcome feedback on your use cases. Use it… abuse it… then submit comments on this entry, or post issues/requests on GitHub.

Private clouds for developers

A viable option?

At PBS we have been launching most new applications in Amazon’s cloud platform, utilizing EC2 instances, EBS and S3 storage, etc. The flexibility and agility that this Infrastructure as a Service (IaaS) provides is truly game changing, if not frightening (to IT departments).

The ease with which I am able to spin up a server or 30 still gives me the chills, even though I’ve been doing this for years now. In fact, it is so easy that in addition to launching production applications in the cloud, we’ve been providing each developer with dedicated EC2 servers for the applications they are building, modifying, or testing. The benefits of doing so are easy to see:

  • Development environment is identical in all aspects to production environment.
  • Deployment systems (such as Fabric) can be used to push to QA, staging and production servers as easily as to the developer’s instance.
  • Instances can be easily cloned if something goes wrong. Made a mistake on configuring the environment? Who cares? Blow it away and start over.

But is it too easy?

For each active application, we may have a dozen or so developers, each with their own EC2 instance. As more applications and developers come on board, you can foresee how we might end up with multiplicative growth in the number of development servers. Let’s take 10 developers each working on 3 projects… that’s 30 development instances for 3 projects. And that’s not counting QA, staging or production instances. Since a throw-away development instance costs just as much as a mission critical production instance… the cost balance feels out of whack to me.

Is there a way to bring down the cost of development instances while maintaining the above-mentioned benefits? What about hosting development servers in a private AWS-compatible cloud, such as those fronted by Eucalyptus? There may be potential here and we are prototyping this out. But I suspect the underlying hardware required to run such a private cloud would be as costly as (and less flexible than) using dedicated AWS EC2 instances.

Amazon’s new micro instances should provide some relief. Any other ideas?

Future of Faetus…

Should my FTP-to-S3 project evolve?

In my recent post, Faetus v0.5 released!, I described how and why I built a Python-based FTP server that reads from and writes to Amazon S3. This project was born out of necessity: I was using a 3rd-party file management application that didn’t understand the S3 API, but I needed the files to end up on S3. The system did talk FTP, so I wrote an FTP server that does just that.

As it turns out, the file manager DOES now support the S3 API. Therefore my immediate need for an intermediary FTP interface is gone. While I think it is still a very cool idea, I no longer have an urgent need for it. I will continue to fix some of the major bugs with it, but most of my tinkering will cease. Please let me know if you are interested in using Faetus and I’m happy to help you out if I can.