Generating a PowerPoint from text files

Annoy your colleagues with Python

One of my nerdy colleagues, @mattcaldwell, just gave a neat presentation at a Django-District Meetup on using vim+tmux as a Django IDE. To keep it real, he created his presentation using vimdeck and actually presented his slides inside of vim.

I figured it would annoy him to no end to see his presentation in PowerPoint format. So here's a script to do it...

#!/usr/bin/python
# -*- coding: utf-8 -*-

import os
import sys
from cStringIO import StringIO
from argparse import ArgumentParser

from pptx import Presentation
from pptx.constants import MSO_SHAPE
from pptx.dml.color import RGBColor
from pptx.util import Inches, Pt


def is_md(src, f):
    """Returns true if file exists and has a Markdown extension"""
    return os.path.isfile(os.path.join(src, f)) and f.endswith('.md')


def get_pages(src):
    """Returns a list of Markdown filenames in the source directory."""
    return [os.path.join(src, f) for f in os.listdir(src) if is_md(src, f)]


def build_deck(pages):
    """Returns the actual PowerPoint deck."""
    pres = Presentation()
    blank_slidelayout = pres.slidelayouts[6]

    for page in pages:
        slide = pres.slides.add_slide(blank_slidelayout)

        left = top = Inches(0)
        width = Inches(10)
        height = Inches(5.63)

        shape = slide.shapes.add_shape(MSO_SHAPE.RECTANGLE, left, top, width,
                                       height)
        shape.fill.solid()
        shape.fill.fore_color.rgb = RGBColor(0, 0, 0)

        textbox = slide.shapes.add_textbox(left, top, width, height)
        textframe = textbox.textframe

        para = textframe.paragraphs[0]
        run = para.add_run()
        run.font.name = "Consolas"
        run.font.size = Pt(7)
        run.font.color.rgb = RGBColor(255, 255, 255)

        with open(page) as f:
            run.text = f.read()

    return pres


if __name__ == '__main__':
    parser = ArgumentParser(
        description='My dorky colleague created a presentation using vimdeck.\
                     The very last thing he would want to see is it as a real\
                     PowerPoint deck. So here you go...',
        epilog='This thing returns binary data, so you probably want to pipe\
                the output to a file with a .pptx extension. Oh, and be sure\
                to "pip install python-pptx==0.3.2" first.')
    parser.add_argument('source', type=str, nargs=1, help='Path to source\
        directory containing vimdeck markdown files.')
    args = parser.parse_args()

    source = os.path.realpath(args.source[0])
    deck = build_deck(get_pages(source))

    output = StringIO()
    deck.save(output)
    sys.stdout.write(output.getvalue())
    output.close()
    sys.exit()

Or grab the Gist. The only Python dependency is pptx. pip install python-pptx==0.3.2 will do just fine. For me, this was really just a fun exercise to gain a better understanding of the pptx library.
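One thing to watch: os.listdir makes no promise about ordering, so the slides may come out shuffled. Here's a minimal sketch of a sorted variant of get_pages, assuming your Markdown files sort correctly by filename. The get_sorted_pages name is mine and not part of the script above.

```python
import os


def get_sorted_pages(src):
    """Return paths of Markdown files in src, sorted by filename."""
    names = sorted(
        f for f in os.listdir(src)
        if f.endswith('.md') and os.path.isfile(os.path.join(src, f))
    )
    return [os.path.join(src, f) for f in names]
```

If you try it, swap it in for get_pages when calling build_deck.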


Python performance: string formatting with .format vs %

Which is faster?

Python 2.6 introduced the str.format() method, which provides a much more flexible alternative to the older modulo (%) based string formatting. But which one performs better? Let's test it out by repeating a simple string format a million times with each method and timing the execution. [1]

str.format vs. (%) modulo

from timeit import timeit

def test_modulo():
    'Don\'t %s, I\'m the %s.' % ('worry', 'Doctor')

def test_format():
    'Don\'t {0}, I\'m the {1}.'.format('worry', 'Doctor')

timeit(stmt=test_modulo, number=1000000)
# 0.31221508979797363

timeit(stmt=test_format, number=1000000)
# 0.5489029884338379

Hmmm, the modulo operator is almost twice as fast as str.format() in these simple examples.

In cases where you don't require the flexibility of str.format() and simply want sheer performance... modulo has got to be the way to go... But according to PEP 3101, str.format() is intended to replace the modulo operator... Let's hope that doesn't happen anytime soon.
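A single timeit run can be noisy. Here's a minimal sketch, written for Python 3, that repeats each measurement and keeps the fastest run. The best_time helper, the module-level WHAT and WHO variables, and the run counts are my own choices for illustration, and your numbers will differ from mine.

```python
from timeit import repeat

# Arguments live in variables so the interpreter cannot precompute the result.
WHAT, WHO = 'worry', 'Doctor'


def test_modulo():
    "Don't %s, I'm the %s." % (WHAT, WHO)


def test_format():
    "Don't {0}, I'm the {1}.".format(WHAT, WHO)


def best_time(func, number=100000, runs=5):
    """Return the fastest of several timing runs, in seconds."""
    return min(repeat(stmt=func, number=number, repeat=runs))


if __name__ == '__main__':
    for func in (test_modulo, test_format):
        print('%-12s %.4f' % (func.__name__, best_time(func)))
```

Taking the minimum rather than the average is a common choice here, since slower runs mostly reflect whatever else the machine was doing.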

Template strings

For completeness, Python also supports string templates (introduced in Python 2.4). Let's put that to the test.

from string import Template

s = Template('Don\'t $what, I\'m the $who.')

def test_template():
    s.substitute(what='worry', who='Doctor')

timeit(stmt=test_template, number=1000000)
# 6.010946035385132

Ouch!

[1] Tests were performed on a MacBook Pro (2.4 GHz Intel Core i7) with Python 2.7.2. With Spotify playing the Melvins in the background.

Want Heroku Apache (PHP), but getting Node.js stack?

Forcing the Heroku PHP buildpack

I've been using Heroku to host a bunch of static sites using the Apache/PHP hack. All I need is an index.php file in the root of my application to hint to Heroku that I want the PHP (Apache) buildpack... and then I disable PHP altogether since I really just want Apache to serve a static site. But some of my recent attempts at creating these stacks have resulted in Heroku giving me a Node.js stack instead... WTF?!?

Turns out that some of the newer node-based tools for managing static assets require a "package.json" file, which makes Heroku think you want a Node.js stack.

Quick solution

Specify the buildpack when creating the app.

$ heroku create myapp --buildpack https://github.com/heroku/heroku-buildpack-php

Or specify the buildpack as a config parameter.

$ heroku create myapp
$ heroku config:set BUILDPACK_URL=https://github.com/heroku/heroku-buildpack-php

This will force Heroku to use the buildpack you want.

With the rapid evolution of front-end tools for managing your JavaScript and CSS assets, we're seeing Node-based tools crop up all over the place... In fact, Node Package Manager (npm) is quickly becoming the de facto standard for installing most new web tools. This is a great thing, but we're really beginning to blur the lines between front-end and back-end technologies. Since Heroku attempts to introspect an app to determine which stack to create, the existence of package.json creates confusion. And because the PHP stack isn't even officially supported, the Node stack wins. We just need to be more explicit these days.
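To picture why package.json hijacks the build, here's a toy model of marker-file detection. It is not Heroku's actual code. The DETECTORS table, its ordering, and the detect_buildpack function are all assumptions made up for illustration.

```python
# Hypothetical detection order: the first matching marker file wins.
DETECTORS = [
    ('nodejs', 'package.json'),
    ('php', 'index.php'),
]


def detect_buildpack(filenames, forced=None):
    """Pick a buildpack from marker files unless one is forced."""
    if forced:
        return forced
    for name, marker in DETECTORS:
        if marker in filenames:
            return name
    return None


print(detect_buildpack(['index.php']))                  # php
print(detect_buildpack(['index.php', 'package.json']))  # nodejs
print(detect_buildpack(['index.php', 'package.json'], forced='php'))  # php
```

The forced argument plays the role of the explicit buildpack setting: once it is given, the marker files no longer matter.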


Tomatohater.com now a Pelican/GitHub site

Statically generated for cms-less publishing

This site is now completely static, generated by Pelican and hosted on GitHub Pages. Ridiculously simple and elegant.

The evolution of tomatohater.com:

Dates                    Technology                     Hosting
-----------------------  -----------------------------  ---------------
Original (2008-2010)     django-basic-blog              Amazon EC2
May 2010 - Dec 2011      django-mingus                  Amazon EC2
Dec 2011 - Mar 2012      WordPress (custom)             Heroku
Mar 2012 - Jun 2013      WordPress (custom)             Bluehost
Jun 2013 -               Static, generated by Pelican   GitHub Pages

You can check out the source code at https://github.com/tomatohater/tomatohater.github.io/tree/source



I'm speaking at OSCON 2012

Join me at O'Reilly Media's Open Source Convention

I'll be heading to Portland, OR in July since my talk entitled "Faster! Faster! Accelerate your business with blazing prototypes" was accepted for OSCON 2012 (Open Source Convention).

The general idea is threefold:

  1. Why do businesses still choose COTS solutions? I'll attempt to answer that question and provide some weaponry to help get past that unfortunate reality.
  2. Then I'll reveal my survey results of a number of common development frameworks and assess their fitness for "rapid prototyping". We'll attempt to identify which tools can turn a concept into real working code in the least amount of time and with the least amount of pain.
  3. Finally, we'll walk through a simple prototyping example using one of these frameworks.

If you haven't already, get registered for the conference. Use code **OS12FOS** and get 20% off registration.

Hope to see you there!



Migrating a Django app to Heroku

Heroku supports Python! What are we waiting for?

As a Python and Django kind of guy, I had always been jealous of the Ruby on Rails folks. This has nothing at all to do with the framework itself. No, no, no.... Django all the way. It was the Heroku cloud application platform that had me longing.

Yes, I could run my Django application on Google App Engine, but that requires all sorts of hackery and my app ended up an abomination of the original... too unnatural for my tastes.

I sensed a small shift in the Earth's rotation on Sept 28, 2011. This is when Heroku added support for Python/Django on their platform. I needed to give it a test drive and I was amazed how simple it was. The following is an account of what I did to port one of my existing Django sites to Heroku...

Initial setup

Heroku already provides a decent quick start guide for Python. That's a great place to begin. Check out the Prerequisites and Local Workstation Setup which will get you up and running quickly. It helps if you're already familiar with Git, virtualenv and pip. If you're not, then now is an excellent time to learn!

First things first. Assuming your Django project is already in Git, change directory to the project root. Then...

$ heroku auth:login
... output omitted ...
$ heroku create --stack cedar
... output omitted ...

Believe it or not, we're almost there!

Database config

Now, my existing app is running a MySQL database. I'm going to use the built-in Postgres DB on Heroku. So I need to update my Django settings...

# Database config
DATABASES = {
  'default': {
    'ENGINE': 'django.db.backends.postgresql_psycopg2',
  }
}

Note that I didn't supply any database info or credentials. Heroku automatically injects this info into your settings.py file. I also need to dump out my current database, which I can later import into my new app (this could also be done with fixtures).

$ python myapp/manage.py dumpdata > db.sql
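The credential injection mentioned above can be pictured roughly like this. This is a minimal sketch written for Python 3, assuming the connection details arrive in a DATABASE_URL environment variable. The database_from_url helper and the fallback URL are made up for illustration and are not what Heroku literally adds to your settings.

```python
import os
from urllib.parse import urlparse


def database_from_url(url):
    """Translate a postgres:// URL into a Django DATABASES entry."""
    parts = urlparse(url)
    return {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': parts.path.lstrip('/'),
        'USER': parts.username,
        'PASSWORD': parts.password,
        'HOST': parts.hostname,
        'PORT': str(parts.port or ''),
    }


# The fallback URL is a made-up local example, not a real credential.
DATABASES = {
    'default': database_from_url(
        os.environ.get('DATABASE_URL',
                       'postgres://user:secret@localhost:5432/mydb')),
}
```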

I'm already storing my Python dependencies in a requirements.txt file. If you're not, you'll need to create this file at the project root.

$ pip freeze > requirements.txt
$ cat requirements.txt
Django==1.3
feedparser==5.1
gunicorn==0.12.2
lxml==2.3.3
psycopg2==2.4.2
python-dateutil==1.5
python-sunlightapi==1.1.0

Don't forget to commit your changes! Then push to Heroku.

$ git commit -a -m 'Mods to run on Heroku.'
$ git push heroku master

That second line above is really something to be admired. Not only does it push the code to Heroku's repository, but then it triggers a real deployment. Heroku automatically copies the files to the stack, installs all the dependencies (via requirements.txt), detects that this is a Django application, and runs the application with "runserver". Poof! Done.

Well, not quite done. But really that's the bulk of it and my app is in fact running. You can confirm this by using the "heroku ps" command.

Running your app

Now I need to do all the normal Django setup stuff, like syncdb and loaddata...

$ heroku run python myapp/manage.py syncdb
$ heroku run python myapp/manage.py loaddata < db.sql

If all goes well, I should be able to hit my site with a web browser at the wacky hostname provided by Heroku when I created the stack. Heroku provides a shortcut...

$ heroku open

Final thoughts

Well, that was pretty darn easy. A few other things to note if you're trying this yourself.

  • You'll need to serve static media from somewhere. I used django.contrib.staticfiles in this example, but that's probably not ideal for production. Though the output does get cached... so it's also not too bad.
  • You don't want to use the built-in Django runserver. I prefer gunicorn and it's easy to configure that!
  • Enjoy yourself. This is cool stuff!
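On the gunicorn point, here's a minimal sketch of a config file. gunicorn reads its settings from a Python file passed with -c. The gunicorn.conf.py filename, the local fallback port, and the worker count are my assumptions, so tune them for your own app.

```python
# gunicorn.conf.py (hypothetical filename, passed to gunicorn with -c)
import os

# Heroku tells the web process which port to listen on via $PORT;
# 8000 is only a fallback for running locally.
bind = '0.0.0.0:' + os.environ.get('PORT', '8000')

# Worker count is a guess for a small app, not a recommendation.
workers = 3
```

How you launch gunicorn for a Django project depends on your Django and gunicorn versions, so I've left that part out.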

Dear Congress...

A note to Congress regarding SOPA and PIPA

Dear Representative Van Hollen, Senator Cardin, and Senator Mikulski:

The “Stop Online Piracy Act” (H.R. 3261) and the “Preventing Real Online Threats to Economic Creativity and Theft of Intellectual Property Act of 2011” (S.968) are intended to solve a worthy problem, yet the methods recommended by these bills I find to be completely offensive.

I greatly value the protection of intellectual property, yet I place the values of freedom and free speech even higher. We must not enact a bill with power to dismantle these very freedoms.

Please find another way. Punish the offender, not the messenger. I have confidence that you will find an acceptable alternative.

Please understand very clearly that your vote for these bills in their current forms will be countered with a vote against you in your next election.

Sincerely,

Drew Engelson
Cabin John, MD


Custom Django management commands on Heroku

Quick solution to a common Django/Heroku problem.

Running Django management commands is easy on Heroku. For example, to syncdb you simply execute:

$ heroku run python your_app/manage.py syncdb

Easy enough. But you may find that running a custom management command is a little trickier. You might run into something like this:

$ heroku run python your_app/manage.py your_custom_command
Running python your_app/manage.py your_custom_command attached to terminal... up, run.2
Unknown command: 'your_custom_command'
Type 'manage.py help' for usage.

Ouch! And you wouldn't be alone here.

The solution

This really just comes down to Python not finding your app. You just need to adjust your Python path to include your home directory.

On Heroku, your home directory is generally "/app". You should confirm this by running:

$ heroku run env | grep HOME
HOME=/app

A simple way to adjust your Python path within your Heroku environment (without mucking with your app) is to set the PYTHONPATH env variable as follows:

$ heroku config:add PYTHONPATH=/app

To confirm it is set correctly, run:

$ heroku run env | grep PYTHONPATH
PYTHONPATH=/app

Now you can run your custom management command. This also allows you to run these as cron (scheduled) tasks:

$ heroku run python your_app/manage.py your_custom_command
Success!
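If you want to see what PYTHONPATH actually does, here's a minimal sketch that starts a child interpreter with the variable set and reads back its sys.path. The sys_path_with helper is my own name, written for Python 3, and '/app' simply mirrors the Heroku home directory from above.

```python
import os
import subprocess
import sys


def sys_path_with(pythonpath):
    """Return sys.path as seen by a child interpreter given PYTHONPATH."""
    env = dict(os.environ, PYTHONPATH=pythonpath)
    out = subprocess.check_output(
        [sys.executable, '-c', 'import sys; print("\\n".join(sys.path))'],
        env=env)
    return out.decode().splitlines()


if __name__ == '__main__':
    # Check whether the directory shows up on the child's import path.
    print('/app' in sys_path_with('/app'))
```

Any directory listed in PYTHONPATH is searched on import, which is why manage.py can find your custom command once /app is on the list.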

I hope this post will save you some headbanging. Unless, of course, it's to the Melvins.


Why Django?

In 5 words or less… Why do you use Django?

I have been a heavy Django developer, architect, and evangelist since about 2006 when Nowell Strite and I first saw a presentation by Adrian and Jacob at OSCON. We brought Django back to PBS where it quickly became our standard development platform.

I know why I use Django...

I'm conducting some informal research for a project and I want to hear from you. Why do you use Django? Please leave a comment here or tweet me @handofdoom.

Try to limit your responses to 5 words or less.


Faetus: An FTP interface to Amazon S3 file storage.

An FTP interface to Amazon S3 file storage.

What?

Faetus is an FTP server that translates FTP commands into Amazon S3 API calls providing an FTP interface on top of Amazon S3 storage.
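To give a feel for the translation idea, here's a toy sketch that maps a few FTP verbs onto operations against an in-memory stand-in for a bucket. It is not Faetus's code and makes no real S3 calls. FakeBucket and handle_ftp_command are hypothetical names invented for illustration.

```python
class FakeBucket(object):
    """In-memory stand-in for an S3 bucket (no network involved)."""

    def __init__(self):
        self.objects = {}


def handle_ftp_command(bucket, command, key=None, data=None):
    """Map a few FTP verbs onto bucket operations."""
    if command == 'LIST':   # directory listing -> list keys
        return sorted(bucket.objects)
    if command == 'STOR':   # upload -> put object
        bucket.objects[key] = data
        return None
    if command == 'RETR':   # download -> get object
        return bucket.objects[key]
    if command == 'DELE':   # delete -> delete object
        del bucket.objects[key]
        return None
    raise ValueError('unsupported FTP command: %s' % command)
```

A real server also has to deal with authentication, directories, and streaming transfers, none of which this sketch attempts.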


Why?

Amazon's S3 API is awesome and there are plenty of excellent libraries that make this very simple. However, sometimes you don't have control over a system, and when that system knows how to talk FTP but not S3, Faetus is your solution. Read the blog post for more info.

Download

Get the source code from GitHub.

Known issues

  • Some FTP clients fail with a socket error when writing a file
  • Connections occasionally lost when in passive mode

Credit where credit is due

This project wouldn't have been possible without extensive use of pyftpdlib and the work of Chmouel Boudjnah's ftp-cloudfs from which Faetus heavily borrows. Thanks!


pyawschart

Python library for generating charts for Amazon Web Services based on Amazon CloudWatch data.

What?

pyawschart is a Python library for generating charts for Amazon Web Services based on Amazon CloudWatch data.


Why?

Amazon's cloud services provide infrastructure metrics data via its CloudWatch data API. This library aims to render this data more accessible by creating powerful (and pretty) data visualizations.

Download

Get the source code from GitHub.

Usage

import boto
from pyawschart import RANGES
from pyawschart.rds import CPUUtilizationChart

# Connect with boto (supply your own AWS credentials)
conn = boto.connect_cloudwatch(AWS_ACCESS_KEY, AWS_SECRET_ACCESS_KEY)
chart = CPUUtilizationChart(conn, 'my-rds-instance', RANGES['hour'])

# Display chart url
print chart.get_url()

# Download chart image
chart.download('/tmp/my-rds-instance-cpu-hourly.png')

Known issues

Please see the GitHub project for known issues and to submit bugs and other feedback:

https://github.com/tomatohater/pyawschart/issues

Credit where credit is due

This project makes extensive use of boto and Python Google Chart. Thanks!



About

I'm Drew Engelson, chief technologist at Celerity, a DC Metro-based technology consulting firm specializing in business acceleration. I seriously hate tomatoes.

Copyright © Drew Engelson