Installing PIL on 64-bit CentOS 5.8

I recently upgraded to a CentOS 5.8 VM built by another developer in our shop. Things were working smoothly until I came across an area of our codebase that relied on PIL. (Experienced Pythonistas may start groaning now.) The problem I ran into presented itself with this error:

IOError: decoder jpeg not available

From past experience, I knew that this was probably due to PIL’s being installed via pip without the right versions of libjpeg and libjpeg-devel installed. A quick uninstall/reinstall verified this was the case, since PIL’s installation helpfully tells you what formats are supported after installation.

The solution here is pretty straightforward if you Google around a bit: uninstall PIL again, do a sudo yum install libjpeg and sudo yum install libjpeg-devel, and then reinstall PIL. Right? But when I installed the dependencies, yum informed me that the VM was already up-to-date. (Note that this wasn’t a clean VM, and I have no idea whether they’re installed by default. In any case, if you’re running into this problem you should try the yum install steps just to get them out of the way.)

As it turns out, PIL’s installer only looks for libraries in /usr/lib/ (and a bunch of other places, but this is the relevant bit). But the libjpeg dependencies were actually installed to /usr/lib64/. So the trick is to augment PIL’s file. Here’s how I did it:

  1. pip uninstall PIL (if you haven’t already)
  2. pip install PIL --no-install (this will download the source but not install it)
  3. vi /path/to/virtualenv/build/PIL/ (assuming you’re in a virtualenv)
  4. find the line that says ‘add_directory(library_dirs, "/usr/lib")’
  5. put ‘add_directory(library_dirs, "/usr/lib64")’ just above that line
  6. pip install PIL --no-download

Following these steps got me to a working state where PIL reported it had JPEG support.
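
If you want to confirm where libjpeg actually lives before editing anything, a check along these lines can help. This is only a minimal sketch and not part of PIL: the helper name dirs_containing and the list of candidate directories are assumptions made for illustration.

```python
import glob
import os


def dirs_containing(lib_name, candidates):
    """Return the candidate directories that hold a lib<name>.so* file."""
    found = []
    for directory in candidates:
        # Matches e.g. libjpeg.so, libjpeg.so.62, libjpeg.so.62.0.0
        pattern = os.path.join(directory, "lib%s.so*" % lib_name)
        if glob.glob(pattern):
            found.append(directory)
    return found


if __name__ == "__main__":
    # Hypothetical candidate list; adjust it for your own system.
    print(dirs_containing("jpeg", ["/usr/lib", "/usr/lib64", "/usr/local/lib"]))
```

If only /usr/lib64 comes back, that matches the situation described above, where the installer looks in one directory and the library sits in another.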

Props to StackOverflow user ele, whose related question and answer set me on the right path.

Tips on Python and upgrading to Mountain Lion

This weekend I upgraded one of my machines to Mountain Lion. My system-level Python is pretty bare: I typically just keep pip, virtualenv, virtualenvwrapper, and a few associated libraries installed there. So, as I worked through a few errors I decided to follow this advice from the Hitchhiker’s Guide to Python and begin using Homebrew to manage my Python installation. What follows is a brief list of steps I used to troubleshoot the issues I ran into during the process:

First, we’ll want to install Python:

$ brew install python

Homebrew installs python, easy_install, and pip, so you’ll bootstrap quickly.

Next, add “export PATH=/usr/local/bin:/usr/local/share/python:$PATH” to your ~/.bash_profile. Homebrew describes /usr/local/share/python as an optional addition, but things like virtualenv, virtualenvwrapper, etc., get added there, so you’ll definitely want to include it.

Now, if you previously installed virtualenv, it’s probably pointing to /usr/bin/python. You can verify this by just taking a look at the hashbang in, for example, /usr/local/bin/virtualenv. Let’s get rid of those so they don’t shadow the versions we’re about to install via our brewed pip:

$ rm /usr/local/bin/virtualenv*
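
If you’d rather check before deleting, the hashbang inspection mentioned above can be scripted. A minimal sketch: shebang_interpreter is a hypothetical helper name, and it only reports the first token on the line, so a script starting with "#!/usr/bin/env python" would report /usr/bin/env rather than the Python it resolves to.

```python
def shebang_interpreter(script_path):
    """Return the interpreter path named on a script's first line, or None."""
    with open(script_path, "rb") as handle:
        first_line = handle.readline().decode("utf-8", "replace").strip()
    if not first_line.startswith("#!"):
        # No shebang at all (e.g. a binary or a plain module).
        return None
    parts = first_line[2:].split()
    return parts[0] if parts else None
```

Calling shebang_interpreter("/usr/local/bin/virtualenv") on a legacy install would be expected to return /usr/bin/python, which is the sign that the script is tied to the system Python.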

Now we can install the virtualenv ecosystem. You can do this pretty quickly with just:

$ pip install virtualenvwrapper

This will install virtualenv, virtualenvwrapper, and virtualenv-clone.

Finally, add “source /usr/local/share/python/” to your ~/.bash_profile. You can use the non-lazy version if you want, but on my machine new shells spin up a lot faster with the lazy version.

This process solved an issue I was experiencing where virtualenvwrapper seemed to have run correctly on login, but I was unable to use mkvirtualenv or virtualenv itself. That problem presented itself as a pkg_resources.DistributionNotFound exception (traceback). It was apparent that virtualenv was relying on the system Python, but I couldn’t figure out why. The legacy /usr/local/bin/virtualenv was the culprit.

Thanks to Thomas Wouters (Yhg1s on #python) for his help in figuring this out.

Setting up your Python environment for Think Stats

Update: I’ve added all my current exercises to a GitHub repo. If you want to see how I’m working with the project structure I describe here, that’s a good place to start.

I’ve been reading Allen Downey’s Think Stats lately. If you’re not familiar with it, this is a book that purports to teach statistics to programmers. Beginning in the first chapter, readers are invited to write code that helps them explore the statistical concepts being discussed. I’m learning a fair amount, and it’s pretty well-written.

If you follow the link above, you can find the book in PDF and HTML formats.

One area of improvement for the book is in using best practices for working with Python. I initially started working through the examples with the project structure Downey recommends, but that quickly became unwieldy. So I restructured my project and thought I’d share. I had the following goals:

  • achieve code/data separation
  • treat official code as libraries, not suggestions I’m free to change
  • use tools like virtualenv and pip (discussed later)
  • be able to put my own code under version control without having to add too many things to .gitignore

A few notes: First, this post isn’t intended to be a criticism of Downey’s work. In fact, if you’re not interested in Python per se and just want to learn statistics, you should probably ignore this post. Downey’s text is solid and should work on its own. Following these instructions might only be a distraction to you. Second, this tutorial assumes you’re familiar with at least the first chapter of the book. (Maybe you’ve gotten through it and started hesitating about the structure as I have.) I won’t be providing exercise solutions in this post. Third, I assume you’re running on Linux or OS X. Some of the details may be different for Windows users.

Installing PostgreSQL and psycopg2 in a virtualenv on Lion

I had the pleasure today of installing PostgreSQL and psycopg2 in a virtualenv on Lion. Here’s what I did, just so I remember in the future.

Note: These instructions assume a clean virtualenv. So if you’ve already attempted to install psycopg2 without PostgreSQL installed, and it failed, you should probably blow away your virtualenv altogether before attempting the steps below. I’m not sure why an earlier failure makes further attempts fail, but I believe it’s related to setup.cfg’s already being partially written. At any rate, here are the steps I took:

  1. Install PostgreSQL via the provided binary. The installer will ask you to reboot. Once that’s done, run the installer again. This should actually install the postgres binaries — likely under /Library/PostgreSQL/<version>/.
  2. workon the relevant virtualenv.
  3. Attempt pip install psycopg2. This should fail but will create a psycopg2 directory under your virtualenv’s build directory. (Note: I’m not sure this step is required, but it’s the order in which I proceeded.)
  4. Edit <virtualenv>/build/psycopg2/setup.cfg with the following lines:
    • include_dirs=/Library/PostgreSQL/9.1/bin (around line 35)
    • library_dirs=/Library/PostgreSQL/9.1/lib (around line 46)
  5. Re-run pip install psycopg2.
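
The setup.cfg edit in step 4 can also be scripted if you end up repeating it. This is a rough sketch under assumptions: set_option is a hypothetical helper, it works on the file's text line by line rather than through a config parser (so existing comments survive), and it assumes each option already appears in the file, possibly commented out.

```python
import re


def set_option(cfg_text, key, value):
    """Set `key=value`, replacing an existing (possibly commented-out) line.

    If no line for `key` exists, the text is returned unchanged.
    """
    # Match an optional leading '#' and spaces, then the key and '='.
    pattern = re.compile(
        r"^[# \t]*%s[ \t]*=.*$" % re.escape(key), re.MULTILINE
    )
    # A function replacement avoids backslash surprises in `value`.
    return pattern.sub(lambda _m: "%s=%s" % (key, value), cfg_text, count=1)
```

You would read setup.cfg into a string, call set_option once for include_dirs and once for library_dirs with the paths from step 4, and write the result back.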

New Year’s Python meme: 2012 edition

Partially as a year-in-review kind of action, and partially to reinvigorate my writing, I thought I’d participate in Tarek Ziade’s New Year’s Python meme:

1. What’s the coolest Python application, framework or library you have discovered in 2011?

Flask. Django was the only Python web framework I’d worked with until this last November, so it was nice to see things from Flask’s much more minimalistic perspective. I know Flask isn’t news to anyone in the Python world, but I suspect there are a lot of people who like or are simply comfortable with the kitchen-sink approach of Django and haven’t seen what mini frameworks can do. I was one of those until very recently.

For the record, I haven’t built much in Flask. But I did get involved in a rapid-prototyping exercise where Django’s complexity would only have gotten in the way. Flask’s simplicity let me get a simple REST-ish API up and running in a matter of hours from the point of introduction. I may continue using it beyond the prototype stage, or I may branch out even further. But I’m glad I dove in as much as I did.

2. What new programming technique did you learn in 2011?

Message queues. Specifically, I got to work with celery and django-celery to offload things like external API interactions within our Django apps. I’d done similar previous work when doing batch processing on a mainframe, so the idea of offloading computationally expensive work wasn’t new. But doing it within Python was.

I learned at the same time that MQs can’t be the solution to all of your problems. I’ve seen MQs back up by orders of magnitude due to major, production-crippling failures elsewhere. And in cases like that, you often don’t want a large backlog of work waiting in line to be processed later.

3. What’s the name of the open source project you contributed the most in 2011? What did you do?

Sadly, I only contributed once — to Django. It was at an early stab at sprinting (leading up to AWPUG’s foundation), and we collectively worked on this bug. The patch itself is pretty simple, but I got to see more of Django’s internals than I previously had, and it gave me a chance to meet folks I’ve come to really enjoy working with.

But I also worked extensively on the PyTexas conference, which — though not strictly an “open source project” — represented the bulk of my contribution to the community at large. I’m really looking forward to this next year and some of the things we might be able to do.

4. What was the Python blog or website you read the most in 2011?

Like many who’ve participated in this, Planet Python and the Python subreddit have been my go-to resources this year.

5. What are the three top things you want to learn in 2012?

  • MongoDB and pymongo: We use these at the day job, and right now I shy away from them just out of ignorance and fear. This is silly and must be fixed.
  • an async framework (likely Tornado): This kind of development is paradigmatically different from other things I’ve done, but a lot of people think it’s a good idea. That’s reason enough for me. I can also think of a few good use cases for it in work I’ll be doing in the near future.
  • Python packaging: I’ve run into a lot of cases this year where better knowledge in this area would have been useful. Every time I’ve needed to maintain something internal developed by someone who “gets” packaging, it’s taken me way more time than I feel is necessary. I’d love to know more about this area and contribute back to it if possible.

6. What are the top software, app or lib you wish someone would write in 2012?

  • I want to see microformats or some other kind of standardized interchange format for workout/fitness information. This is really specific to my work at MMF, but it would be enormously helpful. Every vendor in this space uses something different.
  • I want to see a platform for [redacted] in real-time. In my side project I’m working on this one right now. 😉
  • Is it okay to ask for docs? Because I’d love to see grok-able explanations for certain advanced topics (e.g., metaprogramming, packaging) become standard — such that I don’t have to Google “wtf is python metaprogramming” and dig through blog posts, because there’s just one (excellent) doc out there describing it.


Want to do your own list? Here’s how:

  1. Copy and paste the questions and answer to them in your blog
  2. Tweet it with the #2012pythonmeme hashtag

MapMyFitness sponsoring PyCon

I got word this week that my employer is officially sponsoring PyCon. I’m pretty stoked about this because I pushed pretty hard to make it happen. It doesn’t hurt that it’ll also be my first chance to attend PyCon. While I’m there, I’m hoping to learn a lot, meet a lot of people whose stuff I read daily on Planet Python and elsewhere, and recruit.

Speaking of which, if you’re a dev with experience in SOA or building SaaS platforms, MapMyFitness is hiring. Python experience is a plus but by no means required. We’re just looking for the right people for our team.

Unladen Swallow’s progress

One thing that caught my eye in the Euler test results is that Unladen Swallow comes in with a total time of 509.72 (seconds?) vs. CPython’s 569.37. That’s an improvement of about 10%. When you look at the wins, Unladen Swallow has 33 vs. CPython’s 51. That 5x improvement on CPython looks pretty far off.
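
For anyone who wants to check the arithmetic, here is the calculation spelled out. It uses only the two totals quoted above; the units are unconfirmed, and the variable names are mine.

```python
# Total times quoted above (units unconfirmed; the post guesses seconds).
unladen_total = 509.72
cpython_total = 569.37

# Fraction of CPython's total time that Unladen Swallow saves.
time_saved = (cpython_total - unladen_total) / cpython_total

# Speedup factor: how many times faster than CPython.
speedup = cpython_total / unladen_total

print("time saved: %.1f%%" % (time_saved * 100))
print("speedup:    %.2fx" % speedup)
```

That works out to roughly 10.5% less time, or a speedup of about 1.12x, which is a long way from 5x.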

Also, the Project Plan looks like it hasn’t been updated in nearly a year.

Lest this become another “Unladen Swallow is dead” post, I’ll also point out that Collin Winter has recently said they’re now targeting CPython 3.3. Not sure when that is, but 3.2a3 recently dropped, so it’s probably not too far off.

PyPy outperforming CPython and Psyco

David Ripton ran a bunch of implementations against his collection of Euler Challenge solutions:

And now PyPy is clearly the fastest Python implementation for this code, with both the most wins and the lowest overall time.  Psyco is still pretty close.  Both are a bit more than twice as fast as CPython.

I’d really like to see memory usage for these tests, too.

Compute all permutations of a string in Python

The Problem

Here’s a problem I was asked recently:

Write a function permute such that:
permute('abc') → ['abc', 'acb', 'bac', 'bca', 'cab', 'cba']

Now, immediately this should look like a recursive problem. Put in English, we want to do the following:

  1. Iterate through the initial string – e.g., ‘abc’.
  2. For each character in the initial string, set aside that character and get a list of all permutations of the string that’s left. So, for example, if the current iteration is on 'b', we’d want to find all the permutations of the string 'ac'.
  3. Once you have the list from step 2, add each element from that list to the character from the initial string, and append the result to our list of final results. So if we’re on 'b' and we’ve gotten the list ['ac', 'ca'], we’d add 'b' to those, resulting in 'bac' and 'bca', each of which we’d add to our final results.
  4. Return the list of final results.

I was asked this in a somewhat high-pressure situation, and although I could come up with this explanation in English, I couldn’t produce the solution as cleanly as I would have liked. I’m writing it up here both to cement my own understanding of the problem and to help people get clear on what really is a fairly basic example of recursion in Python.

A Solution … Sorta

I eventually came up with something like this:

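def permute1(start, rest):
    res = []
    if len(rest) <= 1:
        # Base case: adds both orderings, which is where the duplicates come from.
        res += [start + rest, rest + start]
    else:
        for i, c in enumerate(rest):
            # Set aside c and permute what's left of rest.
            s = rest[:i] + rest[i+1:]
            for perm in permute1(c, s):
                res += [start + perm]
    return res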

This code works, after a fashion:

>>> permute1('', 'abc')
['abc', 'acb', 'acb', 'abc', 'bac', 'bca', 'bca', 'bac', 'cab', 'cba', 'cba', 'cab']

But it’s embarrassing. For one thing, it produces duplicates in our list, which is obviously not what we want. For another, it takes two arguments, not one. In other words, it doesn’t solve the problem.

My initial inclination to solve the first problem was to use Python’s built-in set datatype. This was overkill, as it turns out. The duplicates are being caused by the line res += [start + rest, rest + start]. Imagine if start is 'b' and rest is 'c'. Then the algorithm will, in the base case, add ['bc', 'cb'] to res before passing it back. This is just unnecessary, since our for loop below will catch the 'cb' case later on. So a quick fix to that line, adding just [start + rest], will fix the first issue.
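
Applied to the code above, that one-line change gives something like the following sketch. The name permute1_fixed is hypothetical, and the recursive loop is assumed to sit in an else branch so that the base case and the recursive case don't both run.

```python
def permute1_fixed(start, rest):
    res = []
    if len(rest) <= 1:
        # Base case: only one ordering is added here; the caller's
        # loop produces the other one on a later iteration.
        res += [start + rest]
    else:
        for i, c in enumerate(rest):
            # Set aside c and permute what's left of rest.
            s = rest[:i] + rest[i + 1:]
            for perm in permute1_fixed(c, s):
                res += [start + perm]
    return res


print(permute1_fixed('', 'abc'))
```

This prints the six permutations with no duplicates, though the function still needs that awkward empty first argument.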

However, a correct solution would only take one argument, so we’re still not quite there.

Don’t Overthink Recursion

The problem really is that I was overthinking it. Here’s a trick when it comes to solving recursive problems: just write your function as though when you call it recursively it’ll work as expected. In the English explanation above I had a bit about getting a list of all the permutations of the remaining string. Obviously, that’s exactly the same problem as getting a list of all permutations of the original string! So I just needed to hammer out the problem and not get ahead of myself.

When I sat down to just type, here’s what came out:

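def permute2(s):
    res = []
    if len(s) == 1:
        # Base case: a single character has exactly one permutation.
        res = [s]
    else:
        for i, c in enumerate(s):
            # Permute everything except c, then put c back on the front.
            for perm in permute2(s[:i] + s[i+1:]):
                res += [c + perm]

    return res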

When you run this, you get the following:

>>> permute2('abc')
['abc', 'acb', 'bac', 'bca', 'cab', 'cba']

Much nicer! Not only does it work as expected, but the code feels cleaner than before. Also note that, in both examples, I use Python’s built-in enumerate() to avoid the fairly un-Pythonic tricks using range(len(s)) you see in places like this.


For the masochists, here’s a condensed version using the ternary operator and list comprehensions:

def permute3(s):
    return [s] if len(s) == 1 else [c + perm for i, c in enumerate(s) for perm in permute3(s[:i]+s[i+1:])]

Don’t use this.
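
If the goal is the result rather than the recursion practice, the standard library already covers it with itertools.permutations. The sketch below compares it against the condensed version; permute_stdlib is a name made up for this example.

```python
from itertools import permutations


def permute3(s):
    # The condensed version from above, repeated so this snippet runs on its own.
    return [s] if len(s) == 1 else [
        c + perm for i, c in enumerate(s) for perm in permute3(s[:i] + s[i + 1:])
    ]


def permute_stdlib(s):
    """Build the same list from itertools.permutations, which yields tuples."""
    return ["".join(p) for p in permutations(s)]


print(permute_stdlib("abc") == permute3("abc"))
```

The two agree on 'abc', including the order of the results. They differ on the empty string: the recursive versions return an empty list, while itertools yields a single empty permutation.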