Tips on Python and upgrading to Mountain Lion

This weekend I upgraded one of my machines to Mountain Lion. My system-level Python is pretty bare: I typically just keep pip, virtualenv, virtualenvwrapper, and a few associated libraries installed there. So, as I worked through a few errors I decided to follow this advice from the Hitchhiker’s Guide to Python and begin using Homebrew to manage my Python installation. What follows is a brief list of steps I used to troubleshoot the issues I ran into during the process:

First, we’ll want to install Python:

$ brew install python

Homebrew installs python, easy_install, and pip, so you’ll bootstrap quickly.

Next, add “export PATH=/usr/local/bin:/usr/local/share/python:$PATH” to your ~/.bash_profile. Homebrew describes /usr/local/share/python as an optional addition, but things like virtualenv, virtualenvwrapper, etc., get added there, so you’ll definitely want to include it.
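After reloading your shell, it is worth confirming that the Homebrew directory really does come ahead of the system one. Here is a minimal sketch of that check. The helper names (`path_index`, `homebrew_comes_first`) are my own, and it assumes the colon-separated PATH format used on OS X and Linux:

```python
import os

PATH_SEPARATOR = ":"  # assumption: OS X / Linux style PATH


def path_index(directory, path_string):
    """Return the position of `directory` in a PATH-style string, or None."""
    entries = path_string.split(PATH_SEPARATOR)
    return entries.index(directory) if directory in entries else None


def homebrew_comes_first(path_string):
    """True if /usr/local/bin is on the PATH and precedes /usr/bin."""
    brew = path_index("/usr/local/bin", path_string)
    system = path_index("/usr/bin", path_string)
    if brew is None:
        return False
    return system is None or brew < system


if __name__ == "__main__":
    print(homebrew_comes_first(os.environ.get("PATH", "")))
```

If this prints False, the export line is probably missing or is being overridden later in your shell startup files.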

Now, if you previously installed virtualenv, it’s probably pointing to /usr/bin/python. You can verify this by just taking a look at the hashbang in, for example, /usr/local/bin/virtualenv. Let’s get rid of those so they don’t shadow the versions we’re about to install via our brewed pip:

$ rm /usr/local/bin/virtualenv*

Now we can install the virtualenv ecosystem. You can do this pretty quickly with just:

$ pip install virtualenvwrapper

This will install virtualenv, virtualenvwrapper, and virtualenv-clone.

Finally, add “source /usr/local/share/python/virtualenvwrapper_lazy.sh” to your ~/.bash_profile. You can use the non-lazy version if you want, but on my machine new shells spin up a lot faster with the lazy version.

This process solved an issue I was experiencing where virtualenvwrapper seemed to have run correctly on login, but I was unable to use mkvirtualenv or virtualenv itself. That problem presented itself as a pkg_resources.DistributionNotFound exception (traceback). It was apparent that virtualenv was relying on the system Python, but I couldn’t figure out why. The legacy /usr/local/bin/virtualenv was the culprit.
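If you suspect the same problem on your machine, checking the script's first line is quick to automate. The following is only a sketch; `read_shebang` is a hypothetical helper name, not part of virtualenv or Homebrew:

```python
import os
import sys


def read_shebang(script_path):
    """Return the interpreter named on a script's first line, or None."""
    with open(script_path, "rb") as handle:
        first_line = handle.readline()
    if not first_line.startswith(b"#!"):
        return None
    # Drop the leading "#!" and any surrounding whitespace.
    return first_line[2:].decode("utf-8", errors="replace").strip()


if __name__ == "__main__":
    # Defaults to the path discussed in this post; pass another as an argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "/usr/local/bin/virtualenv"
    if os.path.exists(target):
        print(target, "->", read_shebang(target))
    else:
        print(target, "does not exist")
```

A result of /usr/bin/python for a script in /usr/local/bin would suggest a leftover from the system Python rather than something installed by the brewed pip.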

Thanks to Thomas Wouters (Yhg1s on #python) for his help in figuring this out.

Setting up your Python environment for Think Stats

Update: I’ve added all my current exercises to a GitHub repo. If you want to see how I’m working with the project structure I describe here, that’s a good place to start.

I’ve been reading Allen Downey’s Think Stats lately. If you’re not familiar with it, this is a book that purports to teach statistics to programmers. Beginning in the first chapter, readers are invited to write code that helps them explore the statistical concepts being discussed. I’m learning a fair amount, and it’s pretty well-written.

If you follow the link above, you can find the book in PDF and HTML formats.

One area of improvement for the book is in using best practices for working with Python. I initially started working through the examples with the project structure Downey recommends, but that quickly became unwieldy. So I restructured my project and thought I’d share. I had the following goals:

  • achieve code/data separation
  • treat official code as libraries, not suggestions I’m free to change
  • use tools like virtualenv and pip (discussed later)
  • be able to put my own code under version control without having to add too many things to .gitignore

A few notes: First, this post isn’t intended to be a criticism of Downey’s work. In fact, if you’re not interested in Python per se and just want to learn statistics, you should probably ignore this post. Downey’s text is solid and should work on its own. Following these instructions might only be a distraction to you. Second, this tutorial assumes you’re familiar with at least the first chapter of the book. (Maybe you’ve gotten through it and started hesitating about the structure as I have.) I won’t be providing exercise solutions in this post. Third, I assume you’re running on Linux or OS X. Some of the details may be different for Windows users.

New Year’s Python meme: 2012 edition

Partially as a year-in-review kind of action, and partially to reinvigorate my writing, I thought I’d participate in Tarek Ziade’s New Year’s Python meme:

1. What’s the coolest Python application, framework or library you have discovered in 2011?

Flask. Django was the only Python web framework I’d worked with until this last November, so it was nice to see things from Flask’s much more minimalistic perspective. I know Flask isn’t news to anyone in the Python world, but I suspect there are a lot of people who like or are simply comfortable with the kitchen-sink approach of Django and haven’t seen what mini frameworks can do. I was one of those until very recently.

For the record, I haven’t built much in Flask. But I did get involved in a rapid-prototyping exercise where Django’s complexity would only have gotten in the way. Flask’s simplicity let me get a simple REST-ish API up and running in a matter of hours from the point of introduction. I may continue using it beyond the prototype stage, or I may branch out even further. But I’m glad I dove in as much as I did.

2. What new programming technique did you learn in 2011?

Message queues. Specifically, I got to work with celery and django-celery to offload things like external API interactions within our Django apps. I’d done similar previous work when doing batch processing on a mainframe, so the idea of offloading computationally expensive work wasn’t new. But doing it within Python was.

I learned at the same time that MQs can’t be the solution to all of your problems. I’ve seen MQs back up by orders of magnitude due to major, production-crippling failures elsewhere. And in cases like that, you often don’t want a large backlog of work waiting in line to be processed later.
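To make the idea concrete without pulling in celery, here is a rough sketch using only the standard library's queue and threading modules. It is a stand-in for the pattern, not how celery works internally. `slow_external_call` and `run_offloaded` are hypothetical names, and the bounded queue is one simple way to keep a backlog from growing without limit:

```python
import queue
import threading


def slow_external_call(payload):
    """Hypothetical stand-in for an external API interaction."""
    return payload.upper()


def run_offloaded(payloads, max_backlog=100):
    """Hand each payload to a background worker and collect the results."""
    tasks = queue.Queue(maxsize=max_backlog)  # bounded: caps the backlog
    results = []

    def worker():
        while True:
            item = tasks.get()
            try:
                if item is None:  # sentinel: no more work
                    return
                results.append(slow_external_call(item))
            finally:
                tasks.task_done()

    thread = threading.Thread(target=worker)
    thread.start()
    for payload in payloads:
        tasks.put(payload)  # blocks if the backlog is already full
    tasks.put(None)
    thread.join()
    return results


if __name__ == "__main__":
    print(run_offloaded(["first", "second"]))
```

With a single worker the results come back in submission order. A real deployment would add retries, persistence, and monitoring on top of this.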

3. What’s the name of the open source project you contributed the most in 2011? What did you do?

Sadly, I only contributed once, to Django. It was during an early stab at sprinting (leading up to AWPUG’s foundation), and we collectively worked on this bug. The patch itself is pretty simple, but I got to see more of Django’s internals than I previously had, and it gave me a chance to meet folks I’ve come to really enjoy working with.

But I also worked extensively on the PyTexas conference, which — though not strictly an “open source project” — represented the bulk of my contribution to the community at large. I’m really looking forward to this next year and some of the things we might be able to do.

4. What was the Python blog or website you read the most in 2011?

Like many who’ve participated in this, Planet Python and the Python subreddit have been my go-to resources this year.

5. What are the three top things you want to learn in 2012?

  • MongoDB and pymongo: We use these at the day job, and right now I shy away from them just out of ignorance and fear. This is silly and must be fixed.
  • an async framework (likely Tornado): This kind of development is paradigmatically different from other things I’ve done, but a lot of people think it’s a good idea. That’s reason enough for me. I can also think of a few good use cases for it in work I’ll be doing in the near future.
  • Python packaging: I’ve run into a lot of cases this year where better knowledge in this area would have been useful. Every time I’ve needed to maintain something internal developed by someone who “gets” packaging, it’s taken me way more time than I feel is necessary. I’d love to know more about this area and contribute back to it if possible.

6. What are the top software, app or lib you wish someone would write in 2012?

  • I want to see microformats or some other kind of standardized interchange format for workout/fitness information. This is really specific to my work at MMF, but it would be enormously helpful. Every vendor in this space uses something different.
  • I want to see a platform for [redacted] in real-time. In my side project I’m working on this one right now. 😉
  • Is it okay to ask for docs? Because I’d love to see grok-able explanations for certain advanced topics (e.g., metaprogramming, packaging) become standard — such that I don’t have to Google “wtf is python metaprogramming” and dig through blog posts, because there’s just one (excellent) doc out there describing it.

 

Want to do your own list? Here’s how:

  1. Copy and paste the questions and answer to them in your blog
  2. Tweet it with the #2012pythonmeme hashtag

Unladen Swallow’s progress

One thing that caught my eye in the Euler test results is that Unladen Swallow comes in with a total time of 509.72 (seconds?) vs. CPython’s 569.37. That’s an improvement of about 10%. When you look at the wins, Unladen Swallow has 33 vs. CPython’s 51. That 5x improvement on CPython looks pretty far off.
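For anyone who wants to check the arithmetic, here is a quick sketch. It assumes both totals are elapsed times in the same unit, so that lower is better, which the results page does not state outright:

```python
# Totals quoted above; the unit is assumed to be seconds.
unladen_total = 509.72
cpython_total = 569.37

# Fraction of CPython's total time saved by Unladen Swallow.
reduction = (cpython_total - unladen_total) / cpython_total

# The same comparison as a speedup factor, for contrast with the 5x goal.
speedup = cpython_total / unladen_total

print(f"time saved: {reduction:.1%}")  # time saved: 10.5%
print(f"speedup: {speedup:.2f}x")      # speedup: 1.12x
```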

Also, the Project Plan looks like it hasn’t been updated in nearly a year.

Lest this become another “Unladen Swallow is dead” post, I’ll also point out that Collin Winter has recently said they’re now targeting CPython 3.3. Not sure when that is, but 3.2a3 recently dropped, so it’s probably not too far off.

PyPy outperforming CPython and Psyco

David Ripton ran a bunch of implementations against his collection of Euler Challenge solutions:

And now PyPy is clearly the fastest Python implementation for this code, with both the most wins and the lowest overall time.  Psyco is still pretty close.  Both are a bit more than twice as fast as CPython.

I’d really like to see memory usage for these tests, too.

Compute all permutations of a string in Python

The Problem

Here’s a problem I was asked recently:

Write a function permute such that:
permute('abc') → ['abc', 'acb', 'bac', 'bca', 'cab', 'cba']

Now, immediately this should look like a recursive problem. Put in English, we want to do the following:

  1. Iterate through the initial string – e.g., ‘abc’.
  2. For each character in the initial string, set aside that character and get a list of all permutations of the string that’s left. So, for example, if the current iteration is on 'b', we’d want to find all the permutations of the string 'ac'.
  3. Once you have the list from step 2, add each element from that list to the character from the initial string, and append the result to our list of final results. So if we’re on 'b' and we’ve gotten the list ['ac', 'ca'], we’d add 'b' to those, resulting in 'bac' and 'bca', each of which we’d add to our final results.
  4. Return the list of final results.

I was asked this in a somewhat high-pressure situation, and although I could come up with this explanation in English, I couldn’t produce the solution as cleanly as I would have liked. I’m writing it up here both to cement my own understanding of the problem and to help people get clear on what really is a fairly basic example of recursion in Python.

A Solution … Sorta

I eventually came up with something like this:

def permute1(start, rest):
    res = []
    if len(rest) <= 1:
        res += [start + rest, rest + start]
    else:
        for i, c in enumerate(rest):
            s = rest[:i] + rest[i+1:]
            for perm in permute1(c, s):
                res += [start + perm]
    return res

This code works, after a fashion:

>>> permute1('', 'abc')
['abc', 'acb', 'acb', 'abc', 'bac', 'bca', 'bca', 'bac', 'cab', 'cba', 'cba', 'cab']

But it’s embarrassing. For one thing, it produces duplicates in our list, which is obviously not what we want. For another, it takes two arguments, not one. In other words, it doesn’t solve the problem.

My initial inclination to solve the first problem was to use Python’s built-in set datatype. This was overkill, as it turns out. The duplicates are being caused by the line res += [start + rest, rest + start]. Imagine if start is 'b' and rest is 'c'. Then the algorithm will, in the base case, add ['bc', 'cb'] to res before passing it back. This is just unnecessary, since our for loop below will catch the 'cb' case later on. So a quick fix to that line, adding just [start + rest], will fix the first issue.
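With that one-line change applied, the two-argument version might look like the sketch below. The name `permute1_fixed` is mine, used only to keep it apart from the original:

```python
def permute1_fixed(start, rest):
    """Two-argument version with the duplicate-producing base case removed."""
    res = []
    if len(rest) <= 1:
        # Only one ordering is possible here; the caller's loop
        # supplies the other orderings on its later iterations.
        res += [start + rest]
    else:
        for i, c in enumerate(rest):
            s = rest[:i] + rest[i + 1:]
            for perm in permute1_fixed(c, s):
                res += [start + perm]
    return res


print(permute1_fixed('', 'abc'))
# ['abc', 'acb', 'bac', 'bca', 'cab', 'cba']
```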

However, a correct solution would only take one argument, so we’re still not quite there.

Don’t Overthink Recursion

The problem really is that I was overthinking it. Here’s a trick when it comes to solving recursive problems: just write your function as though when you call it recursively it’ll work as expected. In the English explanation above I had a bit about getting a list of all the permutations of the remaining string. Obviously, that’s exactly the same problem as getting a list of all permutations of the original string! So I just needed to hammer out the problem and not get ahead of myself.

When I sat down to just type, here’s what came out:

def permute2(s):
    res = []
    if len(s) == 1:
        res = [s]
    else:
        for i, c in enumerate(s):
            for perm in permute2(s[:i] + s[i+1:]):
                res += [c + perm]

    return res

When you run this, you get the following:

>>> permute2('abc')
['abc', 'acb', 'bac', 'bca', 'cab', 'cba']

Much nicer! Not only does it work as expected, but the code feels cleaner than before. Also note that, in both examples, I use Python’s built-in enumerate() to avoid the fairly un-Pythonic tricks using range(len(s)) you see in places like this.
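To illustrate the enumerate() point in isolation, here is a small comparison of the two styles for the "set aside one character" step. Both helper names are made up for this sketch:

```python
def splits_by_index(s):
    """Index-based style: look each character up via range(len(s))."""
    return [(s[i], s[:i] + s[i + 1:]) for i in range(len(s))]


def splits_by_enumerate(s):
    """enumerate() style: get the index and the character together."""
    return [(c, s[:i] + s[i + 1:]) for i, c in enumerate(s)]


print(splits_by_enumerate('abc'))
# [('a', 'bc'), ('b', 'ac'), ('c', 'ab')]
```

Both functions return the same pairs. The enumerate() version simply avoids the extra s[i] lookup.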

Bonus

For the masochists, here’s a condensed version using the ternary operator and list comprehensions:

def permute3(s):
    return [s] if len(s) == 1 else [c + perm for i, c in enumerate(s) for perm in permute3(s[:i] + s[i+1:])]

Don’t use this.
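Outside of an interview, the standard library already covers this. As a point of comparison (it is not part of the original exercise), a thin wrapper around itertools.permutations might look like this, with `permute_stdlib` as a hypothetical name:

```python
from itertools import permutations


def permute_stdlib(s):
    """Join each tuple of characters from itertools.permutations into a string."""
    return [''.join(p) for p in permutations(s)]


print(permute_stdlib('abc'))
# ['abc', 'acb', 'bac', 'bca', 'cab', 'cba']
```

Like the recursive versions above, this treats characters as distinct by position, so a string with repeated characters yields repeated results.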

Django and PEP 8

If you’re looking around the source for Django, and you’re wondering why arguments that might be named ‘class’ are named ‘klass’ instead of ‘cls’ or ‘class_’ per PEP 8 guidelines (e.g. here), you might be interested to know that the recommendation against ‘klass’ didn’t fully appear in PEP 8 until December, 2005. Prior to that, its use was discouraged, but the major recommendation was consistency.
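For reference, here is a small sketch of the spellings in question. The class and function names are invented for illustration and are not taken from Django's source:

```python
class Shape:
    def __init__(self, name):
        self.name = name

    @classmethod
    def unit(cls):
        # PEP 8: the first argument of a class method is named `cls`.
        return cls("unit")


def describe(class_):
    # PEP 8: a trailing underscore avoids a clash with the `class` keyword.
    return class_.__name__


def describe_legacy(klass):
    # The older `klass` spelling behaves identically; only the name differs.
    return klass.__name__


print(Shape.unit().name)       # unit
print(describe(Shape))         # Shape
print(describe_legacy(Shape))  # Shape
```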