Setting up your Python environment for Think Stats

Update: I’ve added all my current exercises to a GitHub repo. If you want to see how I’m working with the project structure I describe here, that’s a good place to start.

I’ve been reading Allen Downey’s Think Stats lately. If you’re not familiar with it, this is a book that purports to teach statistics to programmers. Starting in the first chapter, readers are invited to write code that helps them explore the statistical concepts being discussed. I’m learning a fair amount, and the book is pretty well written.

If you follow the link above, you can find the book in PDF and HTML formats.

One area where the book could improve is in following Python best practices. I initially worked through the examples using the project structure Downey recommends, but it quickly became unwieldy. So I restructured my project and thought I’d share the result. I had the following goals:

  • achieve code/data separation
  • treat Downey’s official code as libraries, not suggestions I’m free to change
  • use tools like virtualenv and pip (discussed later)
  • be able to put my own code under version control without having to add too many things to .gitignore

A few notes. First, this post isn’t intended as a criticism of Downey’s work. In fact, if you’re not interested in Python per se and just want to learn statistics, you should probably ignore this post: Downey’s text is solid and should work on its own, and following these instructions might only distract you. Second, this tutorial assumes you’re familiar with at least the first chapter of the book. (Maybe you’ve gotten through it and started having doubts about the structure, as I did.) I won’t be providing exercise solutions in this post. Third, I assume you’re running Linux or OS X; some of the details may differ for Windows users.