Get JSON from Excel using Python, xlrd

Author’s note: This post is out of date! The xlrd library it covers is no longer maintained. Please see the post JSON from Excel using Python openpyxl for a more current approach.


Powering interactive news applications off flat files rather than a call to a database server is an option worth considering. Cutting a production database and data access layer out of the mix eliminates a whole slice of complexity and trims development time. Flat files aren’t right for every situation, but for small apps they’re often all you need.

These days, most of the apps I help build at Gannett Digital consume JSON. Simpler apps — such as the table/modal displays we deployed in February for our Oscar Scorecard and Princeton Review Best Value Colleges — run off one or two JSON files. The more complex — e.g., last year’s Ghost Factories: Poison in the Ground — run off hundreds of them. Updating content is as easy as generating new JSON and uploading it to our file server.

I wrote last year how to use Python to generate JSON files from a SQL database. Lately, I’ve been using Python to make JSON out of Excel spreadsheets.

The key ingredient is the Python library xlrd. It’s part of a suite of Excel-related tools available from www.python-excel.org, including the xlwt library to create Excel files.

(Another choice is openpyxl, which has similar features and works with newer .xlsx formatted Excel files. I’ve used both with equal success. Take your pick.)

Basic xlrd operations

Let’s say we have an Excel workbook containing a small table repeated over three worksheets. The table in each case looks like this:

table

Here are some snippets of code — just scratching the surface — to interact with it programmatically:

# Demonstrates basic xlrd functions for working with Excel files
# (Excel 97-2003)

import xlrd

# Open the workbook
wb = xlrd.open_workbook('excel-xlrd-sample.xls')

# Print the sheet names
print wb.sheet_names()

# Get the first sheet either by index or by name
sh = wb.sheet_by_index(0)

# Iterate through rows, returning each as a list that you can index:
for rownum in range(sh.nrows):
    print sh.row_values(rownum)

# If you just want the first column:
first_column = sh.col_values(0)
print first_column

# Index individual cells:
cell_c4 = sh.cell(3, 2).value
# Or you can use:
#cell_c4 = sh.cell(rowx=3, colx=2).value
print cell_c4

# Let's say you want the same cell from x identical sheets in a workbook:
x = 2
while x >= 0:
    sh = wb.sheet_by_index(x)
    cell_x = sh.cell(2, 3).value
    print cell_x
    x = x - 1

From Excel to JSON

Pretty cool stuff. Now, let’s convert our sample spreadsheet to JSON. I’ll borrow some of the techniques I discussed when outlining how to use Python to build JSON from a SQL database:

import xlrd
from collections import OrderedDict
import simplejson as json

# Open the workbook and select the first worksheet
wb = xlrd.open_workbook('excel-xlrd-sample.xls')
sh = wb.sheet_by_index(0)

# List to hold dictionaries
cars_list = []

# Iterate through each row in worksheet and fetch values into dict
for rownum in range(1, sh.nrows):
    cars = OrderedDict()
    row_values = sh.row_values(rownum)
    cars['car-id'] = row_values[0]
    cars['make'] = row_values[1]
    cars['model'] = row_values[2]
    cars['miles'] = row_values[3]

    cars_list.append(cars)

# Serialize the list of dicts to JSON
j = json.dumps(cars_list)

# Write to file
with open('data.json', 'w') as f:
    f.write(j)

Here’s the breakdown: We open the workbook, select the sheet and iterate through the available rows (which xlrd conveniently counts using its nrows method).

Add each cell to a key/value pair in a dictionary, then add each dictionary to a list. Dump the list to JSON and write to a file.

Of course, a spreadsheet this simple doesn’t need a Python script to make its way to JSON. Just use Mr. Data Converter for something like this. But as soon as your JSON requirements gain complexity, the ability to use Python to nest nodes, build strings and transform data on the fly make this approach very appealing.

Learn data analysis with SQL!
If you like what you read here, check out my book Practical SQL: A Beginner’s Guide to Storytelling with Data from No Starch Press.

Catch the Cat: A Simple Python Game

New cat in the house? Check.

Son interested in programming? Check.

Dad eager to learn more Python? Check.

Add it up and you get a little project the two of us worked up over a couple of nights this summer — a Python-based text game called Catch the Cat. You can download the code from its repository on Github. To use it, from a command prompt run:
 

python catchthecat.py

The game’s simple: The cat’s on the loose in the house, and you have to catch it. Every time you move to a new room, the cat moves too — at random. To catch the cat, you both have to arrive in the same place.

OK, so we’re not going to unseat EA Sports here, but we met our goals: we had fun and learned some tricks. While I was using the exercise to help my son work through the logic choices required in game play, I got to add few things to my Python toolkit:

— Using Python’s SQLite integration to build a leaderboard database.

— Writing classes for the cat and player objects.

Have you written Python-based games? Suggestions for improvement here?

12 Tangents Later, I Publish a Django Site

Last week, I deployed my first live Django app. Time from start to finish: three years.

Cue the sound of snickers and a thousand eye-rolls. Go ahead. But I confess: From the moment I said, “I want to build something using Django” to the moment I restarted Apache on my WebFaction server and watched the site load for real in my browser, 36 months passed through the hourglass of time.

You see, I got diverted along the way. I’ll tell you why. But first, two things:

1. Learning is wonderful, thrilling, maddening and rewarding. If you’re a journalist and want to see new worlds, let me encourage you to take a journey into code.

2. The site is right here and the code is here. It falls way short in the Awesome Dept., and it will not save journalism. But that’s not why I built it, really.

* * *

The tale began March 2009 in Indianapolis at the Investigative Reporters and Editors Computer-Assisted Reporting conference. That’s the annual data journalism hoedown that draws investigative journalists, app coders and academics for a couple of days of nerdish talk about finding and telling stories with data.
Continue…

Scraping CDC flu data with Python

Getting my flu shot this week reminded me about weekly surveillance data the Centers for Disease Control and Prevention provides on flu prevalence across the nation. I’d been planning to do some Python training for my team at work, so it seemed like a natural to write a quick Python scraper that grabs the main table on the site and turns it into a delimited text file.

So I did, and I’m sharing. You can grab the code for the CDC-flu-scraper on Github.

The code uses the Mechanize and BeautifulSoup modules for web browsing and html parsing, respectively. Much of what I demonstrate here I started learning via Ben Welsh’s fine tutorial on web scraping.

We’re still early in flu season, but if you watch this data each week you’ll see the activity pick up quickly.

Update 10/22/2011: Ben Welsh has lent some contributions to this scraper, adding JSON output and turning it into a function. Benefits of social coding 101 …

Setting up Python in Windows 7

Note: Also see my guide for setting up Python under Windows 10!


An all-wise journalist once told me that “everything is easier in Linux,” and after working with it for a few years I’d have to agree — especially when it comes to software setup for data journalism. But …

Many newsroom types spend the day in Windows without the option of Ubuntu or another Linux OS. I’ve been planning some training around Python soon, so I compiled this quick setup guide as a reference. I hope you find it helpful.

Set up Python on Windows 7

Get started:

  1. Visit the official Python download page and grab the Windows installer. Choose the 32-bit version. A 64-bit version is available, but there are compatibility issues with some modules you may want to install later. (Thanks to commenters for pointing this out.)

Note: Python currently exists in two versions, the older 2.x series and newer 3.x series (for a discussion of the differences, see this). This tutorial focuses on the 2.x series.

  1. Run the installer and accept all the default settings, including the “C:\Python27” directory it creates.

Continue…