Get JSON from Excel using Python, xlrd
Author’s note: This post is out of date! The xlrd library it covers is no longer maintained. Please see the post JSON from Excel using Python openpyxl for a more current approach.
Powering interactive news applications off flat files rather than a call to a database server is an option worth considering. Cutting a production database and data access layer out of the mix eliminates a whole slice of complexity and trims development time. Flat files aren’t right for every situation, but for small apps they’re often all you need.
These days, most of the apps I help build at Gannett Digital consume JSON. Simpler apps — such as the table/modal displays we deployed in February for our Oscar Scorecard and Princeton Review Best Value Colleges — run off one or two JSON files. The more complex — e.g., last year’s Ghost Factories: Poison in the Ground — run off hundreds of them. Updating content is as easy as generating new JSON and uploading it to our file server.
I wrote last year how to use Python to generate JSON files from a SQL database. Lately, I’ve been using Python to make JSON out of Excel spreadsheets.
(Another choice is openpyxl, which has similar features and works with newer .xlsx formatted Excel files. I’ve used both with equal success. Take your pick.)
Basic xlrd operations
Let’s say we have an Excel workbook containing a small table repeated over three worksheets. The table in each case looks like this:
Here are some snippets of code — just scratching the surface — to interact with it programmatically:
# Demonstrates basic xlrd functions for working with Excel files # (Excel 97-2003) import xlrd # Open the workbook wb = xlrd.open_workbook('excel-xlrd-sample.xls') # Print the sheet names print wb.sheet_names() # Get the first sheet either by index or by name sh = wb.sheet_by_index(0) # Iterate through rows, returning each as a list that you can index: for rownum in range(sh.nrows): print sh.row_values(rownum) # If you just want the first column: first_column = sh.col_values(0) print first_column # Index individual cells: cell_c4 = sh.cell(3, 2).value # Or you can use: #cell_c4 = sh.cell(rowx=3, colx=2).value print cell_c4 # Let's say you want the same cell from x identical sheets in a workbook: x = 2 while x >= 0: sh = wb.sheet_by_index(x) cell_x = sh.cell(2, 3).value print cell_x x = x - 1
From Excel to JSON
Pretty cool stuff. Now, let’s convert our sample spreadsheet to JSON. I’ll borrow some of the techniques I discussed when outlining how to use Python to build JSON from a SQL database:
import xlrd from collections import OrderedDict import simplejson as json # Open the workbook and select the first worksheet wb = xlrd.open_workbook('excel-xlrd-sample.xls') sh = wb.sheet_by_index(0) # List to hold dictionaries cars_list =  # Iterate through each row in worksheet and fetch values into dict for rownum in range(1, sh.nrows): cars = OrderedDict() row_values = sh.row_values(rownum) cars['car-id'] = row_values cars['make'] = row_values cars['model'] = row_values cars['miles'] = row_values cars_list.append(cars) # Serialize the list of dicts to JSON j = json.dumps(cars_list) # Write to file with open('data.json', 'w') as f: f.write(j)
Here’s the breakdown: We open the workbook, select the sheet and iterate through the available rows (which xlrd conveniently counts using its nrows method).
Add each cell to a key/value pair in a dictionary, then add each dictionary to a list. Dump the list to JSON and write to a file.
Of course, a spreadsheet this simple doesn’t need a Python script to make its way to JSON. Just use Mr. Data Converter for something like this. But as soon as your JSON requirements gain complexity, the ability to use Python to nest nodes, build strings and transform data on the fly make this approach very appealing.
Learn data analysis with SQL!
If you like what you read here, check out my book Practical SQL: A Beginner’s Guide to Storytelling with Data from No Starch Press.