JSON from Excel using Python openpyxl

Powering an application from a collection of static files rather than a call to an API or database server is well worth considering. Removing a production database and data access layer from the mix eliminates complexity and trims development time. Static files aren’t right for every situation, but for small apps they may be all you need.

In my days building back ends for interactive news applications, most of our apps consumed JSON. Simple apps ran off one or two static JSON files. The more complex used hundreds. Updating content was as easy as generating new JSON files and uploading them to a file server.

I’ve written about how to use Python to generate JSON files from a SQL database. But it’s just as easy to use Python to make JSON out of Excel spreadsheets. The key ingredient is the Python library openpyxl, which offers a collection of functions for reading and writing Excel .xlsx workbooks. You can add it to your Python environment by running pip install openpyxl.

Let’s look at how it works, then use with some Python to generate JSON.

Basic openpyxl functions

Let’s say we have an Excel workbook containing a small table repeated over three worksheets. The table in each case looks like this:

Here are some snippets of code — just scratching the surface — to interact with it programmatically:

# Demonstrates basic openpyxl functions for working with Excel files
# (.xlsx format)

from openpyxl import load_workbook

# load the workbook
wb = load_workbook('excel-xlrd-sample.xlsx')

# Print the sheet names
print(wb.sheetnames)

# Get a sheet by name
sheet = wb['Cars20']

# Iterate through sheet rows, returning each as a tuple:
for row in sheet.values:
    print(row)

# If you just want the first column:
for cell in sheet['A']:
    print(cell.value)

# Index individual cells:
cell_c4 = sheet['C4'].value
print(cell_c4)

# Or you can use:
cell_c4 = sheet.cell(row=4, column=3).value
print(cell_c4)

# Let's say you want the same cell every sheet in a workbook:
for sheet in wb.sheetnames:
    open_sheet = wb[sheet]
    cell_x = open_sheet['D2'].value
    print(cell_x)

From Excel to JSON

Pretty cool stuff. Now, let’s convert our sample spreadsheet to JSON. I’ll borrow some of the techniques I discussed when outlining how to use Python to build JSON from a SQL database:

import json
from collections import OrderedDict
from itertools import islice
from openpyxl import load_workbook

# Open the workbook and select a worksheet
wb = load_workbook('excel-xlrd-sample.xlsx')
sheet = wb['Cars20']

# List to hold dictionaries
cars_list = []

# Iterate through each row in worksheet and fetch values into dict
for row in islice(sheet.values, 1, sheet.max_row):
    cars = OrderedDict()
    cars['car-id'] = row[0]
    cars['make'] = row[1]
    cars['model'] = row[2]
    cars['miles'] = row[3]
    cars_list.append(cars)

# Serialize the list of dicts to JSON
j = json.dumps(cars_list)

# Write to file
with open('data.json', 'w') as f:
    f.write(j)

Here’s the breakdown: We open the workbook, select a sheet, and set up a blank list to hold rows of data. Then we use Python’s islice tool to iterate through the rows in the sheet, skipping the header and stopping at the max_row property of the sheet.

For each row, we add each cell to a key/value pair in a dictionary, then add each dictionary to a list. Then we dump the list to JSON and write to a file.

Of course, a spreadsheet this simple doesn’t need a Python script to make its way to JSON. Just use Mr. Data Converter. But as soon as your JSON requirements gain complexity, the ability to use Python to nest nodes, build strings and transform data on the fly make this approach very appealing.


Like what you see here? Please check out my book, Practical SQL: A Beginner’s Guide to Storytelling With Data, available from No Starch Press and through all fine booksellers.