JSON from Excel using Python openpyxl
Powering an application from a collection of static files rather than a call to an API or database server is well worth considering. Removing a production database and data access layer from the mix eliminates complexity and trims development time. Static files aren’t right for every situation, but for small apps they may be all you need.
In my days building back ends for interactive news applications, most of our apps consumed JSON. Simple apps ran off one or two static JSON files. The more complex used hundreds. Updating content was as easy as generating new JSON files and uploading them to a file server.
I’ve written about how to use Python to generate JSON files from a SQL database. But it’s just as easy to use Python to make JSON out of Excel spreadsheets. The key ingredient is the Python library openpyxl, which offers a collection of functions for reading and writing Excel .xlsx workbooks. You can add it to your Python environment by running
pip install openpyxl.
Let’s look at how it works, then use with some Python to generate JSON.
Basic openpyxl functions
Let’s say we have an Excel workbook containing a small table repeated over three worksheets. The table in each case looks like this:
Here are some snippets of code — just scratching the surface — to interact with it programmatically:
# Demonstrates basic openpyxl functions for working with Excel files # (.xlsx format) from openpyxl import load_workbook # load the workbook wb = load_workbook('excel-xlrd-sample.xlsx') # Print the sheet names print(wb.sheetnames) # Get a sheet by name sheet = wb['Cars20'] # Iterate through sheet rows, returning each as a tuple: for row in sheet.values: print(row) # If you just want the first column: for cell in sheet['A']: print(cell.value) # Index individual cells: cell_c4 = sheet['C4'].value print(cell_c4) # Or you can use: cell_c4 = sheet.cell(row=4, column=3).value print(cell_c4) # Let's say you want the same cell every sheet in a workbook: for sheet in wb.sheetnames: open_sheet = wb[sheet] cell_x = open_sheet['D2'].value print(cell_x)
From Excel to JSON
Pretty cool stuff. Now, let’s convert our sample spreadsheet to JSON. I’ll borrow some of the techniques I discussed when outlining how to use Python to build JSON from a SQL database:
import json from collections import OrderedDict from itertools import islice from openpyxl import load_workbook # Open the workbook and select a worksheet wb = load_workbook('excel-xlrd-sample.xlsx') sheet = wb['Cars20'] # List to hold dictionaries cars_list =  # Iterate through each row in worksheet and fetch values into dict for row in islice(sheet.values, 1, sheet.max_row): cars = OrderedDict() cars['car-id'] = row cars['make'] = row cars['model'] = row cars['miles'] = row cars_list.append(cars) # Serialize the list of dicts to JSON j = json.dumps(cars_list) # Write to file with open('data.json', 'w') as f: f.write(j)
Here’s the breakdown: We open the workbook, select a sheet, and set up a blank list to hold rows of data. Then we use Python’s
islice tool to iterate through the rows in the sheet, skipping the header and stopping at the
max_row property of the sheet.
For each row, we add each cell to a key/value pair in a dictionary, then add each dictionary to a list. Then we dump the list to JSON and write to a file.
Of course, a spreadsheet this simple doesn’t need a Python script to make its way to JSON. Just use Mr. Data Converter. But as soon as your JSON requirements gain complexity, the ability to use Python to nest nodes, build strings and transform data on the fly make this approach very appealing.
Anthony DeBarros is the author of Practical SQL: A Beginner’s Guide to Storytelling With Data from No Starch Press.