JSON from Excel using Python openpyxl

Powering an application from a collection of static files rather than a call to an API or database server is well worth considering. Removing a production database and data access layer from the mix eliminates complexity and trims development time. Static files aren’t right for every situation, but for small apps they may be all you need.

In my days building back ends for interactive news applications, most of our apps consumed JSON. Simple apps ran off one or two static JSON files. The more complex used hundreds. Updating content was as easy as generating new JSON files and uploading them to a file server.

I’ve written about how to use Python to generate JSON files from a SQL database. But it’s just as easy to use Python to make JSON out of Excel spreadsheets. The key ingredient is the Python library openpyxl, which offers a collection of functions for reading and writing Excel .xlsx workbooks. You can add it to your Python environment by running pip install openpyxl.

Let’s look at how it works, then use with some Python to generate JSON.

Basic openpyxl functions

Let’s say we have an Excel workbook containing a small table repeated over three worksheets. The table in each case looks like this:

Here are some snippets of code — just scratching the surface — to interact with it programmatically:

# Demonstrates basic openpyxl functions for working with Excel files
# (.xlsx format)

from openpyxl import load_workbook

# load the workbook
wb = load_workbook('excel-xlrd-sample.xlsx')

# Print the sheet names
print(wb.sheetnames)

# Get a sheet by name
sheet = wb['Cars20']

# Iterate through sheet rows, returning each as a tuple:
for row in sheet.values:
    print(row)

# If you just want the first column:
for cell in sheet['A']:
    print(cell.value)

# Index individual cells:
cell_c4 = sheet['C4'].value
print(cell_c4)

# Or you can use:
cell_c4 = sheet.cell(row=4, column=3).value
print(cell_c4)

# Let's say you want the same cell every sheet in a workbook:
for sheet in wb.sheetnames:
    open_sheet = wb[sheet]
    cell_x = open_sheet['D2'].value
    print(cell_x)

From Excel to JSON

Pretty cool stuff. Now, let’s convert our sample spreadsheet to JSON. I’ll borrow some of the techniques I discussed when outlining how to use Python to build JSON from a SQL database:

import json
from collections import OrderedDict
from itertools import islice
from openpyxl import load_workbook

# Open the workbook and select a worksheet
wb = load_workbook('excel-xlrd-sample.xlsx')
sheet = wb['Cars20']

# List to hold dictionaries
cars_list = []

# Iterate through each row in worksheet and fetch values into dict
for row in islice(sheet.values, 1, sheet.max_row):
    cars = OrderedDict()
    cars['car-id'] = row[0]
    cars['make'] = row[1]
    cars['model'] = row[2]
    cars['miles'] = row[3]
    cars_list.append(cars)

# Serialize the list of dicts to JSON
j = json.dumps(cars_list)

# Write to file
with open('data.json', 'w') as f:
    f.write(j)

Here’s the breakdown: We open the workbook, select a sheet, and set up a blank list to hold rows of data. Then we use Python’s islice tool to iterate through the rows in the sheet, skipping the header and stopping at the max_row property of the sheet.

For each row, we add each cell to a key/value pair in a dictionary, then add each dictionary to a list. Then we dump the list to JSON and write to a file.

Of course, a spreadsheet this simple doesn’t need a Python script to make its way to JSON. Just use Mr. Data Converter. But as soon as your JSON requirements gain complexity, the ability to use Python to nest nodes, build strings and transform data on the fly make this approach very appealing.


Like what you see here? Please check out my book, Practical SQL: A Beginner’s Guide to Storytelling With Data, available from No Starch Press and through all fine booksellers.

Generate JSON From SQL Using Python

Author’s note: This post has been updated from its original 2012 version to use PostgreSQL and the Python psycopg2 library. Also, if you like what you read here, please check out my book Practical SQL: A Beginner’s Guide to Storytelling with Data from No Starch Press.


Let’s say you want to generate a few hundred — or even a thousand — flat JSON files from a SQL database. Maybe you want to power a data visualization but have neither the time nor the desire to spin up a server to dynamically generate the data. So, you want flat files, each one small for quick loading. And a lot of them.

A few lines of Python is all you need.

I’ve gone this route for a few data-driven visuals, creating JSON files out of large database tables. Python works well for this, with its JSON encoder/decoder offering a flexible set of tools for converting Python objects to JSON. Let’s see how this works.

The Data

I’m using PostgreSQL for this tutorial. Here’s a script that creates a table and fills it with two rows of data:

CREATE TABLE students_test (
    id integer,
    first_name text,
    last_name text,
    street text,
    city text,
    st text,
    zip text
);
    
INSERT INTO students_test VALUES
(1, 'Samantha', 'Baker', '9 Main St.', 'Hyde Park', 'NY', '12538'),
(2, 'Mark', 'Salomon', '3 Stavesacre Terrace', 'Dallas', 'TX', '75204');

The Script

Here’s an example Pythonscript that generates two JSON files from that query. One file contains JSON row arrays, and the other has JSON key-value objects. Below, we’ll walk through it step-by-step.

Continue…

Setting up Python in Windows 10

Installing Python under Windows 10 is fairly easy as long as you set up your system environment correctly. Here’s your quick guide:

  1. Visit the official Python download page and grab the Windows installer for the latest version of Python 3.
  2. Navigate to the folder where you saved the installation file. Right-click the installer and select “Run as Administrator.” Click “Yes” when Windows asks if you want the program to make changes to your computer. 
  3. The next dialog asks whether you want to “Install Now” or “Customize Installation.” First, check the boxes that say “Install launcher for all users” and “Add Python 3.8 to PATH”. Then, choose “Customize Installation.”
  4. On the next screen, check all boxes under “Optional Features.” Click next.
  5. Under “Advanced Options,” set the location where you want to install Python. I like to use a location I can easily find, such as:
C:\Python38

That will remind me that I’ve installed Python 3.8. (Note that you may need to make a folder if you want to use a location that doesn’t exist yet.)

Also under “Advanced Options,” check the following boxes:

  • “Install for all users”
  • “Associate files with Python”
  • “Create shortcuts for installed applications”
  • “Add Python to environment variables”
  • “Precompile standard library”
  1. Click through the dialog to finish the installation and exit the installer.

Testing Your Install

Now, to launch the Python interpreter from the command line, you can open a command prompt (Start Menu > Windows System > Command Prompt) and type:

python

That will load the Python interpreter:

Python 3.8.5 (tags/v3.8.5:580fbb0, Jul 20 2020, 15:43:08) [MSC v.1926 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>

Because of the settings you chose during the install, you can now run this interpreter — and, more important, a script — from any directory on your system.

Type exit() and hit Return to exit the interpreter and get back to a C: prompt.

Optional: Set up useful Python packages

Python 3 comes with the package installer pip already in place, which makes it super easy to add useful packages to your Python installation. The syntax is this (replace some_package with a package name you want to install):

pip install package_name_here

Let’s add a couple of must-have utilities for web scraping: Requests and BeautifulSoup. You can use pip to install them all with one command:

pip install beautifulsoup4 requests

csvkit, which I covered here, is a great tool for dealing with comma-delimited text files. Add it:

pip install csvkit

You’re now set to get started using and learning Python under Windows 10. If you’re looking for a guide, start with the Official Python tutorial.


Like What You Read Here?
My book Practical SQL: A Beginner’s Guide to Storytelling with Data from No Starch Press offers a step-by-step guide learning SQL and working with relational databases. Learn how to wrangle the everyday data you encounter to gain meaningful insights.

NoVa-Py Talk: Building a Python Package

One of the most popular uses of the API for DocumentCloud, the document research/publishing platform where I work, is to bulk-upload hundreds or thousands of documents. People usually hack their own code together to do this, sometimes using the Python or Ruby wrappers for the API.

After talking with users and hearing their thoughts about the workflow — a desire to have a record of each file’s URL once uploaded, for example — I saw an opportunity to add some luxury to the process. A couple of months, a lot of research, and a few bruises later, I had my first Python package: pneumatic.

pneumatic does a few things to make life easier. It grabs information about each uploaded file and saves it in a SQLite database, which you can dump to csv. It uses Python’s multiprocessing module to try to add some speed (recognizing that this is a network-bound task). And it scans all subfolders for files, which is handy when you obtain a collection of files organized that way.

Learning about Python packaging was as much a part of the project as creating the library itself. The folks at the Northern Virginia Python Users Group were kind enough to invite me to share what I learned recently. Click through the title card to view the slides.

BaPP

 

Today’s weather in my inbox, via Python

In the category of “potentially useful but mostly just a learning exercise,” here’s a Python script that emails me the local weather report twice a day. I loaded it on a Raspberry Pi my family gave me as a gift last year, set up a cron task, and now each day when I wake up I have a forecast waiting in my inbox. Makes me feel special!

The script — compatible with Python 3.6 and Python 2.7 — uses the awesome Requests library to fetch two endpoints from the Weather Underground API. One provides a forecast, and the other offers a summary of yesterday’s weather. For emailing, it uses the standard Python smtplib.

The code’s available on Github, so fork it and make it your own. You’ll need to have the Requests and simplejson libraries installed. Contributions are welcome!

Here’s a quick overview on how to set it up:

First, you’ll need to sign up for a Weather Underground API key. The free developer level has more than enough calls per day for this app, so choose that unless you plan to obsess about the weather in an oversized manner.

The API key and your email parameters go into a settings.py file:

mail_settings = {
    'address': 'anyone@example.com',
    'pw': 'your-email-password',
    'smtp': 'post.example.com',
    'from': 'Mr. Weather Robot'
}

send_to_addresses = ['someone@example.com', 'someone_else@example.com']

api_key = 'your-wunderground-api-key'

Then, here’s the wx-mail.py file:

import datetime
import smtplib
import requests
import simplejson as json
from email.mime.text import MIMEText
from local_settings import mail_settings, send_to_addresses, api_key


def fetch_forecast(api_key, request_type):
    mail_url = 'http://api.wunderground.com/api/' + api_key + '/' +\
               request_type + '/forecast/q/VA/Leesburg.json'
    r = requests.get(mail_url)
    j = json.loads(r.text)
    return j


def build_html(forecast_json, yesterday_json):
    # build some HTML snippets to open and close this email
    html_open = """\
    <html>
      <head></head>
      <body>
    """
    html_close = """\
      </body>
    </html>
    """

    # let's now build the HTML body contents
    wxdate = forecast_json['forecast']['txt_forecast']['date']
    mail_text = '<h3>Hello, DeBarros family!</h3><p>Here is the ' +\
                'Leesburg, Va., weather forecast as of ' + wxdate + '</p>'
    forecast_length = len(forecast_json['forecast']['txt_forecast']['forecastday']) - 1

    # looping through the JSON object
    for i in range(0, forecast_length):
        cast = '<p><b>' +\
            forecast_json['forecast']['txt_forecast']['forecastday'][i]['title'] +\
            '</b>: ' +\
            forecast_json['forecast']['txt_forecast']['forecastday'][i]['fcttext'] +\
            '</p>'
        mail_text += cast

    # Now, for yesterday's weather summary ...
    # We'll pull the date and some weather data from the summary API endpoint
    summary_date = yesterday_json['history']['dailysummary'][0]['date']['pretty']

    high_low_temp = yesterday_json['history']['dailysummary'][0]['maxtempi'] +\
        ' / ' +\
        yesterday_json['history']['dailysummary'][0]['mintempi'] +\
        ' degrees Fahrenheit'

    max_min_humid = yesterday_json['history']['dailysummary'][0]['maxhumidity'] +\
        '% / ' +\
        yesterday_json['history']['dailysummary'][0]['minhumidity'] + '%'

    precipitation = yesterday_json['history']['dailysummary'][0]['precipi'] +\
        ' inches'

    max_wind_speed = yesterday_json['history']['dailysummary'][0]['maxwspdi'] +\
        ' mph'

    yesterday_html = """\
    <h3>Here's yesterday's weather summary:</h3>
    <p><b>High/low temperature: </b>""" + high_low_temp + '</p>' +\
    '<p><b>Max/min humidity: </b>' + max_min_humid + '</p>' +\
    '<p><b>Precipitation: </b>' + precipitation + '</p>' +\
    '<p><b>Maximum wind speed: </b>' + max_wind_speed + '</p>'

    # put it all together
    html_body = html_open + mail_text + yesterday_html + html_close
    return html_body


def send_email(mail_text):
    # Set the current time and add that to the message subject
    cur_date = datetime.date.today().strftime("%B") +\
        ' ' + datetime.date.today().strftime("%d") +\
        ', ' + datetime.date.today().strftime("%Y")
    subject = 'Family forecast for ' + cur_date

    # Set up the message subject, etc. Then send it.
    COMMASPACE = ', '

    msg = MIMEText(mail_text, 'html')
    msg['Subject'] = subject
    msg['From'] = mail_settings['from']
    msg['To'] = COMMASPACE.join(send_to_addresses)

    server = smtplib.SMTP(mail_settings['smtp'], 25)
    server.login(mail_settings['address'], mail_settings['pw'])
    server.set_debuglevel(1)
    server.sendmail(mail_settings['address'], send_to_addresses,
                    msg.as_string())
    server.quit()


if __name__ == "__main__":
    forecast_json = fetch_forecast(api_key, 'forecast')
    yesterday_json = fetch_forecast(api_key, 'yesterday')
    mail_text = build_html(forecast_json, yesterday_json)
    send_email(mail_text)

The code’s straightforward, but a few things to note:

  • The Python standard smtplib provides all you need for sending the email. Check the official docs for examples.
  • I’ve gotten into the habit of using the simplejson library for wrangling API response objects, but the standard Python json library works just as well.

Have fun, and may all your coding days be sunny and warm.