Generate JSON From SQL Using Python

Author’s note: This post has been updated from its original 2012 version to use PostgreSQL and the Python psycopg2 library. Also, if you like what you read here, please check out my book Practical SQL: A Beginner’s Guide to Storytelling with Data from No Starch Press.


Let’s say you want to generate a few hundred — or even a thousand — flat JSON files from a SQL database. Maybe you want to power a data visualization but have neither the time nor the desire to spin up a server to dynamically generate the data. So, you want flat files, each one small for quick loading. And a lot of them.

A few lines of Python is all you need.

I’ve gone this route for a few data-driven visuals, creating JSON files out of large database tables. Python works well for this, with its JSON encoder/decoder offering a flexible set of tools for converting Python objects to JSON. Let’s see how this works.

The Data

I’m using PostgreSQL for this tutorial. Here’s a script that creates a table and fills it with two rows of data:

CREATE TABLE students_test (
    id integer,
    first_name text,
    last_name text,
    street text,
    city text,
    st text,
    zip text
);
    
INSERT INTO students_test VALUES
(1, 'Samantha', 'Baker', '9 Main St.', 'Hyde Park', 'NY', '12538'),
(2, 'Mark', 'Salomon', '3 Stavesacre Terrace', 'Dallas', 'TX', '75204');

The Script

Here’s an example Pythonscript that generates two JSON files from that query. One file contains JSON row arrays, and the other has JSON key-value objects. Below, we’ll walk through it step-by-step.

Continue…

Analyzing Shapefile Data with PostgreSQL

This post is adapted from material in my book Practical SQL: A Beginner’s Guide to Storytelling with Data from No Starch Press.


Spend some time digging into geographic information systems (GIS) and soon enough you’ll encounter a shapefile. It’s a GIS file type developed by mapping software firm Esri for use in its popular ArcGIS platform. A shapefile contains the geometric information to describe a shape—a river, road, lake, or town boundary, for example—plus metadata about the shape, such as its name.

Because the shapefile has become a de facto standard for publishing GIS data, other applications and software libraries use shapefiles too, such as the open source QGIS.

While researching GIS topics for a chapter in my book, Practical SQL, I learned that it’s easy to import a shapefile into a PostGIS-enabled PostgreSQL database. The information that describes each shape is stored in a column of data type geometry, and so you can run spatial queries to calculate area, distances, intersections of objects, and more.

Here’s a quick exercise, adapted from the book.

Continue…

‘Practical SQL’ Book in Early Release

My first book, Practical SQL: A Beginner’s Guide to Storytelling with Data, is out in early release from No Starch Press starting today! If you pre-order from No Starch, you can download the Introduction and first four chapters now. You’ll get additional chapters regularly until the final version comes out in February 2018.

Practical SQL is for people who encounter data in their everyday lives and want to know how to analyze or transform it. The book covers real-world data and scenarios, from analyzing U.S. Census demographics to the duration of taxi rides in New York City. I’ve aimed the exercises at beginning SQL coders, and all the code and data can be downloaded via No Starch’s site.

That database you’ll use is the free, open-source PostgreSQL, along with the pgAdmin 4 graphical user interface. We cover all the basics you’ll find in standard ANSI SQL along with PostgreSQL-specific features such as full text search and GIS.

More to come as additional chapters hit early release!

App launch: 2014 elections forecast

Election Forecast

 

With more than 1,300 candidates, 507 races, top-line campaign finance data and poll averages for select races, the 2014 midterm elections forecast app we launched in early September is probably the most complex mash-up of data, APIs and home-grown content built yet by our Interactive Applications team at Gannett Digital.

We’re happy with the results — even more because the app is live not only at USA TODAY’s desktop and mobile websites but across Gannett. With the rollout of a company-wide web framework this year, we’re able to publish simultaneously to sites ranging from the Indianapolis Star to my alma mater, The Poughkeepsie Journal.

What’s in the forecast? Every U.S. House and Senate race plus the 36 gubernatorial races up in November with bios, photos, total receipts and current poll averages. For each race, USA TODAY’s politics team weighed in on a forecast for how it will likely swing in November. Check out the Iowa Senate for an example of a race detail page.

Finally, depending on whether you open the app with a desktop, tablet or phone, you’ll get a version specifically designed for that device. Mobile-first was our guiding principle.

Building the backend

This was a complex project with heavy lifts both on design/development and data/backend coding. As usual, I handled the data/server side for our team with assists from Sarah Frostenson.

As source data, I used three APIs plus home-grown content:

— The Project Vote Smart API supplies all the candidate names, party affiliations and professional, educational and political experience. Most of the photos are via Vote Smart, though we supplemented where missing.

— The Sunlight Foundation’s Realtime Influence Explorer API supplies total receipts for House and Senate candidates via the Federal Election Commission.

— From Real Clear Politics, we’re fetching polling averages and projections for the House (USAT’s politics team is providing governor and Senate projections).

The route from APIs to the JSON files that fuel the Backbone.js-powered app goes something like this:

  1. Python scrapers fetch data into Postgres, running on an Amazon EC2 Linux box.
  2. A basic Django app lets the USAT politics team write race summaries, projections and other text. Postgres is the DB here also.
  3. Python scripts query Postgres and spits out the JSON files, combining all the data for various views.
  4. We upload those files to a cached file server, so we’re never dynamically hitting a database.

Meanwhile, at the front

Front-end work was a mix of data-viz and app framework lifting. For the maps and balance-of-power bars, Maureen Linke (now at AP) and Amanda Kirby used D3.js. Getting data viz to work well across mobile and desktop is a chore, and Amanda in particular spent a chunk of time getting the polling and campaign finance bar charts to flex well across platforms.

For the app itself, Jon Dang and Rob Berthold — working from a design by Kristin DeRamus — used Backbone.js for URL routing and views. Rob also wrote a custom search tool to let readers quickly find candidates. Everything then was loaded into a basic template in our company CMS.

This one featured a lot of moving parts, and anyone who’s done elections knows there always are the edge cases that make life interesting. In the end, though, I’m proud of what we pulled off — and really happy to serve readers valuable info to help them decide at the polls in November.

Calculating Medians With SQL

Given that median is such a valuable statistical measure, it’s baffling that Microsoft’s SQL Server and other relational databases (MySQL, PostgreSQL) don’t have a built-in MEDIAN function. Well, this week, after working through a data set in SQL Server — and deciding I didn’t want to push the data into SPSS to find medians — I hit the web to find a T-SQL workaround.

I found a ton of solutions (some from people with no clue about the difference between median and average), but the one below — adapted from a post by Adam Machanic at sqlblog.com — was the best. It produces accurate results and is fairly speedy to boot.

Here’s an example. Consider this table with student grades from two courses:
.

ID Class FirstName Grade
1 Math Bob 65
2 Math Joe 72
3 Math Sally 95
4 Science Bob 65
5 Science Joe 81
6 Science Sally 81
7 Science Mike 72

We’d like to find the median grade in each class. Here’s the script:
Continue…