The European website LearnSQL.com recently published an interview with me in advance of the upcoming release of the 2nd Edition of Practical SQL. Journalist-turned-programmer/trainer Jakub Romanowski asked questions ranging from what’s new in the 2nd Edition to why the book uses PostgreSQL.
Along the way, I share some of my philosophy about writing, data, and the value of learning SQL.
Here’s an excerpt:
Most SQL books are pretty boring, like a different form of database documentation with some unrealistic examples. You’ve made learning SQL syntax interesting. What’s your secret?
Thanks for saying that! I write from the perspective of a teacher in a classroom. How would I keep a group of students engaged each week for a whole semester? To me, the answer is to have the book focus on real-life data analysis, with the joy of discovery and sometimes the pain of messy data getting in the way. So, I use real data and try to help readers discover insights about the data while they’re learning. And, as I’m sure you know, data can sometimes be lacking in terms of quality. We learn about that too, and how to deal with it. That’s very real.
pgAdmin, the free, open-source GUI for the PostgreSQL database, has come a long way since version 1.0 released in September 2016. While writing a new edition of Practical SQL, five cool pgAdmin features and tips caught my eye. They include theme options, a Geometry Viewer, a JSON editor, SQL formatter, and import/export wizard.
Let’s take a look at each. For this description, I’m referencing pgAdmin version 6.0, released in October 2021.
For those who wish to reduce screen glare, pgAdmin features the addition of a dark theme as of version 4.15. To access this pgAdmin feature, select File > Preferences > Miscellaneous > Themes, then choose Dark. A high contrast theme is also available.
If you’re a PostGIS user, one of the handiest pgAdmin features is its built-in Geometry Viewer. Instead of exporting data to a spatial analysis product like QGIS, you can get a quick view of it right in pgAdmin.
Using the Geometry Viewer is straightforward. Run a query that contains a spatial data column in the output. Then click the eye icon in the column header to open the Geometry Viewer and show your shapes plotted on an OpenStreetMap layer:
The example, from the “Working with JSON Data” chapter in Practical SQL (code here), shows a selection of earthquakes near Tulsa, Oklahoma. As long as your data is set with a spatial reference identifier of 4326 (WGS 84), the features will display on the OSM layer; otherwise they’ll show on a blank canvas.
The map is interactive. Click any displayed feature to show other data from the row in the source query output. It’s handy for quick explorations.
There’s no standard way to format SQL, and every coder tends to have their own style. Uppercase, lowercase, indents — it generally comes down to personal preference. Still, for the sake of readability, it helps to have a consistent style. A pgAdmin feature called “Format SQL” analyzes your code and reformats it according to settings you customize.
To see the options, select File > Preferences > Query Tool > SQL formatting. For this example, I chose lowercase identifiers, uppercase keywords, re-indentation, use spaces, and no comma-first notation. Then, I wrote a simple query:
Under the editing menu icon, choose the last item, “Format SQL.” With my settings, it produced these results:
Keywords have been uppercased and the code aligned into clauses. Nice.
You’ll likely need to play with the settings and try them out on different types of queries to find a style that works for you.
With PostgreSQL’s relatively new support for JSON data, pgAdmin has recently added more robust viewing and editing abilities. Double clicking on a column with data type JSON or JSONB opens a compact viewing/editing tool:
The tool includes a JSON formatter, a minify option, and several view modes. Read the documentation for details.
PostgreSQL users are familiar with the COPY statement for importing and exporting data. Sometimes, though, COPY is impractical — for example, when you’re working with a server in the cloud. That’s because COPY can only access the local file system. You’d have to first transfer a file to the cloud server, then run COPY. That’s not always practical.
Users often get around this limitation by employing the psql command-line utility and its \copy command, which can pipe a file on your computer to the server. pgAdmin’s import/export wizard puts a friendly GUI face on top of \copy, removing the need to remember some of its arcane commands. For example:
Right-click a table name in pgAdmin’s object browser and select “Import/Export.” At the top, move the slider to choose which type of operation, then select options and your source file. When you click “OK,” pgAdmin runs \copy in the background to perform the operation.
The second edition has been thoroughly revised with updated data, additional query techniques, and two new chapters: an expanded set of instructions on for setting up your system plus a chapter on using PostgreSQL with the popular JSON data interchange format.
No Starch Press has announced a January 2022 publication for the 2nd Edition of Practical SQL: A Beginner’s Guide to Storytelling with Data. The book’s new page is up with a sample chapter and a link to pre-order.
In this second edition, I’ve added two new chapters. One covers working with JSON in PostgreSQL; it’s designed to introduce readers to the basics of JSON and PostgreSQL’s JSON data types and operators. For analysis, we work with a set of earthquake data from the U.S. Geological Survey’s JSON API. A second new chapter offers an expanded guide to setting up your computer for learning SQL, from PostgreSQL installation to downloading the code examples and using a text editor.
I’ve expanded many chapters to add new sections on, among other things, LATERAL joins, rolling averages, materialized views, set operators, and more. Where possible, I’ve updated data sets—particularly from the U.S. Census. And I’ve benefited from a detailed technical review. All told, this new version of the book is more complete, offers stronger guidance for readers related to software and code, and clarifies information that wasn’t as clear or presented as accurately as it could have been the first time around.
The first edition of Practical SQL is on its sixth printing and has been translated into Polish, Korean, and traditional Chinese. I’m eager to see the updated, expanded, and improved version in stores soon.
Powering an application from a collection of static files rather than a call to an API or database server is well worth considering. Removing a production database and data access layer from the mix eliminates complexity and trims development time. Static files aren’t right for every situation, but for small apps they may be all you need.
In my days building back ends for interactive news applications, most of our apps consumed JSON. Simple apps ran off one or two static JSON files. The more complex used hundreds. Updating content was as easy as generating new JSON files and uploading them to a file server.
I’ve written about how to use Python to generate JSON files from a SQL database. But it’s just as easy to use Python to make JSON out of Excel spreadsheets. The key ingredient is the Python library openpyxl, which offers a collection of functions for reading and writing Excel .xlsx workbooks. You can add it to your Python environment by running pip install openpyxl.
Let’s look at how it works, then use with some Python to generate JSON.
Basic openpyxl functions
Let’s say we have an Excel workbook containing a small table repeated over three worksheets. The table in each case looks like this:
Here are some snippets of code — just scratching the surface — to interact with it programmatically:
# Demonstrates basic openpyxl functions for working with Excel files
# (.xlsx format)
from openpyxl import load_workbook
# load the workbook
wb = load_workbook('excel-xlrd-sample.xlsx')
# Print the sheet names
# Get a sheet by name
sheet = wb['Cars20']
# Iterate through sheet rows, returning each as a tuple:
for row in sheet.values:
# If you just want the first column:
for cell in sheet['A']:
# Index individual cells:
cell_c4 = sheet['C4'].value
# Or you can use:
cell_c4 = sheet.cell(row=4, column=3).value
# Let's say you want the same cell every sheet in a workbook:
for sheet in wb.sheetnames:
open_sheet = wb[sheet]
cell_x = open_sheet['D2'].value
From Excel to JSON
Pretty cool stuff. Now, let’s convert our sample spreadsheet to JSON. I’ll borrow some of the techniques I discussed when outlining how to use Python to build JSON from a SQL database:
from collections import OrderedDict
from itertools import islice
from openpyxl import load_workbook
# Open the workbook and select a worksheet
wb = load_workbook('excel-xlrd-sample.xlsx')
sheet = wb['Cars20']
# List to hold dictionaries
cars_list = 
# Iterate through each row in worksheet and fetch values into dict
for row in islice(sheet.values, 1, sheet.max_row):
cars = OrderedDict()
cars['car-id'] = row
cars['make'] = row
cars['model'] = row
cars['miles'] = row
# Serialize the list of dicts to JSON
j = json.dumps(cars_list)
# Write to file
with open('data.json', 'w') as f:
Here’s the breakdown: We open the workbook, select a sheet, and set up a blank list to hold rows of data. Then we use Python’s islice tool to iterate through the rows in the sheet, skipping the header and stopping at the max_row property of the sheet.
For each row, we add each cell to a key/value pair in a dictionary, then add each dictionary to a list. Then we dump the list to JSON and write to a file.
Of course, a spreadsheet this simple doesn’t need a Python script to make its way to JSON. Just use Mr. Data Converter. But as soon as your JSON requirements gain complexity, the ability to use Python to nest nodes, build strings and transform data on the fly make this approach very appealing.