Data Journalism and the Big Picture

The web-o-sphere this week brought forth a collection of opinions on the value of data journalism and the skills that go with it. To wit:

  • Tim Berners-Lee, he who invented the World Wide Web, told the Guardian that “journalists need to be data-savvy” and that “data-driven journalism is the future.” The story then goes on to question whether data analysis could ever replace traditional reporting.
  • The blog 10,000 Words declared that one of the “5 Myths about digital journalism” is that “journalists must have database development skills” and suggested that most journalists should leave high-level hacking to the experts.
  • Another site, FleetStreetBlues, opined that “amidst all this hype, earnestness and spreadsheet-geekery, here’s the truth about so-called ‘data journalism’. It’s still about the story, stupid.”

There’s been a bunch of reaction to these posts, including a few people pointing out a 1986 Time story that sounds similar to the one this week from the Guardian. And therein lies the problem with all three pieces: None of them benefits from a big-picture, historical perspective on data journalism — not where it came from, not how it’s changed and especially not the massive amount of ground the label covers these days.

We used to call it CAR

Back when software came on 5.25-inch floppy disks, or maybe before then, the idea of using a PC to “crunch numbers” was christened “computer-assisted reporting.” These days, we call it data journalism because, along the way, it became obvious the old name was anachronistic. As Phil Meyer once said, we don’t talk about telephone-assisted reporting, do we?

When I got into the game — when Paradox was the desktop database manager of choice — our newsroom had a personal computer designated as the “CAR station.” While others worked on dumb terminals connected to a mainframe, I was surfing the web with Netscape and ringing up Paul Overberg for advice on Census data. I was the newsroom data expert — the guy reporters called when they had a spreadsheet on a disk or an idea to get data from city hall.

In that era, with database-driven web startups spreading cultural revolution, it was easy to foresee a time when reporters wouldn’t just get the occasional spreadsheet but would find themselves inundated with data. Thus was born (at least in my sphere) the drive to evangelize CAR in the newsroom. We taught Excel, we sent people to IRE boot camps, we set up presentations showing the kinds of stories journalists were landing with these skills. The message of CAR was about finding stories and using simple tools to do it: spreadsheets, databases, maps, stats.

Now we call it hacking

Soon enough, though, the craft began to change and so did the talk at IRE CAR conferences — especially in the hands-on classes and demos. In Philadelphia in 2002, the hands-on classes mostly covered Access, Excel, SPSS and, for the adventurous, SQL Server. Just a few years later, in Cleveland and Houston, the offerings included sessions on web scraping, Perl, Python, MySQL and Django.

Speed Up Your WordPress Site

WordPress is a great CMS, but if you’re hosting your own install you’ll want to pay attention to keeping its performance nimble. Here are a few tips I picked up this week after reading this post and thinking about how to apply those ideas to my own site:

1. Optimize the database.

WordPress handily saves every revision of every page and post you create. That’s great for retrieving a lost paragraph, but it can add hundreds of rows to the wp_posts table — especially if, like me, you save revisions often. Eventually, the database grows big enough to affect query performance, which can slow page loads.

There’s a remedy: Delete those unnecessary rows and optimize your tables.

The WordPress plugin directory lists several database optimizers, but they make me nervous. They all claim to optimize but don’t say exactly what changes they make. Instead, I dug straight into my site’s MySQL database using my web host’s phpMyAdmin interface.

Sanity check: Back up your database first.

Once that’s done, run this in the SQL pane:

DELETE FROM wp_posts WHERE post_type = 'revision';

Next, use phpMyAdmin to optimize your tables (Structure pane > Select All > choose Optimize Table from the drop-down menu).

This trick shrank my database from 5.5MB to less than 1MB.
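If you’d like to see what a cleanup like this will remove before it runs, here’s a minimal sketch in Python. It uses the built-in sqlite3 module as a stand-in for MySQL so it runs anywhere; the sample rows and the purge_revisions function are made up for illustration, and only the wp_posts table and post_type column come from the steps above. Treat it as a rehearsal of the idea, not something to point at a live site.

```python
import sqlite3


def purge_revisions(conn):
    """Delete revision rows from wp_posts and return how many were removed."""
    cur = conn.cursor()
    # Count first so we know what the DELETE is about to remove.
    cur.execute("SELECT COUNT(*) FROM wp_posts WHERE post_type = 'revision'")
    (pending,) = cur.fetchone()
    # Using the connection as a context manager commits on success
    # and rolls back if the DELETE raises an error.
    with conn:
        conn.execute("DELETE FROM wp_posts WHERE post_type = 'revision'")
    return pending


# Stand-in database: an in-memory SQLite table with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_type TEXT)")
conn.executemany(
    "INSERT INTO wp_posts (post_type) VALUES (?)",
    [("post",), ("revision",), ("revision",), ("page",), ("revision",)],
)
conn.commit()

removed = purge_revisions(conn)
remaining = conn.execute("SELECT COUNT(*) FROM wp_posts").fetchone()[0]
print(f"Removed {removed} revisions; {remaining} rows remain.")
```

Counting before deleting gives you a number to sanity-check against your backup, and wrapping the delete in a transaction means a failure partway through leaves the table untouched.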

2. Install a caching plugin.

If something I write here goes viral, I’d like to avoid watching my server fry. One way to head this off is to set up caching so that not every page load triggers a database query on your server.

Again, there is no shortage of caching plugins for WordPress. After checking comments on my web host’s user forums, I went with W3 Total Cache. Not only does it cache pages, feeds and other content, it also minifies CSS, HTML and JavaScript.

It loads like any WordPress plugin, but it has many options. This tutorial gives great step-by-step directions.
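To make the caching idea concrete, here’s a toy sketch in Python of what a page cache does: build a page once, then serve the stored copy until it expires. This is not how W3 Total Cache works internally; the PageCache class, the render_page function and the 60-second lifetime are all hypothetical names and values I picked for the example.

```python
import time


class PageCache:
    """Keep rendered pages in memory for a fixed number of seconds."""

    def __init__(self, ttl_seconds=60, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}  # url -> (expires_at, html)

    def get(self, url, render):
        """Return cached HTML for url, calling render(url) only on a miss."""
        now = self.clock()
        entry = self.store.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]  # Still fresh: skip the expensive render.
        html = render(url)
        self.store[url] = (now + self.ttl, html)
        return html


query_count = 0


def render_page(url):
    """Stand-in for WordPress building a page from database queries."""
    global query_count
    query_count += 1
    return f"<html><body>Content for {url}</body></html>"


cache = PageCache(ttl_seconds=60)
for _ in range(1000):
    page = cache.get("/hello-world/", render_page)

print(f"1,000 page loads, {query_count} trip(s) to the database")
```

The trade-off is freshness: a visitor may see a copy that is up to one lifetime old, which is usually fine for a blog post and is why cache lifetimes are configurable.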

3. Remove unneeded PHP calls.

Most WordPress themes, including the one I built for my site, include calls designed to fetch info from the MySQL database. The idea is to make the theme easily customizable via the admin and make the theme portable to millions of users.

But unless you plan to change your site’s name, tagline or URL often, there’s no reason to have your theme fetch those every time it loads your header.

For example, in my header.php file, I changed this:

<h1><a href="<?php bloginfo('url'); ?>"><?php bloginfo('name'); ?></a></h1>

To this:

<h1><a href="">Anthony DeBarros</a></h1>

That’s two fewer calls each time that page loads. Depending on your theme, you might be able to find many more calls to eliminate.
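If you want a quick inventory of candidates, a small script can tally the bloginfo() calls in a template. Here’s a rough sketch in Python; the count_bloginfo_calls function and its regular expression are my own illustration, and the pattern only catches calls with a plain quoted argument like the ones in the header line above.

```python
import re
from collections import Counter

# Matches calls such as bloginfo('name') or bloginfo("url").
BLOGINFO_CALL = re.compile(r"bloginfo\(\s*['\"]([^'\"]+)['\"]\s*\)")


def count_bloginfo_calls(template_source):
    """Count bloginfo() calls in a theme template, grouped by argument."""
    return Counter(BLOGINFO_CALL.findall(template_source))


# The original header line from the example above, used as sample input.
sample = """<h1><a href="<?php bloginfo('url'); ?>"><?php bloginfo('name'); ?></a></h1>"""

counts = count_bloginfo_calls(sample)
for argument, total in counts.items():
    print(f"bloginfo('{argument}'): {total}")
```

To check a real theme, you could read each template file into a string and pass it to the same function. Not every call is worth hardcoding, so treat the tally as a list of places to look, not a list of things to delete.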

More reading:

There’s a lot more you can tweak under the hood, and that means plenty of chances to learn more goodness about server configurations. Here are some of the places where I picked up these tricks and where you can find a lot more detail. Happy optimizing!

How to Bottle Your Conference Glow

If, like me, you follow journalists on Twitter, then you’re familiar with the monthly river of posts with hashtags from the latest industry conference. Journalism conferences are like revival meetings minus any talk of the Divine. The faithful leave with a great glow — which lasts all the way till you get back to your desk, back to Inbox 2,000 and the note from your boss saying, “Please come to my office.”

Doesn’t have to end that way. Here’s how to bottle that glow and make it last till next year:

Take good notes: While you’re at the conference, email yourself a daily note with links, ideas, names of people to follow on Twitter, books to read and skills to learn.

Set goals: Compile your notes into a learning agenda. List three to five items to pursue between now and next year’s conference. Examples might include launching your own web site, learning a programming skill, or sharpening your writing.

Make a reading list: Use your favorite feed reader to follow blogs and sites related to the skills you’re learning.

Revisit the agenda: Six months after the conference ends, go back and look over the agenda. Catch up on what the presenters of your favorite sessions have been up to.

Make vital connections: Go beyond following people on Twitter and seek out people to network with. Share your ideas, struggles, wins. Look for meetups in your area or opportunities to coach others to share your skills.

Those are my ideas. Yours?