Data journalists spend lots of time wrestling dirty data, so when I heard the News Applications team at the Chicago Tribune raving about the data-handling abilities of Freebase Gridworks, my interest was piqued. Anything that can lessen the pain of cleaning data is worth a closer look!
Freebase Gridworks is a Java-based app that runs locally on your machine and uses your web browser as its interface. The makers’ pitch describes it best:
… A power tool that allows you to load data, understand it, clean it up, reconcile it internally, augment it with data coming from Freebase, and optionally contribute your data to Freebase for others to use. All in the comfort and privacy of your own computer.
Installation is simple. I chose to install Gridworks on my Windows XP-based work laptop, although you can download Mac and Linux versions from the code page. I was up and running in about five minutes, which included installing a new version of Java. Once running, the opening screen looks like this:
You can open an existing project or create a new one by importing a data file. Gridworks hints at its utility here, with options to parse delimited or non-delimited files, limit the import to specific rows, and so on. For testing, I grabbed the Academic Libraries: 2008 Public Use Data file from the National Center for Education Statistics, a tab-delimited text file of about 4,100 rows.
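If you want a quick scripted look at a delimited file before loading it into a tool like this, the same two ideas (pick a delimiter, limit the rows) can be sketched in a few lines of Python. This is only an illustration and not how Gridworks works internally: the function name `preview_delimited`, its parameters, and the tiny sample data are all my own placeholders, not anything taken from Gridworks or from the NCES file.

```python
import csv
import io
from itertools import islice


def preview_delimited(source, delimiter="\t", skip_rows=0, max_rows=None):
    """Parse delimited text and return (header, rows), optionally limiting rows.

    Hypothetical helper for illustration only; not part of Gridworks.
    """
    reader = csv.reader(source, delimiter=delimiter)
    # Treat the first line as the header; an empty input yields an empty header.
    header = next(reader, [])
    # Skip `skip_rows` data rows, then keep at most `max_rows` of what follows.
    stop = None if max_rows is None else skip_rows + max_rows
    rows = list(islice(reader, skip_rows, stop))
    return header, rows


# Made-up sample standing in for a tab-delimited file; column names are placeholders.
sample = io.StringIO(
    "id\tname\tstate\n"
    "1\tLibrary A\tIL\n"
    "2\tLibrary B\tNY\n"
    "3\tLibrary C\tCA\n"
)

header, rows = preview_delimited(sample, max_rows=2)
print(header)
print(rows)
```

To try it on a real file, pass an open file handle instead of the in-memory sample, for example `open(path, newline="")`, where `path` is wherever you saved the download.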