Sorting Data in Excel: Simple Analysis
Sorting a data set helps answer a basic question data analysts like to ask: “Which ____ has the highest (or lowest) ______?”
Excel (and other spreadsheets such as the open source Calc) make sorting data easy. In fact, I often make sorting my first step when “interviewing” data because it quickly reveals high and low values and often highlights some that may seem questionable.
Let’s work through a simple sort in Excel. Start by downloading the file “sorting.xlsx” and saving it to your computer. Open it and follow along:
1. We have a table of Census data from the 2006-2008 American Community Survey. It shows the median age of the population for each of 79 school districts in Virginia plus the state itself.
We want to know which district has the oldest and youngest populations. Let’s sort it!
2. Click once on one cell anywhere in the table. This will help Excel auto-discover your table in the next step.
3. On the Excel ribbon, select the “Data” tab and click “Sort.”
4. Two things happened. One, your entire table was selected (or highlighted). Two, a dialog box popped up to offer sorting options. Check off “My data has headers.” That will prevent your header row from getting sorted with the data, and it will add the three column names under the “Sort by” drop down.
5. Under “Sort by,” select “Median.” Under “Order,” select “Largest to Smallest.”
6. Click “OK.” Excel sorts your table, ranking the districts by median age — from highest to lowest. Your first few rows should look like this:
Now, we can do a quick scan and look for patterns. For example, several of the “oldest” counties are in southern Virginia, far away from the Northern Virginia economic engine. Meanwhile, the district with the lowest age is Harrisonburg City Public Schools — with a median age of a barely-legal 22.8. Could the fact that the city hosts two universities have something to do with that?
Good fodder for reporting, all made possible by a simple Excel sort.
A couple of tips and cautions:
— A good general practice is to work on a copy of your original data. Because things happen.
— Excel does best at sorting when your table has a header row and is not contiguous to any unrelated data, such as footnotes. Insert blank rows and columns between the data you want to sort and any information you want to keep separate.
— I recommend selecting only one cell in your table before selecting the “Sort” button. If you grab more than one, Excel may attempt to sort only those cells rather than the whole table. The 2007 version asks if you want to expand the selection, but older versions sometimes do not. This creates the possibility that only some of your data would get sorted, which is a nightmare. Always make sure your entire table gets selected!
— You can sort by more than one field. In Excel 2007, click “Add level” in the sort dialog.
Questions? Tips of your own? Add them below …