# Calculating Medians With SQL

Given that median is such a valuable statistical measure, it’s baffling that Microsoft’s SQL Server and other relational databases (MySQL, PostgreSQL) don’t have a built-in MEDIAN function. Well, this week, after working through a data set in SQL Server — and deciding I didn’t want to push the data into SPSS to find medians — I hit the web to find a T-SQL workaround.

I found a ton of solutions (some from people with no clue about the difference between median and average), but the one below — adapted from a post by Adam Machanic at sqlblog.com — was the best. It produces accurate results and is fairly speedy to boot.

Here’s an example. Consider this table with student grades from two courses:
`.`

1 Math Bob 65
2 Math Joe 72
3 Math Sally 95
4 Science Bob 65
5 Science Joe 81
6 Science Sally 81
7 Science Mike 72

We’d like to find the median grade in each class. Here’s the script:

```
DECLARE @tmp TABLE (
ID int identity(1,1),
Class varchar(50),
FirstName varchar(50),
)

INSERT INTO @tmp (Class, FirstName, Grade)
VALUES ('Math', 'Bob', 65)
INSERT INTO @tmp (Class, FirstName, Grade)
VALUES ('Math', 'Joe', 72)
INSERT INTO @tmp (Class, FirstName, Grade)
VALUES ('Math', 'Sally', 95)
INSERT INTO @tmp (Class, FirstName, Grade)
VALUES ('Science', 'Bob', 65)
INSERT INTO @tmp (Class, FirstName, Grade)
VALUES ('Science', 'Joe', 81)
INSERT INTO @tmp (Class, FirstName, Grade)
VALUES ('Science', 'Sally', 81)
INSERT INTO @tmp (Class, FirstName, Grade)
VALUES ('Science', 'Mike', 72)

SELECT
x.Class,
FROM
(
SELECT
Class,
FirstName,
ROW_NUMBER() OVER (
PARTITION BY Class
ORDER BY Grade ASC, ID ASC) AS RowAsc,
ROW_NUMBER() OVER (
PARTITION BY Class
ORDER BY Grade DESC, ID DESC) AS RowDesc
FROM @tmp
) AS x
WHERE
RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)
GROUP BY x.Class
ORDER BY x.Class
```

That’s a lot. Let’s break down what’s happening.

First, to set up the data, we’re creating a temporary table and inserting seven rows.

Then comes the main SELECT, which includes a subquery that’s the meat of the goodness behind this. The subquery uses the ROW_NUMBER function to create ascending and descending row identifiers based on the ordering of the grades. It’s easier to visualize if you see what the subquery’s creating:
`.`

Math Sally 95.0 3 1
Math Joe 72.0 2 2
Math Bob 65.0 1 3
Science Sally 81.0 4 1
Science Joe 81.0 3 2
Science Mike 72.0 2 3
Science Bob 65.0 1 4

The RowAsc and RowDesc fields reflect the ordered row numbers of the grades per each class (i.e., partition). Given those, it’s easy to find the median. For the math class, with an odd number of students, the median occurs where RowAsc equals RowDesc. For the science class, with an even number of students, the median is the average of the two grades where RowAsc and RowDesc are within one of each other.

(Whenever you’re looking for the median in an ordered list that has an even number of values, averaging the two middle numbers gives the answer.)

That’s what our main query does when it pulls from the subquery result set. It looks for:

`RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)`

and then averages what it finds. The answer is 72 for the math class and 76.5 for science.

This solution works with T-SQL and Microsoft SQL Server. With minor tweaks, it also runs fine in PostgreSQL 8.4, which also implements the ROW_NUMBER function. MySQL does not support that function, so you’ll have to search for another option.

Good stuff and pretty handy for extracting large numbers of median values without having to resort to a stats program.

### 2 responses to “Calculating Medians With SQL”

1. Anthony says:

Here’s the PostgreSQL code:

```CREATE TEMPORARY TABLE tmp (
ID SERIAL,
Class varchar(50),
FirstName varchar(50),
);

INSERT INTO tmp (Class, FirstName, Grade)
VALUES ('Math', 'Bob', 65);
INSERT INTO tmp (Class, FirstName, Grade)
VALUES ('Math', 'Joe', 72);
INSERT INTO tmp (Class, FirstName, Grade)
VALUES ('Math', 'Sally', 95);
INSERT INTO tmp (Class, FirstName, Grade)
VALUES ('Science', 'Bob', 65);
INSERT INTO tmp (Class, FirstName, Grade)
VALUES ('Science', 'Joe', 81);
INSERT INTO tmp (Class, FirstName, Grade)
VALUES ('Science', 'Sally', 81);
INSERT INTO tmp (Class, FirstName, Grade)
VALUES ('Science', 'Mike', 72);

SELECT
x.Class,
FROM
(
SELECT
Class,
FirstName,
ROW_NUMBER() OVER (
PARTITION BY Class
ORDER BY Grade ASC, ID ASC) AS RowAsc,
ROW_NUMBER() OVER (
PARTITION BY Class
ORDER BY Grade DESC, ID DESC) AS RowDesc
FROM tmp
) AS x
WHERE
RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)
GROUP BY x.Class
ORDER BY x.Class```
2. Frank says:

Check the PostgreSQL-wiki as well:
http://wiki.postgresql.org/wiki/Aggregate_Median

This site uses Akismet to reduce spam. Learn how your comment data is processed.