SQL > Advanced SQL >
Median refers to the "middle" number in a series of numbers. When the total count is odd, this is pretty straightforward. Assuming there are n numbers, the median number would be the (n+1)/2-th largest number (or the (n+1)/2-th smallest number -- both are the same number). For example, if there are a total of 7 numbers, the median refers to the 4th largest (or the 4th smallest) number.
When the total count is even, there is no number that represents the exact "middle" number. In this case, we would take the average of the two numbers closest to the middle. Assuming there are n numbers, the median would be the average of the n/2-th largest number and the (n+2)/2-th largest number (You can also think of it as the average of the n/2-th smallest number and the (n+2)/2-th smallest number -- these two give the same result). For example, if there are a total of 8 numbers, the median would be the average of the 4th (which represents n/2) and the 5th (which represents (n+2)/2) largest numbers.
Given the above definition, our strategy for calculating the median is as follows:
This is a fairly complex query as we will need to include a number of elements:
An Example of Calculating Median
Let's use an example to illustrate. Say we have the following table,
the SQL we would use is,
With Count_Total as
(SELECT COUNT(*) row_count from Total_Sales)
CASE WHEN row_count % 2 = 1 THEN (row_count+1)/2 ELSE row_count/2 END as begin_count,
CASE WHEN row_count % 2 = 1 THEN (row_count+1)/2 ELSE (row_count+2)/2 END as end_count
SELECT AVG(Sales) Median from
(SELECT Name, Sales, ROW_NUMBER() OVER (ORDER BY Sales ASC) RK
WHERE RK BETWEEN (SELECT begin_count FROM From_To) and (SELECT end_count FROM From_To);
21 is the average of 22, the third largest number, and 20, the fourth largest number.
Explanation of SQL Code
The first CTE, Count_Total, calculates the total number of rows in the table.
The second CTE, From_To, calculates the higher rank and the lower rank values that we will use to calculate the median.
The main SQL statement uses a window function with ROW_NUMBER() to get the rank (represented by RK) of each row in ascending order. ROW_NUMBER() is used here because we actually do not want to assign the same rank in case of a tie. Then the WHERE clause uses two inline views to determine which two numbers we should take the average of for the median.
Second Example of Calculating Median
Let's change the table content by adding a row so the table is as follows:
Now the above SQL would yield the following result:
In this case 22 is the 4th largest number and also the 4th smallest number in the table, hence it is the median.
List of SQL Complex Operations