SQL Window Functions

SQL > Window Functions

What is a window function in SQL?

With aggregate functions, you usually get a single result for each group that you look at (as specified by the columns in the GROUP BY clause). However, there are times when you want to apply the aggregate function to each row, and this is where window functions come in.

Window functions are called such because they operate on each row within a window. A window is defined by the OVER() clause. There are two major components in the OVER() construct:

PARTITION BY: Defines the window partition into groups of rows. This is similar to the GROUP BY clause. Note that this is optional. You do not need to specify PARTITION BY if your window covers the entire data set.
ORDER BY: Orders the value within each window partition. This is optional, although in most of the use cases, it makes sense to order your results in some way.

There are a number of aggregate functions that can act as a window function. The following are the common aggregate functions used as part of a window function:

Common aggregate functions: SUM, AVG, COUNT
Ranking functions: RANK, DENSE_RANK, ROW_NUMBER
Grouping function: NTILE
Before- and after-functions: LEAD, LAG

We go through each type of functions below:

Common Aggregate Functions: SUM, AVG, COUNT

We use the following table for our examples.

Table Store_Sales

Store_ID	Salesperson	Sales
1	Aaron	374
1	Beatrice	492
1	Cathy	430
2	Dan	462
2	Elmo	747
2	Frank	1332
2	Gina	898
2	Harry	603
3	Isabel	247
3	Jimmy	1030
3	Kara	1030
3	Lamar	1314
3	Mary	462

To list the average sales amount of the store each salesperson belongs to, we use the following window function:

SELECT Store_ID, Salesperson, Sales, AVG(Sales) OVER (PARTITION BY Store_ID) Avg_Store
FROM Store_Sales;

Result:

Store_ID	Salesperson	Sales	Avg_Store
1	Aaron	374	432
1	Beatrice	492	432
1	Cathy	430	432
2	Dan	462	808.4
2	Elmo	747	808.4
2	Frank	1332	808.4
2	Gina	898	808.4
2	Harry	603	808.4
3	Isabel	247	816.6
3	Jimmy	1030	816.6
3	Kara	1030	816.6
3	Lamar	1314	816.6
3	Mary	462	816.6

Ranking Functions: RANK, DENSE_RANK, ROW_NUMBER

ROW_NUMBER(), RANK(), and DENSE_RANK() are all functions that show the rank of a row within the window. These three functions are different in how they handle ties:

RANK(): When there are ties, this function will consider the number of rows that are tied and assign the rank to the subsequent rows. For example, if two rows are tied as rank 1, the next row will get rank 3.

DENSE_RANK(): When there are ties, this function will simply assign the next rank to the next row after the tie. For example, if two rows are tied as rank 1, the next row will get rank 2.

ROW_NUMBER(): Does not consider ties. This simply returns the row number of the result set within the window.

When there is no tie among the values, these three functions all return the same result set.

Here we use the same Store_Sales table as above. To list the rank of each salesperson by sales amount within each store, we use the following window function:

SELECT Store_ID, Salesperson, Sales, RANK() OVER (PARTITION BY Store_ID ORDER BY Sales DESC) Sales_Rank
FROM Store_Sales
ORDER BY Store_ID, Sales_Rank;

Result:

Store_ID	Salesperson	Sales	Sales_Rank
1	Beatrice	492	1
1	Cathy	430	2
1	Aaron	374	3
2	Frank	1332	1
2	Gina	898	2
2	Elmo	747	3
2	Harry	603	4
2	Dan	462	5
3	Lamar	1314	1
3	Jimmy	1030	2
3	Kara	1030	2
3	Mary	462	4
3	Isabel	247	5

If we replace RANK() with DENSE_RANK() above, we get the following:

SELECT Store_ID, Salesperson, Sales, DENSE_RANK() OVER (PARTITION BY Store_ID ORDER BY Sales DESC) Sales_Dense_Rank
FROM Store_Sales
ORDER BY Store_ID, Sales_Rank;

Result:

Store_ID	Salesperson	Sales	Sales_Dense_Rank
1	Beatrice	492	1
1	Cathy	430	2
1	Aaron	374	3
2	Frank	1332	1
2	Gina	898	2
2	Elmo	747	3
2	Harry	603	4
2	Dan	462	5
3	Lamar	1314	1
3	Jimmy	1030	2
3	Kara	1030	2
3	Mary	462	3
3	Isabel	247	4

If we replace RANK() with ROW_NUMBER() above, we get the following:

SELECT Store_ID, Salesperson, Sales, ROW_NUMBER() OVER (PARTITION BY Store_ID ORDER BY Sales DESC) Sales_Row_Number
FROM Store_Sales
ORDER BY Store_ID, Sales_Rank;

Result:

Store_ID	Salesperson	Sales	Sales_Row_Number
1	Beatrice	492	1
1	Cathy	430	2
1	Aaron	374	3
2	Frank	1332	1
2	Gina	898	2
2	Elmo	747	3
2	Harry	603	4
2	Dan	462	5
3	Lamar	1314	1
3	Jimmy	1030	2
3	Kara	1030	3
3	Mary	462	4
3	Isabel	247	5

Notice the different values for Store_ID = 3. For this store, two salespeople are tied for the second most sales. Under RANK(), the five values are 1,2,2,4,5. Under DENSE_RANK(), the five values are 1,2,2,3,4. Under ROW_NUMBER(), the five values are 1,2,3,4,5.

Grouping Function: NTILE

The NTILE() function divides all rows in a window into a pre-defined number of groups of approximately the same size. The first group is assigned 1, the second group is assigned 2, etc.

For example, if there are 100 rows in a window and we use NTILE(5), then each NTILE value would correspond to 20 rows.

If the number of rows is not divisible by the number of groups, then the size of each group can vary by 1, with the earlier groups having the larger count.

For example, if there are 99 rows in a window and we use NTILE(5), then the first 4 groups will have 20 rows, while the last group will have 19 rows.

Let's look at an example. If we want to divide all salesperson in our example into four groups based on the amount of sales each generates regardless of which store the salesperson belongs to, we would use the following query:

SELECT Store_ID, Salesperson, Sales, NTILE(4) OVER (ORDER BY Sales DESC) Sales_Group
FROM Store_Sales
ORDER BY Sales DESC;

Result:

Store_ID	Salesperson	Sales	Sales_Group
2	Frank	1332	1
3	Lamar	1314	1
3	Jimmy	1030	1
3	Kara	1030	1
2	Gina	898	2
2	Harry	603	2
2	Elmo	747	2
1	Beatrice	492	3
2	Dan	462	3
3	Mary	462	3
1	Cathy	430	4
1	Aaron	374	4
3	Isabel	247	4

Here we have a total of 13 rows and we are trying to group them into four groups. So Group 1 gets 4 rows, and Groups 2-4 each get 3 rows.

Before- and After-Functions: LEAD, LAG

There are instances where you are interested in doing a calculation based on the row(s) prior to the particular row you are interested in or based on the row(s) after the particular row you are interested in. In these cases, the LEAD() and LAG() functions can be used.

The LAG() function takes the value of the row(s) above the current row. The syntax of the LAG() function is as follows:

LAG(expression [, interval [, default_value]]) OVER ()

The interval argument is optional, and it specifies the number of rows to skip. If this is not specified, then it is set to 1. The default_value argument is only used when interval is specified, and you can set the default value to be returned if the function generates NULL.

The LEAD() function takes the value of the row(s) below the current row. The syntax of the LEAD() function is as follows:

LEAD(expression [, interval [, default_value]]) OVER ()

Let's look at an example with the following query:

SELECT Store_ID, Salesperson, Sales, LEAD(Sales) OVER (PARTITION BY Store_ID ORDER BY Sales DESC) Sales_Lead, LAG(Sales) OVER (PARTITION BY Store_ID ORDER BY Sales DESC) Sales_Lag
FROM Store_Sales
ORDER BY Store_ID, Sales DESC;

Result:

Store_ID	Salesperson	Sales	Sales_Lead	Sales_Lag
1	Beatrice	492	430	NULL
1	Cathy	430	374	492
1	Aaron	374	NULL	430
2	Frank	1332	898	NULL
2	Gina	898	747	1332
2	Elmo	747	603	898
2	Harry	603	462	747
2	Dan	462	NULL	603
3	Lamar	1314	1030	NULL
3	Jimmy	1030	1030	1314
3	Kara	1030	462	1030
3	Mary	462	247	1030
3	Isabel	247	NULL	462

Let's look at the data for Cathy. In the Sales_Lead column, the amount is 374, which corresponds to the Sales amount of Aaron, who ranks right below Cathy. In the Sales_Lag column, the amount is 492, which corresponds to the Sales amount of Beatrice, who ranks right above Cathy.

The above query itself may not look very useful. However, the usefulness becomes more evident if we are trying to see how much difference there is between each person and the person who ranks right above. In this case, we would use the following query:

SELECT Store_ID, Salesperson, Sales, LAG(Sales,1,Sales) OVER (PARTITION BY Store_ID ORDER BY Sales DESC) - Sales Sales_Difference FROM Store_Sales
ORDER BY Store_ID, Sales DESC;

Result:

Store_ID	Salesperson	Sales	Sales_Difference
1	Beatrice	492	0
1	Cathy	430	62
1	Aaron	374	56
2	Frank	1332	0
2	Gina	898	434
2	Elmo	747	151
2	Harry	603	144
2	Dan	462	141
3	Lamar	1314	0
3	Jimmy	1030	284
3	Kara	1030	0
3	Mary	462	568
3	Isabel	247	215

Next: SQL DECODE

This page was last updated on October 12, 2024.

What is a window function in SQL?

Common Aggregate Functions: SUM, AVG, COUNT

Ranking Functions: RANK, DENSE_RANK, ROW_NUMBER

Grouping Function: NTILE

Before- and After-Functions: LEAD, LAG

AdBlock Detected!