Window functions

cume_dist()
dense_rank()
first_value()
lag()
last_value()
lead()
nth_value()
ntile()
percent_rank()
rank()
row_number()

Window functions are functions that perform calculations for a set of rows that are related to the current row in some way. Unlike aggregate functions, values calculated by window functions can be inserted into each row of the source table.

A window function call always contains an OVER expression following the window function name and arguments. The OVER expression specifies exactly how the rows of the data set are to be divided for processing by the window function.

The OVER expression may be supplemented by the PARTITION BY expression, which divides the rows of the data set into groups, combining the rows into groups by matching the values of the specified columns. The value of the window function is calculated by the rows that fall into the same group as the current row.

`cume_dist()`

Description	Cumulative allocation.
Usage	`cume_dist([ORDER BY column])`

Cumulative distribution: the number of rows in the group preceding the current row or with the same value as the current row, divided by the total number of rows in the group.

If the ORDER BY condition is specified, the value is calculated using the specified order.

See example

CREATE TABLE t(gr VARCHAR, num BIGINT);

INSERT INTO t VALUES
    ('A', 1),
    ('A', 2),
    ('A', 2),
    ('A', 3),
    ('B', 4);

SELECT gr, num,
    cume_dist(ORDER BY num) OVER (PARTITION BY gr) AS cume_dist
    FROM t ORDER BY num;

+-------+--------+-----------+
| gr    | num    | cume_dist |
+-------+--------+-----------+
| A     | 1      | 0.25      |
+-------+--------+-----------+
| A     | 2      | 0.75      |
+-------+--------+-----------+
| A     | 2      | 0.75      |
+-------+--------+-----------+
| A     | 3      | 1         |
+-------+--------+-----------+
| B     | 4      | 1         |
+-------+--------+-----------+

`dense_rank()`

Description	The rank of the current string within the group without skips.
Usage	`dense_rank()`
Aliases	`rank_dense()`

Returns the rank of the current string within a group without skips. Essentially this function counts groups of rows with matching values on a given column and assigns them numbers within the group.

See example

CREATE TABLE t(gr VARCHAR, num BIGINT);

INSERT INTO t VALUES
    ('A', 1),
    ('A', 2),
    ('A', 2),
    ('A', 5),
    ('B', 4);

SELECT gr, num,
    dense_rank() OVER (PARTITION BY gr ORDER BY num) AS dense_rank
    FROM t ORDER BY gr, num;

+----+--------+------------+
| gr | num    | dense_rank |
+----+--------+------------+
| A  | 1      | 1          |
+----+--------+------------+
| A  | 2      | 2          |
+----+--------+------------+
| A  | 2      | 2          |
+----+--------+------------+
| A  | 5      | 3          |
+----+--------+------------+
| B  | 4      | 1          |
+----+--------+------------+

`first_value()`

Description	Returns the value calculated using the specified expression for the first row of the group.
Usage	`first_value(expression[ ORDER BY column][ IGNORE NULLS])`.

If the ORDER BY condition is specified, the value is calculated using the specified order.

Parameters

IGNORE NULLS — the value is calculated excluding rows with NULL value.

See example

CREATE TABLE t(gr VARCHAR, num BIGINT);

INSERT INTO t VALUES
    ('A', 1),
    ('A', 2),
    ('A', 3),
    ('A', NULL),
    ('B', 4);

SELECT gr, num,
    first_value(num) OVER (PARTITION BY gr) AS first_value
    FROM t ORDER BY gr, num;

+----+------+-------------+
| gr | num  | first_value |
+----+------+-------------+
| A  | 1    | 1           |
+----+------+-------------+
| A  | 2    | 1           |
+----+------+-------------+
| A  | 3    | 1           |
+----+------+-------------+
| A  | null | 1           |
+----+------+-------------+
| B  | 4    | 4           |
+----+------+-------------+

`lag()`

Description	Returns the value calculated for the string shifted by `offset` rows from the current to the beginning of the group.
Usage	`lag(expression[, offset[, default]][ ORDER BY column][ IGNORE NULLS])`.

If no such row exists, the value default is returned (it must be of a compatible type).

Both arguments, offset and default, are optional. If they are not specified, the following default values are used:

offset: 1
default: NULL

If the ORDER BY condition is specified, the value is calculated using the specified order.

Parameters

IGNORE NULLS — the value is calculated excluding rows with NULL value.

See examples

CREATE TABLE t(gr VARCHAR, num BIGINT);

INSERT INTO t VALUES
    ('A', 1),
    ('A', 2),
    ('A', 3),
    ('B', 4),
    ('B', 5);

SELECT gr, num,
    lag(num) OVER (PARTITION BY gr) AS lag
    FROM t ORDER BY gr, num;

+----+-----+------+
| gr | num | lag  |
+----+-----+------+
| A  | 1   | null |
+----+-----+------+
| A  | 2   | 1    |
+----+-----+------+
| A  | 3   | 2    |
+----+-----+------+
| B  | 4   | null |
+----+-----+------+
| B  | 5   | 4    |
+----+-----+------+

CREATE TABLE t(gr VARCHAR, num BIGINT);

INSERT INTO t VALUES
    ('A', 1),
    ('A', 2),
    ('A', 3),
    ('B', 4),
    ('B', 5);

SELECT gr, num,
    lag(num, 1, -1) OVER (PARTITION BY gr) AS lag
    FROM t ORDER BY gr, num;

+----+-----+-----+
| gr | num | lag |
+----+-----+-----+
| A  | 1   | -1  |
+----+-----+-----+
| A  | 2   | 1   |
+----+-----+-----+
| A  | 3   | 2   |
+----+-----+-----+
| B  | 4   | -1  |
+----+-----+-----+
| B  | 5   | 4   |
+----+-----+-----+

`last_value()`

Description	Returns the value calculated by the specified expression for the last row of the group.
Usage	`last_value(expression[ ORDER BY column][ IGNORE NULLS])`

If the ORDER BY condition is specified, the value is calculated using the specified order.

Parameters

IGNORE NULLS — the value is calculated excluding rows with NULL value.

See example

CREATE TABLE t(gr VARCHAR, num BIGINT);

INSERT INTO t VALUES
    ('A', 1),
    ('A', 2),
    ('A', 3),
    ('A', NULL),
    ('B', 4),
    ('B', 5);

SELECT gr, num,
    last_value(num IGNORE NULLS) OVER (PARTITION BY gr) AS last_value
    FROM t ORDER BY gr, num;

+----+------+------------+
| gr | num  | last_value |
+----+------+------------+
| A  | 1    | 3          |
+----+------+------------+
| A  | 2    | 3          |
+----+------+------------+
| A  | 3    | 3          |
+----+------+------------+
| A  | null | 3          |
+----+------+------------+
| B  | 4    | 5          |
+----+------+------------+
| B  | 5    | 5          |
+----+------+------------+

`lead()`

Description	Returns the value calculated for the row shifted by `offset` rows from the current row to the end of the group.
Usage	`lead(expression[, offset[, default]][ ORDER BY column][ IGNORE NULLS])`.

If no such row exists, the value of default is returned (it must be type-compatible).

Both arguments, offset and default, are optional. If they are not specified, the following default values are used:

offset: 1
default: NULL

If the ORDER BY condition is specified, the value is calculated using the specified order.

Parameters

IGNORE NULLS — the value is calculated excluding rows with NULL value.

See examples

CREATE TABLE t(gr VARCHAR, num BIGINT);

INSERT INTO t VALUES
    ('A', 1),
    ('A', 2),
    ('A', 3),
    ('B', 4),
    ('B', 5);

SELECT gr, num,
    lead(num) OVER (PARTITION BY gr) AS lead
    FROM t ORDER BY gr, num;

+----+-----+------+
| gr | num | lead |
+----+-----+------+
| A  | 1   | 2    |
+----+-----+------+
| A  | 2   | 3    |
+----+-----+------+
| A  | 3   | null |
+----+-----+------+
| B  | 4   | 5    |
+----+-----+------+
| B  | 5   | null |
+----+-----+------+

CREATE TABLE t(gr VARCHAR, num BIGINT);

INSERT INTO t VALUES
    ('A', 1),
    ('A', 2),
    ('A', 3),
    ('B', 4),
    ('B', 5);

SELECT gr, num,
    lead(num, 1, -1) OVER (PARTITION BY gr) AS lead
    FROM t ORDER BY gr, num;

+----+-----+------+
| gr | num | lead |
+----+-----+------+
| A  | 1   | 2    |
+----+-----+------+
| A  | 2   | 3    |
+----+-----+------+
| A  | 3   | -1   |
+----+-----+------+
| B  | 4   | 5    |
+----+-----+------+
| B  | 5   | -1   |
+----+-----+------+

`nth_value()`

Description	Returns the value calculated for the nth row within a group (counting from `1`).
Usage	`nth_value(expression, n[ ORDER BY column][ IGNORE NULLS])`.

If there is no such row, the value NULL is returned.

If the ORDER BY condition is specified, the value is calculated using the specified order.

Parameters

IGNORE NULLS — the value is calculated excluding rows with NULL value.

See examples

CREATE TABLE t(gr VARCHAR, num BIGINT);

INSERT INTO t VALUES
    ('A', 1),
    ('A', 2),
    ('A', 3),
    ('B', 4),
    ('B', 5);

SELECT gr, num,
    nth_value(num, 3) OVER (PARTITION BY gr) AS third_value
    FROM t ORDER BY gr, num;

+----+-----+-------------+
| gr | num | third_value |
+----+-----+-------------+
| A  | 1   | 3           |
+----+-----+-------------+
| A  | 2   | 3           |
+----+-----+-------------+
| A  | 3   | 3           |
+----+-----+-------------+
| B  | 4   | null        |
+----+-----+-------------+
| B  | 5   | null        |
+----+-----+-------------+

`ntile()`

Description	Splits each group into `num` subgroups of equal (as large as possible) size and returns the subgroup number.
Usage	`ntile(num[ ORDER BY ordering])`

The partitioning is done so that the sizes of the subgroups are as close as possible.

If the ORDER BY condition is specified, the value is calculated using the specified order.

See example

CREATE TABLE t(gr VARCHAR, num BIGINT);

INSERT INTO t VALUES
    ('A', 1),
    ('A', 2),
    ('A', 3),
    ('B', 4),
    ('B', 5);

SELECT gr, num,
    ntile(2) OVER (PARTITION BY gr) AS group_number
    FROM t ORDER BY gr, num;

+----+-----+--------------+
| gr | num | group_number |
+----+-----+--------------+
| A  | 1   | 1            |
+----+-----+--------------+
| A  | 2   | 1            |
+----+-----+--------------+
| A  | 3   | 2            |
+----+-----+--------------+
| B  | 4   | 1            |
+----+-----+--------------+
| B  | 5   | 2            |
+----+-----+--------------+

`percent_rank()`

Description	Calculates the relative rank of the current row within a group.
Usage	`percent_rank([ORDER BY column])`

Returns a value of type DOUBLE from 0 to 1 inclusive. The relative rank is calculated using the formula (rank - 1) / (total number of rows in the group - 1).

If the ORDER BY condition is specified, the value is calculated using the specified order.

See example

CREATE TABLE t(gr VARCHAR, num BIGINT);

INSERT INTO t VALUES
    ('A', 1),
    ('A', 2),
    ('A', 3),
    ('B', 4),
    ('B', 5);

SELECT gr, num,
    percent_rank(ORDER BY num) OVER (PARTITION BY gr) AS percent_rank,
    FROM t ORDER BY gr, num;

+----+-----+--------------+
| gr | num | percent_rank |
+----+-----+--------------+
| A  | 1   | 0            |
+----+-----+--------------+
| A  | 2   | 0.5          |
+----+-----+--------------+
| A  | 3   | 1            |
+----+-----+--------------+
| B  | 4   | 0            |
+----+-----+--------------+
| B  | 5   | 1            |
+----+-----+--------------+

`rank()`

Description	Returns the rank (with skips) of the current row within a group.
Usage	`rank([ORDER BY column])`

If the ORDER BY condition is specified, the value is calculated using the specified order.

See example

CREATE TABLE t(gr VARCHAR, num BIGINT);

INSERT INTO t VALUES
    ('A', 1),
    ('A', 2),
    ('A', 2),
    ('A', 3),
    ('B', 4),
    ('B', 5);

SELECT gr, num,
    rank(ORDER BY num) OVER (PARTITION BY gr) AS rank,
    FROM t ORDER BY gr, num;

+----+-----+------+
| gr | num | rank |
+----+-----+------+
| A  | 1   | 1    |
+----+-----+------+
| A  | 2   | 2    |
+----+-----+------+
| A  | 2   | 2    |
+----+-----+------+
| A  | 3   | 4    |
+----+-----+------+
| B  | 4   | 1    |
+----+-----+------+
| B  | 5   | 2    |
+----+-----+------+

`row_number()`

Description	Returns the number of the current row in its group (counting from `1`).
Usage	`row_number([ORDER BY column])`

If the ORDER BY condition is specified, the value is calculated using the specified order.

See example

CREATE TABLE t(gr VARCHAR, num BIGINT);

INSERT INTO t VALUES
    ('A', 1),
    ('A', 2),
    ('A', 2),
    ('A', 3),
    ('B', 4),
    ('B', 5);

SELECT gr, num,
    row_number() OVER (PARTITION BY gr) AS row_number,
    FROM t ORDER BY gr, num;

+----+-----+------------+
| gr | num | row_number |
+----+-----+------------+
| A  | 1   | 1          |
+----+-----+------------+
| A  | 2   | 2          |
+----+-----+------------+
| A  | 2   | 3          |
+----+-----+------------+
| A  | 3   | 4          |
+----+-----+------------+
| B  | 4   | 1          |
+----+-----+------------+
| B  | 5   | 2          |
+----+-----+------------+