Window functions

Window functions are functions that perform calculations for a set of rows that are related to the current row in some way. Unlike aggregate functions, values calculated by window functions can be inserted into each row of the source table.

A window function call always contains an OVER expression following the window function name and arguments. The OVER expression specifies exactly how the rows of the data set are to be divided for processing by the window function.

The OVER expression may be supplemented by the PARTITION BY expression, which divides the rows of the data set into groups, combining the rows into groups by matching the values of the specified columns. The value of the window function is calculated by the rows that fall into the same group as the current row.

cume_dist()

Description

Cumulative allocation.

Usage

cume_dist([ORDER BY column])

Cumulative distribution: the number of rows in the group preceding the current row or with the same value as the current row, divided by the total number of rows in the group.

If the ORDER BY condition is specified, the value is calculated using the specified order.

See example
CREATE TABLE t(gr VARCHAR, num BIGINT);

INSERT INTO t VALUES
    ('A', 1),
    ('A', 2),
    ('A', 2),
    ('A', 3),
    ('B', 4);

SELECT gr, num,
    cume_dist(ORDER BY num) OVER (PARTITION BY gr) AS cume_dist
    FROM t ORDER BY num;
+-------+--------+-----------+
| gr    | num    | cume_dist |
+-------+--------+-----------+
| A     | 1      | 0.25      |
+-------+--------+-----------+
| A     | 2      | 0.75      |
+-------+--------+-----------+
| A     | 2      | 0.75      |
+-------+--------+-----------+
| A     | 3      | 1         |
+-------+--------+-----------+
| B     | 4      | 1         |
+-------+--------+-----------+

dense_rank()

Description

The rank of the current string within the group without skips.

Usage

dense_rank()

Aliases

rank_dense()

Returns the rank of the current string within a group without skips. Essentially this function counts groups of rows with matching values on a given column and assigns them numbers within the group.

See example
CREATE TABLE t(gr VARCHAR, num BIGINT);

INSERT INTO t VALUES
    ('A', 1),
    ('A', 2),
    ('A', 2),
    ('A', 5),
    ('B', 4);

SELECT gr, num,
    dense_rank() OVER (PARTITION BY gr ORDER BY num) AS dense_rank
    FROM t ORDER BY gr, num;
+----+--------+------------+
| gr | num    | dense_rank |
+----+--------+------------+
| A  | 1      | 1          |
+----+--------+------------+
| A  | 2      | 2          |
+----+--------+------------+
| A  | 2      | 2          |
+----+--------+------------+
| A  | 5      | 3          |
+----+--------+------------+
| B  | 4      | 1          |
+----+--------+------------+

first_value()

Description

Returns the value calculated using the specified expression for the first row of the group.

Usage

first_value(expression[ ORDER BY column][ IGNORE NULLS]).

If the ORDER BY condition is specified, the value is calculated using the specified order.

Parameters

  • IGNORE NULLS — the value is calculated excluding rows with NULL value.

See example
CREATE TABLE t(gr VARCHAR, num BIGINT);

INSERT INTO t VALUES
    ('A', 1),
    ('A', 2),
    ('A', 3),
    ('A', NULL),
    ('B', 4);

SELECT gr, num,
    first_value(num) OVER (PARTITION BY gr) AS first_value
    FROM t ORDER BY gr, num;
+----+------+-------------+
| gr | num  | first_value |
+----+------+-------------+
| A  | 1    | 1           |
+----+------+-------------+
| A  | 2    | 1           |
+----+------+-------------+
| A  | 3    | 1           |
+----+------+-------------+
| A  | null | 1           |
+----+------+-------------+
| B  | 4    | 4           |
+----+------+-------------+

lag()

Description

Returns the value calculated for the string shifted by offset rows from the current to the beginning of the group.

Usage

lag(expression[, offset[, default]][ ORDER BY column][ IGNORE NULLS]).

If no such row exists, the value default is returned (it must be of a compatible type).

Both arguments, offset and default, are optional. If they are not specified, the following default values are used:

  • offset: 1

  • default: NULL

If the ORDER BY condition is specified, the value is calculated using the specified order.

Parameters

  • IGNORE NULLS — the value is calculated excluding rows with NULL value.

See examples
CREATE TABLE t(gr VARCHAR, num BIGINT);

INSERT INTO t VALUES
    ('A', 1),
    ('A', 2),
    ('A', 3),
    ('B', 4),
    ('B', 5);

SELECT gr, num,
    lag(num) OVER (PARTITION BY gr) AS lag
    FROM t ORDER BY gr, num;
+----+-----+------+
| gr | num | lag  |
+----+-----+------+
| A  | 1   | null |
+----+-----+------+
| A  | 2   | 1    |
+----+-----+------+
| A  | 3   | 2    |
+----+-----+------+
| B  | 4   | null |
+----+-----+------+
| B  | 5   | 4    |
+----+-----+------+
CREATE TABLE t(gr VARCHAR, num BIGINT);

INSERT INTO t VALUES
    ('A', 1),
    ('A', 2),
    ('A', 3),
    ('B', 4),
    ('B', 5);

SELECT gr, num,
    lag(num, 1, -1) OVER (PARTITION BY gr) AS lag
    FROM t ORDER BY gr, num;
+----+-----+-----+
| gr | num | lag |
+----+-----+-----+
| A  | 1   | -1  |
+----+-----+-----+
| A  | 2   | 1   |
+----+-----+-----+
| A  | 3   | 2   |
+----+-----+-----+
| B  | 4   | -1  |
+----+-----+-----+
| B  | 5   | 4   |
+----+-----+-----+

last_value()

Description

Returns the value calculated by the specified expression for the last row of the group.

Usage

last_value(expression[ ORDER BY column][ IGNORE NULLS])

If the ORDER BY condition is specified, the value is calculated using the specified order.

Parameters

  • IGNORE NULLS — the value is calculated excluding rows with NULL value.

See example
CREATE TABLE t(gr VARCHAR, num BIGINT);

INSERT INTO t VALUES
    ('A', 1),
    ('A', 2),
    ('A', 3),
    ('A', NULL),
    ('B', 4),
    ('B', 5);

SELECT gr, num,
    last_value(num IGNORE NULLS) OVER (PARTITION BY gr) AS last_value
    FROM t ORDER BY gr, num;
+----+------+------------+
| gr | num  | last_value |
+----+------+------------+
| A  | 1    | 3          |
+----+------+------------+
| A  | 2    | 3          |
+----+------+------------+
| A  | 3    | 3          |
+----+------+------------+
| A  | null | 3          |
+----+------+------------+
| B  | 4    | 5          |
+----+------+------------+
| B  | 5    | 5          |
+----+------+------------+

lead()

Description

Returns the value calculated for the row shifted by offset rows from the current row to the end of the group.

Usage

lead(expression[, offset[, default]][ ORDER BY column][ IGNORE NULLS]).

If no such row exists, the value of default is returned (it must be type-compatible).

Both arguments, offset and default, are optional. If they are not specified, the following default values are used:

  • offset: 1

  • default: NULL

If the ORDER BY condition is specified, the value is calculated using the specified order.

Parameters

  • IGNORE NULLS — the value is calculated excluding rows with NULL value.

See examples
CREATE TABLE t(gr VARCHAR, num BIGINT);

INSERT INTO t VALUES
    ('A', 1),
    ('A', 2),
    ('A', 3),
    ('B', 4),
    ('B', 5);

SELECT gr, num,
    lead(num) OVER (PARTITION BY gr) AS lead
    FROM t ORDER BY gr, num;
+----+-----+------+
| gr | num | lead |
+----+-----+------+
| A  | 1   | 2    |
+----+-----+------+
| A  | 2   | 3    |
+----+-----+------+
| A  | 3   | null |
+----+-----+------+
| B  | 4   | 5    |
+----+-----+------+
| B  | 5   | null |
+----+-----+------+
CREATE TABLE t(gr VARCHAR, num BIGINT);

INSERT INTO t VALUES
    ('A', 1),
    ('A', 2),
    ('A', 3),
    ('B', 4),
    ('B', 5);

SELECT gr, num,
    lead(num, 1, -1) OVER (PARTITION BY gr) AS lead
    FROM t ORDER BY gr, num;
+----+-----+------+
| gr | num | lead |
+----+-----+------+
| A  | 1   | 2    |
+----+-----+------+
| A  | 2   | 3    |
+----+-----+------+
| A  | 3   | -1   |
+----+-----+------+
| B  | 4   | 5    |
+----+-----+------+
| B  | 5   | -1   |
+----+-----+------+

nth_value()

Description

Returns the value calculated for the nth row within a group (counting from 1).

Usage

nth_value(expression, n[ ORDER BY column][ IGNORE NULLS]).

If there is no such row, the value NULL is returned.

If the ORDER BY condition is specified, the value is calculated using the specified order.

Parameters

  • IGNORE NULLS — the value is calculated excluding rows with NULL value.

See examples
CREATE TABLE t(gr VARCHAR, num BIGINT);

INSERT INTO t VALUES
    ('A', 1),
    ('A', 2),
    ('A', 3),
    ('B', 4),
    ('B', 5);

SELECT gr, num,
    nth_value(num, 3) OVER (PARTITION BY gr) AS third_value
    FROM t ORDER BY gr, num;
+----+-----+-------------+
| gr | num | third_value |
+----+-----+-------------+
| A  | 1   | 3           |
+----+-----+-------------+
| A  | 2   | 3           |
+----+-----+-------------+
| A  | 3   | 3           |
+----+-----+-------------+
| B  | 4   | null        |
+----+-----+-------------+
| B  | 5   | null        |
+----+-----+-------------+

ntile()

Description

Splits each group into num subgroups of equal (as large as possible) size and returns the subgroup number.

Usage

ntile(num[ ORDER BY ordering])

The partitioning is done so that the sizes of the subgroups are as close as possible.

If the ORDER BY condition is specified, the value is calculated using the specified order.

See example
CREATE TABLE t(gr VARCHAR, num BIGINT);

INSERT INTO t VALUES
    ('A', 1),
    ('A', 2),
    ('A', 3),
    ('B', 4),
    ('B', 5);

SELECT gr, num,
    ntile(2) OVER (PARTITION BY gr) AS group_number
    FROM t ORDER BY gr, num;
+----+-----+--------------+
| gr | num | group_number |
+----+-----+--------------+
| A  | 1   | 1            |
+----+-----+--------------+
| A  | 2   | 1            |
+----+-----+--------------+
| A  | 3   | 2            |
+----+-----+--------------+
| B  | 4   | 1            |
+----+-----+--------------+
| B  | 5   | 2            |
+----+-----+--------------+

percent_rank()

Description

Calculates the relative rank of the current row within a group.

Usage

percent_rank([ORDER BY column])

Returns a value of type DOUBLE from 0 to 1 inclusive. The relative rank is calculated using the formula (rank - 1) / (total number of rows in the group - 1).

If the ORDER BY condition is specified, the value is calculated using the specified order.

See example
CREATE TABLE t(gr VARCHAR, num BIGINT);

INSERT INTO t VALUES
    ('A', 1),
    ('A', 2),
    ('A', 3),
    ('B', 4),
    ('B', 5);

SELECT gr, num,
    percent_rank(ORDER BY num) OVER (PARTITION BY gr) AS percent_rank,
    FROM t ORDER BY gr, num;
+----+-----+--------------+
| gr | num | percent_rank |
+----+-----+--------------+
| A  | 1   | 0            |
+----+-----+--------------+
| A  | 2   | 0.5          |
+----+-----+--------------+
| A  | 3   | 1            |
+----+-----+--------------+
| B  | 4   | 0            |
+----+-----+--------------+
| B  | 5   | 1            |
+----+-----+--------------+

rank()

Description

Returns the rank (with skips) of the current row within a group.

Usage

rank([ORDER BY column])

If the ORDER BY condition is specified, the value is calculated using the specified order.

See example
CREATE TABLE t(gr VARCHAR, num BIGINT);

INSERT INTO t VALUES
    ('A', 1),
    ('A', 2),
    ('A', 2),
    ('A', 3),
    ('B', 4),
    ('B', 5);

SELECT gr, num,
    rank(ORDER BY num) OVER (PARTITION BY gr) AS rank,
    FROM t ORDER BY gr, num;
+----+-----+------+
| gr | num | rank |
+----+-----+------+
| A  | 1   | 1    |
+----+-----+------+
| A  | 2   | 2    |
+----+-----+------+
| A  | 2   | 2    |
+----+-----+------+
| A  | 3   | 4    |
+----+-----+------+
| B  | 4   | 1    |
+----+-----+------+
| B  | 5   | 2    |
+----+-----+------+

row_number()

Description

Returns the number of the current row in its group (counting from 1).

Usage

row_number([ORDER BY column])

If the ORDER BY condition is specified, the value is calculated using the specified order.

See example
CREATE TABLE t(gr VARCHAR, num BIGINT);

INSERT INTO t VALUES
    ('A', 1),
    ('A', 2),
    ('A', 2),
    ('A', 3),
    ('B', 4),
    ('B', 5);

SELECT gr, num,
    row_number() OVER (PARTITION BY gr) AS row_number,
    FROM t ORDER BY gr, num;
+----+-----+------------+
| gr | num | row_number |
+----+-----+------------+
| A  | 1   | 1          |
+----+-----+------------+
| A  | 2   | 2          |
+----+-----+------------+
| A  | 2   | 3          |
+----+-----+------------+
| A  | 3   | 4          |
+----+-----+------------+
| B  | 4   | 1          |
+----+-----+------------+
| B  | 5   | 2          |
+----+-----+------------+