Functions for regular expressions

regexp_extract()
regexp_extract_all()
regexp_full_match()
regexp_matches()
regexp_replace()
regexp_split_to_array()
regexp_split_to_table()
Operator ~

Functions for working with text strings using regular expressions. Regular expressions use the syntax RE2.

`regexp_extract()`

Description	Extracts a substring from a string using the given regular expression.
Usage	`regexp_extract(string, regexp[, group_number, options] \| [, name_list, options])`

If the string string has a substring overlaid with the regular expression pattern regexp, returns that substring. If there are several substrings covered by the pattern, it returns the first one. If no such substring is found, returns an empty string.

You can specify the number of the returned group and parameters or a list of group names and parameters (see examples).

Regular expression options

'c' — Case-sensitive matching
'i' — Case-insensitive matching
'l' — Match literals instead of regular expression tokens

See examples

SELECT
    regexp_extract('Tengri', '..')                              AS result_1,
    regexp_extract('Tengri', 'n.*')                             AS result_2,
    regexp_extract('Tengri', '.*')                              AS result_3,
    regexp_extract('Tengri', '^.{5}')                           AS result_4,
    regexp_extract('Tengri', '[TNGRi]$')                        AS result_5,
    regexp_extract('Tengri, TNGRi', 't.', 0, 'i')               AS result_6,
    regexp_extract('Tengri; TNGRi;', '(.*); (.*);', ['1', '2']) AS result_7,
    regexp_extract('Tengri', '.{7}')                            AS result_empty;

+----------+----------+----------+----------+----------+----------+-----------------------------------------+--------------+
| result_1 | result_2 | result_3 | result_4 | result_5 | result_6 | result_7                                | result_empty |
+----------+----------+----------+----------+----------+----------+-----------------------------------------+--------------+
| Te       | ngri     | Tengri   | Tengr    | i        | Te       | "{"1": "Tengri", "2": "TNGRi"}" |              |
+----------+----------+----------+----------+----------+----------+-----------------------------------------+--------------+

`regexp_extract_all()`

Description	Extracts all non-overlapping substrings from a string using the given regular expression. Returns an array of substrings.
Usage	`regexp_extract_all(string, regexp[, <num>, options])`

If no substrings in string covered by the regexp pattern are found, returns an empty list.

Parameters

<num> — the number of the group within the pattern to return (for each substring). By default (if no parameter is given), the entire substring is returned. Group numbering starts with 1. Groups in the template are highlighted with parentheses.
More details about the parameter on examples
Let’s extract from the text all combinations of letters with a dot after them:
SELECT regexp_extract_all('My name is Tengri. My nickname is TNGRi.', '(\w+)(\.)') AS result;
+------------------+ | result | +------------------+ | {Tengri.,TNGRi.} | +------------------+
Now let’s extract from the same text using the same regular expression the combinations of letters before the dot, but without the dot itself. For this purpose, let’s set the group number 1:
SELECT regexp_extract_all('My name is Tengri. My nickname is TNGRi.', '(\w+)(\.)', 1) AS result;
+----------------+ | result | +----------------+ | {Tengri,TNGRi} | +----------------+

Regular expression options

'c' — Case-sensitive matching
'i' — Case-insensitive matching
'l' — Match literals instead of regular expression tokens

See more examples

SELECT
    regexp_extract_all('My name is Tengri. My nickname is TNGRi.', '\w+')
AS words;

+------------------------------------------+
|                   words                  |
+------------------------------------------+
| {My,name,is,Tengri,My,nickname,is,TNGRi} |
+------------------------------------------+

We use a combination of the regexp_extract_all and unnest functions to extract data from structured text:

CREATE TABLE text_table(text_data VARCHAR);

INSERT INTO text_table VALUES
('Name: "Tengri", Nickname:"TNGRi"'),
('Country: "Russia", Capital:"Moscow"');

SELECT
    unnest(
        regexp_extract_all(text_data, '(\w+):\s*"(.*?)"', 1)
          ) AS key,
    unnest(
        regexp_extract_all(text_data, '(\w+):\s*"(.*?)"', 2)
          ) AS value
FROM text_table;

+----------+--------+
|    key   | value  |
+----------+--------+
| Name     | Tengri |
+----------+--------+
| Nickname | TNGRi  |
+----------+--------+
| Country  | Russia |
+----------+--------+
| Capital  | Moscow |
+----------+--------+

`regexp_full_match()`

Description	Checks whether a regular expression overlaps a string completely.
Usage	`regexp_full_match(string, regexp[, options])`

If the regular expression pattern regexp completely overlaps the string string, returns true, otherwise — false.

Regular expression options

'c' — Case-sensitive matching
'i' — Case-insensitive matching
'l' — Match literals instead of regular expression tokens

See examples

SELECT
    regexp_full_match('Tengri', 'gri$')     AS result_1, -- false expected
    regexp_full_match('Tengri', '.')        AS result_2, -- false expected
    regexp_full_match('Tengri', '.*')       AS result_3, -- true expected
    regexp_full_match('Tengri', '^\w+gri$') AS result_4, -- true expected
    regexp_full_match('Tengri', 't.*', 'i') AS result_5; -- true expected

+----------+----------+----------+----------+----------+
| result_1 | result_2 | result_3 | result_4 | result_5 |
+----------+----------+----------+----------+----------+
| false    | false    | true     | true     | true     |
+----------+----------+----------+----------+----------+

`regexp_matches()`

Description	Checks if a regular expression is contained within a string.
Usage	`regexp_matches(string, regexp[, options])`

If at least one substring covered by the regexp pattern is found inside the string string, returns true, otherwise — false.

Regular expression options

'c' — Case-sensitive matching
'i' — Case-insensitive matching
'l' — Match literals instead of regular expression tokens

See examples

SELECT
SELECT
    regexp_matches('Tengri', '.+T')     AS result_1, -- false expected
    regexp_matches('Tengri', '.*T')     AS result_2, -- true expected
    regexp_matches('Tengri', 'T')       AS result_3, -- true expected
    regexp_matches('Tengri', 't', 'i')  AS result_4, -- true expected
    regexp_matches('T*ngri', 'T*', 'l') AS result_5; -- true expected

+----------+----------+----------+----------+----------+
| result_1 | result_2 | result_3 | result_4 | result_5 |
+----------+----------+----------+----------+----------+
| false    | true     | true     | true     | true     |
+----------+----------+----------+----------+----------+

`regexp_replace()`

Description	Replaces a substring covered by a regular expression with the specified string.
Usage	`regexp_replace(string, regexp, target[, options])`

If a substring covered by the regexp pattern is found inside the string string, it is replaced by the string target. If several such substrings are found, only the first one is replaced. If no substring is found, the original string string is returned.

Regular expression options

'c' — Case-sensitive matching
'i' — Case-insensitive matching
'l' — Match literals instead of regular expression tokens

See examples

SELECT
    regexp_replace('Tengri', '.', 't')              AS result_1,
    regexp_replace('Tengri', '.*', 't')             AS result_2,
    regexp_replace('Tengri', 'e.*r', 'NGR')         AS result_3,
    regexp_replace('Tengri', 'a', 't')              AS result_4,
    regexp_replace('Tengri', 'tengr', 'TNGR', 'i')  AS result_5;

+----------+----------+----------+----------+----------+
| result_1 | result_2 | result_3 | result_4 | result_5 |
+----------+----------+----------+----------+----------+
| tengri   | t        | TNGRi    | Tengri   | TNGRi    |
+----------+----------+----------+----------+----------+

`regexp_split_to_array()`

Description	Splits a string into parts, separated by a regular expression, and returns the parts as an array.
Usage	`regexp_split_to_array(string, regexp[, options])`
Aliases	`string_split_regex()`

If substrings overlapped by the regexp pattern are found in the string string, then the parts of the original string not overlapped by the pattern are returned as an array. If the found substrings are at the beginning or at the end of the source string, the resulting array will contain empty strings for the beginning and the end of the string. If no substrings are found, an array of one source string will be returned.

Regular expression options

'c' — Case-sensitive matching
'i' — Case-insensitive matching
'l' — Match literals instead of regular expression tokens

See examples

SELECT
    string_split_regex('My name is Tengri. My nickname is TNGRi.', '\.\s')
AS sentences,
    string_split_regex('My name is Tengri. My nickname is TNGRi.', '[\.\s]+')
AS words;

+-------------------------------------------+-------------------------------------------+
|                 sentences                 |                   words                   |
+-------------------------------------------+-------------------------------------------+
| {My name is Tengri,My nickname is TNGRi.} | {My,name,is,Tengri,My,nickname,is,TNGRi,} |
+-------------------------------------------+-------------------------------------------+

`regexp_split_to_table()`

Description	Splits a string into parts separated by a regular expression, and returns the parts as strings.
Usage	`regexp_split_to_table(string, regexp[, options])`

If substrings overlapped by the regexp pattern are found in the string string, then the parts of the original string not overlapped by the pattern are returned as a column, with each part written to a different cell. If the found substrings are at the beginning or end of the source string, the result will include empty strings for the beginning and end of the string. If no substrings are found, a column with one cell in which the original string is written will be returned.

Regular expression options

'c' — Case-sensitive matching
'i' — Case-insensitive matching
'l' — Match literals instead of regular expression tokens

See examples

SELECT
    regexp_split_to_table('My name is Tengri. My nickname is TNGRi.', '\.\s')
AS sentences,
    regexp_split_to_table('My name is Tengri. My nickname is TNGRi.', '[\.\s]+')
AS words;

+-----------------------+----------+
|       sentences       |   words  |
+-----------------------+----------+
| My name is Tengri     | My       |
+-----------------------+----------+
| My nickname is TNGRi. | name     |
+-----------------------+----------+
| null                  | is       |
+-----------------------+----------+
| null                  | Tengri   |
+-----------------------+----------+
| null                  | My       |
+-----------------------+----------+
| null                  | nickname |
+-----------------------+----------+
| null                  | is       |
+-----------------------+----------+
| null                  | TNGRi    |
+-----------------------+----------+
| null                  |          |
+-----------------------+----------+

Operator `~`

Description	Checks if the regular expression covers the string completely.
Usage	`string ~ regexp`.

Can be used with the negation operator !: string !~ regexp.

Fully synonymous with the SIMILAR TO operator.

See examples

SELECT
    'Tengri' !~ 'TNGRi' AS result_1,
    'Tengri' ~ 'T.*'    AS result_2,
    'Tengri' ~ 'T.+i'   AS result_3,
    'TNGRi'  ~ 'T.+i'   AS result_4,
    'T.+i'   ~ 'T.+i'   AS result_5;

+----------+----------+----------+----------+----------+
| result_1 | result_2 | result_3 | result_4 | result_5 |
+----------+----------+----------+----------+----------+
| true     | true     | true     | true     | true     |
+----------+----------+----------+----------+----------+

Functions for regular expressions

regexp_extract()

regexp_extract_all()

Parameters

regexp_full_match()

regexp_matches()

regexp_replace()

regexp_split_to_array()

regexp_split_to_table()

Operator ~

`regexp_extract()`

`regexp_extract_all()`

`regexp_full_match()`

`regexp_matches()`

`regexp_replace()`

`regexp_split_to_array()`

`regexp_split_to_table()`

Operator `~`