Functions for regular expressions
Functions for working with text strings using regular expressions. Regular expressions use the syntax RE2.
regexp_extract()
Description |
Extracts a substring from a string using the given regular expression. |
Usage |
|
If the string string
contains a substring overlaid by the regular expression pattern regexp
, returns that substring. If there are several substrings covered by the pattern, it returns the first one. If no such substring is found, returns an empty string.
See examples
SELECT
regexp_extract('Tengri', '..') AS result_1,
regexp_extract('Tengri', 'n.*') AS result_2,
regexp_extract('Tengri', '.*') AS result_3,
regexp_extract('Tengri', '^.{5}') AS result_4,
regexp_extract('Tengri', '[TNGRi]$') AS result_5,
regexp_extract('Tengri', '.{7}') AS result_empty;
+----------+----------+----------+----------+----------+--------------+
| result_1 | result_2 | result_3 | result_4 | result_5 | result_empty |
+----------+----------+----------+----------+----------+--------------+
| Te | ngri | Tengri | Tengr | i | |
+----------+----------+----------+----------+----------+--------------+
regexp_extract_all()
Description |
Extracts all non-overlapping substrings from a string using the given regular expression. Returns an array of substrings. |
Usage |
|
If no substrings in string
covered by the regexp
pattern are found, returns an empty list.
Parameters
-
<num>
— the number of the group within the pattern to return (for each substring). By default (if no parameter is given), the entire substring is returned. Group numbering starts with1
. Groups in the template are highlighted with parentheses.More details about the parameter on examples
Let’s extract from the text all combinations of letters with a dot after them:
SELECT regexp_extract_all('My name is Tengri. My nickname is TNGRi.', '(\w+)(\.)') AS result;
+------------------+ | result | +------------------+ | {Tengri.,TNGRi.} | +------------------+
Now let’s extract from the same text using the same regular expression the combinations of letters before the dot, but without the dot itself. For this purpose, let’s set the group number
1
:SELECT regexp_extract_all('My name is Tengri. My nickname is TNGRi.', '(\w+)(\.)', 1) AS result;
+----------------+ | result | +----------------+ | {Tengri,TNGRi} | +----------------+
See more examples
SELECT
regexp_extract_all('My name is Tengri. My nickname is TNGRi.', '\w+')
AS words;
+------------------------------------------+
| words |
+------------------------------------------+
| {My,name,is,Tengri,My,nickname,is,TNGRi} |
+------------------------------------------+
We use a combination of the regexp_extract_all
and unnest
functions to extract data from structured text:
CREATE TABLE text_table(text_data VARCHAR);
INSERT INTO text_table VALUES
('Name: "Tengri", Nickname:"TNGRi"'),
('Country: "Russia", Capital:"Moscow"');
SELECT
unnest(
regexp_extract_all(text_data, '(\w+):\s*"(.*?)"', 1)
) AS key,
unnest(
regexp_extract_all(text_data, '(\w+):\s*"(.*?)"', 2)
) AS value
FROM text_table;
+----------+--------+
| key | value |
+----------+--------+
| Name | Tengri |
+----------+--------+
| Nickname | TNGRi |
+----------+--------+
| Country | Russia |
+----------+--------+
| Capital | Moscow |
+----------+--------+
regexp_full_match()
Description |
Checks whether a regular expression overlaps a string completely. |
Usage |
|
If the regular expression pattern regexp
completely overlaps the string string
, returns true
, otherwise — false
.
See examples
SELECT
regexp_full_match('Tengri', 'gri$') AS result_1, -- false expected
regexp_full_match('Tengri', '.') AS result_2, -- false expected
regexp_full_match('Tengri', '.*') AS result_3, -- true expected
regexp_full_match('Tengri', '^\w+gri$') AS result_4; -- true expected
+----------+----------+----------+----------+
| result_1 | result_2 | result_3 | result_4 |
+----------+----------+----------+----------+
| false | false | true | true |
+----------+----------+----------+----------+
regexp_matches()
Description |
Checks if a regular expression is contained within a string. |
Usage |
|
If at least one substring covered by the regexp
pattern is found inside the string string
, returns true
, otherwise — false
.
See examples
SELECT
regexp_matches('Tengri', '.+T') AS result_1, -- false expected
regexp_matches('Tengri', '.*T') AS result_2; -- true expected
+----------+----------+
| result_1 | result_2 |
+----------+----------+
| false | true |
+----------+----------+
regexp_replace()
Description |
Replaces a substring covered by a regular expression with the specified string. |
Usage |
|
If a substring covered by the regexp
pattern is found inside the string string
, it is replaced by the string target
. If several such substrings are found, only the first one is replaced. If no substring is found, the original string string
is returned.
See examples
SELECT
regexp_replace('Tengri', '.', 't') AS result_1,
regexp_replace('Tengri', '.*', 't') AS result_2,
regexp_replace('Tengri', 'e.*r', 'NGR') AS result_3,
regexp_replace('Tengri', 'a', 't') AS result_4;
+----------+----------+----------+----------+
| result_1 | result_2 | result_3 | result_4 |
+----------+----------+----------+----------+
| tengri | t | TNGRi | Tengri |
+----------+----------+----------+----------+
regexp_split_to_array()
Description |
Splits a string into parts, separated by a regular expression, and returns the parts as an array. |
Usage |
|
Aliases |
|
If substrings overlapped by the regexp
pattern are found in the string string
, then the parts of the original string not overlapped by the pattern are returned as an array. If the found substrings are at the beginning or at the end of the source string, the resulting array will contain empty strings for the beginning and the end of the string. If no substrings are found, an array of one source string will be returned.
See examples
SELECT
string_split_regex('My name is Tengri. My nickname is TNGRi.', '\.\s')
AS sentences,
string_split_regex('My name is Tengri. My nickname is TNGRi.', '[\.\s]+')
AS words;
+-------------------------------------------+-------------------------------------------+
| sentences | words |
+-------------------------------------------+-------------------------------------------+
| {My name is Tengri,My nickname is TNGRi.} | {My,name,is,Tengri,My,nickname,is,TNGRi,} |
+-------------------------------------------+-------------------------------------------+
regexp_split_to_table()
Description |
Splits a string into parts separated by a regular expression, and returns the parts as strings. |
Usage |
|
If substrings overlapped by the regexp
pattern are found in the string string
, then the parts of the original string not overlapped by the pattern are returned as a column, with each part written to a different cell. If the found substrings are at the beginning or end of the source string, the result will include empty strings for the beginning and end of the string. If no substrings are found, a column with one cell in which the original string is written will be returned.
See examples
SELECT
regexp_split_to_table('My name is Tengri. My nickname is TNGRi.', '\.\s')
AS sentences,
regexp_split_to_table('My name is Tengri. My nickname is TNGRi.', '[\.\s]+')
AS words;
+-----------------------+----------+
| sentences | words |
+-----------------------+----------+
| My name is Tengri | My |
+-----------------------+----------+
| My nickname is TNGRi. | name |
+-----------------------+----------+
| null | is |
+-----------------------+----------+
| null | Tengri |
+-----------------------+----------+
| null | My |
+-----------------------+----------+
| null | nickname |
+-----------------------+----------+
| null | is |
+-----------------------+----------+
| null | TNGRi |
+-----------------------+----------+
| null | |
+-----------------------+----------+
Operator ~
Description |
Checks if the regular expression covers the string completely. |
Usage |
|
Can be used with the negation operator !
: string !~ regexp
.
Fully synonymous with the SIMILAR TO
operator.
See examples
SELECT
'Tengri' !~ 'TNGRi' AS result_1,
'Tengri' ~ 'T.*' AS result_2,
'Tengri' ~ 'T.+i' AS result_3,
'TNGRi' ~ 'T.+i' AS result_4,
'T.+i' ~ 'T.+i' AS result_5;
+----------+----------+----------+----------+----------+
| result_1 | result_2 | result_3 | result_4 | result_5 |
+----------+----------+----------+----------+----------+
| true | true | true | true | true |
+----------+----------+----------+----------+----------+