Python module tngri functions

The Python module tngri was created for convenient work with data in cells of type Python. The functions described on this page are available in it.

The tngri module functions are available only in cells of Python type and for their operation it is necessary to import the module: import tngri.

tngri.sql()

Description

Executes the specified query SQL inside a cell of type Python.

Usage

tngri.sql('<SQL_query>')

This function is useful when you need to execute any SQL queries directly inside a cell of type Python, such as inside a loop or other complex constructs, without creating a separate cell of type SQL and using any local variables and functions Python inside the query text SQL.

One usage scenario is described here.

See examples

Example 1

Let’s create a table with a name from the table_name variable and write to it in a loop:

  • index (starting with 1)

  • word from the phrase specified in the test_phrase variable

  • the result of applying the given function length_in_chars to this word

In each iteration of the loop we will output the index value, the added word and the result of the query SQL with the current number of rows in the table being created.

import tngri

def length_in_chars(text):
    if len(text) == 1:
        return '1 character'
    else:
        return f'{len(text)} characters'

table_name = 'my_table'
test_phrase = 'I love Tengri'

tngri.sql(f'CREATE OR REPLACE TABLE {table_name} \
          (index INT, word VARCHAR, length VARCHAR)'
         )

ind = 0
for word in test_phrase.split(' '):
    ind += 1
    tngri.sql(f"INSERT INTO {table_name} VALUES \
              ({ind}, '{word}', '{length_in_chars(word)}')"
             )
    print(f'Added word: "{word}"')
    print(tngri.sql(f'SELECT count(*) FROM {table_name}'))
Step: 1
Added word: "I"
shape: (1, 1)
+----------+
│ column_0 │
│ ---      │
│ i64      │
+----------+
│ 1        │
+----------+
Step: 2
Added word: "love"
shape: (1, 1)
+----------+
│ column_0 │
│ ---      │
│ i64      │
+----------+
│ 2        │
+----------+
Step: 3
Added word: "Tengri"
shape: (1, 1)
+----------+
│ column_0 │
│ ---      │
│ i64      │
+----------+
│ 3        │
+----------+

Now in a cell of type SQL we will display the created table ordered by index:

SELECT * FROM my_table
    ORDER BY index
+-------+--------+--------------+
| index | word   | length       |
+-------+--------+--------------+
| 1     | I      | 1 character  |
+-------+--------+--------------+
| 2     | love   | 4 characters |
+-------+--------+--------------+
| 3     | Tengri | 6 characters |
+-------+--------+--------------+

Example 2

Let’s iteratively load data from .parquet files from S3 storage into a table by file path mask.

In each iteration of the loop we will output the result of the query with the number of rows in the created table.

import tngri

for i in range(1,4):
    file_name = f"s3://prostore/Stage/<lake_path>/{i}.parquet"
    tngri.sql(f"INSERT INTO raw_dyntest SELECT * FROM read_parquet('{file_name}')")
    print(tngri.sql("SELECT count(*) FROM raw_dyntest"))
shape: (1, 1)
+----------+
│ column_0 │
│ ---      │
│ i64      │
+----------+
│ 10000000 │
+----------+
shape: (1, 1)
+----------+
│ column_0 │
│ ---      │
│ i64      │
+----------+
│ 20000000 │
+----------+
shape: (1, 1)
+----------+
│ column_0 │
│ ---      │
│ i64      │
+----------+
│ 30000000 │
+----------+

tngri.upload_df()

Description

Uploads data from DataFrame to Tengri.

Usage

tngri.upload_df(DataFrame)

Uploads data from the specified DataFrame to Tengri (in the S3) storage.
Returns a string containing the name of the .parquet file to which the data was uploaded.

Loading data into Tengri using Python is described in more detail at here.

See examples
  • Let’s create a DataFrame, and load it into Tengri:

    import tngri
    import pandas
    
    my_df = pandas.DataFrame(range(100))
    
    tngri.upload_df(my_df)
    UploadedFile(s3_path='s3://prostore/Stage/ijwsajclddxw.parquet', _client=None)
  • Let’s create a DataFrame, load it into Tengri and write the file name .parquet to the file_name variable:

    import tngri
    import pandas
    
    my_df = pandas.DataFrame(range(100))
    file_name = tngri.upload_df(my_df)
    
    print(my_df)
    print(file_name)
    0
    0    0
    1    1
    2    2
    3    3
    4    4
    ..  ..
    95  95
    96  96
    97  97
    98  98
    99  99
    
    [100 rows x 1 columns]
    tdiiuetablyx.parquet

tngri.upload_file()

Description

Uploads data from a file to Tengri.

Usage

tngri.upload_file(FilePath[, FileName])

  • FilePath — path to the file to upload to Tengri

  • FileName — the name for the file in the Tengri

Loads the data from the file at the specified path into Tengri (in the S3) repository.
Returns a string containing the path and filename of the file within the S3 storage to which the data was loaded.

One of the usage scenarios is described at here.

See examples
  • Let’s load data into Tengri from the .json file available at URL:

    import tngri
    import urllib.request
    
    urllib.request.urlretrieve(
        'https://tngri.postgrespro.ru/documentation/ru/stable/_attachments/tengri_data_types.json',
        'my_file.json'
        )
    
    tngri.upload_file('my_file.json')
    UploadedFile(s3_path='s3://prostore/Stage/pxfihzbonctd.json', _client=None)

    Let’s output the first 5 rows of the table by reading it from the loaded file:

    SELECT * FROM read_json("pxfihzbonctd.json")
        LIMIT 5
    +----------+-----------+----------+---------------------------+
    | name     | type      | category | description               |
    +----------+-----------+----------+---------------------------+
    | BIGINT   | data type | numeric  | Целые числа.              |
    +----------+-----------+----------+---------------------------+
    | BIGINT[] | data type | array    | Массивы целых чисел.      |
    +----------+-----------+----------+---------------------------+
    | BLOB     | data type | blob     | Двоичные объекты.         |
    +----------+-----------+----------+---------------------------+
    | BOOL     | data type | boolean  | Булевы значения.          |
    +----------+-----------+----------+---------------------------+
    | BOOL[]   | data type | array    | Массивы булевых значений. |
    +----------+-----------+----------+---------------------------+
  • Let’s load data into Tengri from the .json file available at URL, and save the name of the loaded file to a variable:

    import tngri
    import urllib.request
    
    urllib.request.urlretrieve(
        'https://tngri.postgrespro.ru/documentation/ru/stable/_attachments/tengri_data_types.json',
        'my_file.json'
        )
    
    file_name = tngri.upload_file('my_file.json')
    print(file_name)
    pxfihzbonctd.json

    Let’s output the first 5 rows of the table by reading it from the loaded file:

    SELECT * FROM read_json("pxfihzbonctd.json")
        LIMIT 5
    +----------+-----------+----------+---------------------------+
    | name     | type      | category | description               |
    +----------+-----------+----------+---------------------------+
    | BIGINT   | data type | numeric  | Целые числа.              |
    +----------+-----------+----------+---------------------------+
    | BIGINT[] | data type | array    | Массивы целых чисел.      |
    +----------+-----------+----------+---------------------------+
    | BLOB     | data type | blob     | Двоичные объекты.         |
    +----------+-----------+----------+---------------------------+
    | BOOL     | data type | boolean  | Булевы значения.          |
    +----------+-----------+----------+---------------------------+
    | BOOL[]   | data type | array    | Массивы булевых значений. |
    +----------+-----------+----------+---------------------------+

tngri.upload_s3()

Description

Uploads a file from the specified bucket S3 to Tengri.

Usage

tngri.upload_s3(
    object = 's3://<file_path>.parquet',
    access_key = '<access_key>',
    secret_key = '<secret_key>',
    [filename = '<new_name>.parquet']
    )

  • object — path to file in S3 storage .

  • access_key — access key for S3.

  • secret_key — secret key for S3.

The file extension can be anything. It will remain the same as it was in the initial file. If necessary, the path and name of the uploaded file can be specified via the optional filename parameter.

An example of use is described at here.