Python module tngri
functions
The Python module tngri
was created for convenient work with data in cells of type Python. The functions described on this page are available in it.
The |
tngri.sql()
Description |
Executes the specified query SQL inside a cell of type Python. |
Usage |
|
This function is useful when you need to execute any SQL queries directly inside a cell of type Python, such as inside a loop or other complex constructs, without creating a separate cell of type SQL and using any local variables and functions Python inside the query text SQL.
One usage scenario is described here.
See examples
Example 1
Let’s create a table with a name from the table_name
variable and write to it in a loop:
-
index (starting with
1
) -
word from the phrase specified in the
test_phrase
variable -
the result of applying the given function
length_in_chars
to this word
In each iteration of the loop we will output the index value, the added word and the result of the query SQL with the current number of rows in the table being created.
import tngri
def length_in_chars(text):
if len(text) == 1:
return '1 character'
else:
return f'{len(text)} characters'
table_name = 'my_table'
test_phrase = 'I love Tengri'
tngri.sql(f'CREATE OR REPLACE TABLE {table_name} \
(index INT, word VARCHAR, length VARCHAR)'
)
ind = 0
for word in test_phrase.split(' '):
ind += 1
tngri.sql(f"INSERT INTO {table_name} VALUES \
({ind}, '{word}', '{length_in_chars(word)}')"
)
print(f'Added word: "{word}"')
print(tngri.sql(f'SELECT count(*) FROM {table_name}'))
Step: 1
Added word: "I"
shape: (1, 1)
+----------+
│ column_0 │
│ --- │
│ i64 │
+----------+
│ 1 │
+----------+
Step: 2
Added word: "love"
shape: (1, 1)
+----------+
│ column_0 │
│ --- │
│ i64 │
+----------+
│ 2 │
+----------+
Step: 3
Added word: "Tengri"
shape: (1, 1)
+----------+
│ column_0 │
│ --- │
│ i64 │
+----------+
│ 3 │
+----------+
Now in a cell of type SQL we will display the created table ordered by index:
SELECT * FROM my_table
ORDER BY index
+-------+--------+--------------+
| index | word | length |
+-------+--------+--------------+
| 1 | I | 1 character |
+-------+--------+--------------+
| 2 | love | 4 characters |
+-------+--------+--------------+
| 3 | Tengri | 6 characters |
+-------+--------+--------------+
Example 2
Let’s iteratively load data from .parquet
files from S3 storage into a table by file path mask.
In each iteration of the loop we will output the result of the query with the number of rows in the created table.
import tngri
for i in range(1,4):
file_name = f"s3://prostore/Stage/<lake_path>/{i}.parquet"
tngri.sql(f"INSERT INTO raw_dyntest SELECT * FROM read_parquet('{file_name}')")
print(tngri.sql("SELECT count(*) FROM raw_dyntest"))
shape: (1, 1)
+----------+
│ column_0 │
│ --- │
│ i64 │
+----------+
│ 10000000 │
+----------+
shape: (1, 1)
+----------+
│ column_0 │
│ --- │
│ i64 │
+----------+
│ 20000000 │
+----------+
shape: (1, 1)
+----------+
│ column_0 │
│ --- │
│ i64 │
+----------+
│ 30000000 │
+----------+
tngri.upload_df()
Description |
Uploads data from DataFrame to Tengri. |
Usage |
|
Uploads data from the specified DataFrame to Tengri (in the S3) storage.
Returns a string containing the name of the .parquet
file to which the data was uploaded.
Loading data into Tengri using Python is described in more detail at here.
See examples
-
Let’s create a DataFrame, and load it into Tengri:
import tngri import pandas my_df = pandas.DataFrame(range(100)) tngri.upload_df(my_df)
UploadedFile(s3_path='s3://prostore/Stage/ijwsajclddxw.parquet', _client=None)
-
Let’s create a DataFrame, load it into Tengri and write the file name
.parquet
to thefile_name
variable:import tngri import pandas my_df = pandas.DataFrame(range(100)) file_name = tngri.upload_df(my_df) print(my_df) print(file_name)
0 0 0 1 1 2 2 3 3 4 4 .. .. 95 95 96 96 97 97 98 98 99 99 [100 rows x 1 columns] tdiiuetablyx.parquet
tngri.upload_file()
Description |
Uploads data from a file to Tengri. |
Usage |
|
-
FilePath
— path to the file to upload to Tengri -
FileName
— the name for the file in the Tengri
Loads the data from the file at the specified path into Tengri (in the S3) repository.
Returns a string containing the path and filename of the file within the S3 storage to which the data was loaded.
One of the usage scenarios is described at here.
See examples
-
Let’s load data into Tengri from the
.json
file available at URL:import tngri import urllib.request urllib.request.urlretrieve( 'https://tngri.postgrespro.ru/documentation/ru/stable/_attachments/tengri_data_types.json', 'my_file.json' ) tngri.upload_file('my_file.json')
UploadedFile(s3_path='s3://prostore/Stage/pxfihzbonctd.json', _client=None)
Let’s output the first 5 rows of the table by reading it from the loaded file:
SELECT * FROM read_json("pxfihzbonctd.json") LIMIT 5
+----------+-----------+----------+---------------------------+ | name | type | category | description | +----------+-----------+----------+---------------------------+ | BIGINT | data type | numeric | Целые числа. | +----------+-----------+----------+---------------------------+ | BIGINT[] | data type | array | Массивы целых чисел. | +----------+-----------+----------+---------------------------+ | BLOB | data type | blob | Двоичные объекты. | +----------+-----------+----------+---------------------------+ | BOOL | data type | boolean | Булевы значения. | +----------+-----------+----------+---------------------------+ | BOOL[] | data type | array | Массивы булевых значений. | +----------+-----------+----------+---------------------------+
-
Let’s load data into Tengri from the
.json
file available at URL, and save the name of the loaded file to a variable:import tngri import urllib.request urllib.request.urlretrieve( 'https://tngri.postgrespro.ru/documentation/ru/stable/_attachments/tengri_data_types.json', 'my_file.json' ) file_name = tngri.upload_file('my_file.json') print(file_name)
pxfihzbonctd.json
Let’s output the first 5 rows of the table by reading it from the loaded file:
SELECT * FROM read_json("pxfihzbonctd.json") LIMIT 5
+----------+-----------+----------+---------------------------+ | name | type | category | description | +----------+-----------+----------+---------------------------+ | BIGINT | data type | numeric | Целые числа. | +----------+-----------+----------+---------------------------+ | BIGINT[] | data type | array | Массивы целых чисел. | +----------+-----------+----------+---------------------------+ | BLOB | data type | blob | Двоичные объекты. | +----------+-----------+----------+---------------------------+ | BOOL | data type | boolean | Булевы значения. | +----------+-----------+----------+---------------------------+ | BOOL[] | data type | array | Массивы булевых значений. | +----------+-----------+----------+---------------------------+
tngri.upload_s3()
Description |
Uploads a file from the specified bucket S3 to Tengri. |
Usage |
|
-
object
— path to file in S3 storage . -
access_key
— access key for S3. -
secret_key
— secret key for S3.
The file extension can be anything. It will remain the same as it was in the initial file.
If necessary, the path and name of the uploaded file can be specified via the optional filename
parameter.
An example of use is described at here.