Data loading

You can upload data to Tengri in the following ways:

  • Using the upload wizard (the Upload file button).

  • Via the Python tngri module.

Uploading data from a local file using the upload wizard

Supported file extensions for uploading:

  • .csv

  • .json

  • .parquet

To load data from a file:

  1. Click Upload file in the Tengri interface.

  2. Select the file or drag it into the upload area that opens.

  3. In the Parse file window that opens, check that the data was parsed correctly.
    If necessary, adjust the recognition settings, then press Next.

  4. A box then appears with the code that reads the file, ready to be inserted into the notebook.
    For the file file_name.csv and the user user_name, it looks like this:

    select *
    from read_csv(
        "user_name/<id>_file_name.csv"
    )

    Press Add cell to add this cell to your notebook.

Once the cell is added to the notebook, the data from the file will be available for work.

Loading data from a file via the Python tngri module

To load data from non-local files, you can use the Python tngri module.

Loading data from a file via URL

Sample Python code that loads data from the iris.csv file at the specified URL into Tengri:

import polars  # (1)
import tngri

df = polars.read_csv(
    "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv"  # (2)
)

tngri.upload_df(df)  # (3)

1 Import the required Python modules.
2 Read the file at the specified URL into the df variable using the polars module.
3 Upload the data from df to Tengri using the tngri.upload_df function.

In addition to .csv files, this method works for other formats such as .json and .xlsx (see the Polars documentation on reading data for details).

After that, a message like this will appear in the output cell:

UploadedFile(s3_path='s3://<path>/<file_id>.parquet')

The data from the file is now available. To query it, call the read_parquet function with the id of the .parquet file reported in the previous step:

SELECT * FROM read_parquet('<file_id>.parquet');

A file loaded this way always has the .parquet extension, regardless of the extension of the original file.
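
The step from the s3_path shown in the output cell to a read_parquet query can be sketched in Python as follows. This is a minimal illustration and not part of the tngri module: the helper names parquet_name_from_s3_path and read_parquet_query and the sample path are hypothetical, and the sketch assumes that the file id is the last component of the path.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def parquet_name_from_s3_path(s3_path: str) -> str:
    """Return the last path component ('<file_id>.parquet') of an s3:// URL."""
    parsed = urlparse(s3_path)
    if parsed.scheme != "s3":
        raise ValueError(f"not an s3:// path: {s3_path!r}")
    name = PurePosixPath(parsed.path).name
    if not name.endswith(".parquet"):
        raise ValueError(f"no .parquet file name in: {s3_path!r}")
    return name


def read_parquet_query(s3_path: str) -> str:
    """Build the SELECT statement that reads the uploaded file."""
    return f"SELECT * FROM read_parquet('{parquet_name_from_s3_path(s3_path)}');"


# Hypothetical value copied from the output cell; the bucket and id are placeholders.
print(read_parquet_query("s3://some-bucket/some/prefix/0123abcd.parquet"))
```

The helper takes the path as a plain string copied from the output cell, so it makes no assumption about the attributes of the object that tngri.upload_df returns.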

Loading data from a file saved in S3

Example Python code that loads data from the my_file.parquet file in your S3 bucket into Tengri:

import tngri  # (1)

tngri.upload_s3(
    object="s3://my_folders/my_file.parquet",  # (2)
    access_key="***",
    secret_key="***"
)

1 Import the Python tngri module.
2 Set the parameters of your S3 bucket (file path and access keys).

The function tngri.upload_s3 uploads the file from your S3 bucket to Tengri.

The data from the file will then be available for work.

The uploaded file keeps the extension of the original file, whatever it is. Files with different extensions are read with different functions.

To work with the data from the file in this example, use the read_parquet function:

SELECT * FROM read_parquet('my_file.parquet');

If necessary, you can specify a path and name for the uploaded file inside Tengri via the optional filename parameter of the tngri.upload_s3 function:
filename="new_path/new_name.parquet".
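
As a sketch of how the optional filename parameter fits together with the other arguments, the snippet below builds the keyword arguments for tngri.upload_s3 and takes the keys from environment variables instead of writing them into the notebook. The helper s3_upload_kwargs and the variable names S3_ACCESS_KEY and S3_SECRET_KEY are assumptions made for illustration; only the object, access_key, secret_key and filename parameters come from the text above.

```python
import os


def s3_upload_kwargs(object_url, filename=None, env=None):
    """Build keyword arguments for tngri.upload_s3 without hard-coding the keys."""
    env = os.environ if env is None else env
    kwargs = {
        "object": object_url,
        "access_key": env["S3_ACCESS_KEY"],  # hypothetical variable name
        "secret_key": env["S3_SECRET_KEY"],  # hypothetical variable name
    }
    # filename is optional, so it is passed only when a value was given.
    if filename is not None:
        kwargs["filename"] = filename
    return kwargs


# Demonstration with a stand-in environment instead of real credentials.
demo_env = {"S3_ACCESS_KEY": "***", "S3_SECRET_KEY": "***"}
kwargs = s3_upload_kwargs(
    "s3://my_folders/my_file.parquet",
    filename="new_path/new_name.parquet",
    env=demo_env,
)
print(sorted(kwargs))

# In a Tengri notebook the call would then be:
#   import tngri
#   tngri.upload_s3(**kwargs)
```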

Working with data from loaded files

  • Check that the loaded .csv file is available:

    SELECT * FROM read_csv('customer_country.csv');

  • Create a table with data from the loaded .csv file:

    CREATE OR REPLACE TABLE customer_country AS
        SELECT * FROM read_csv('customer_country.csv');

  • Load data from the .csv file into an existing table:

    INSERT INTO customer_country
        SELECT * FROM read_csv('customer_country.csv');

Working with data from loaded files of different extensions

To work with loaded files of other extensions, you need to use the corresponding functions.

  • Create a table with data from the loaded .parquet file:

    CREATE OR REPLACE TABLE customer_country AS
        SELECT * FROM read_parquet('customer_country.parquet');

  • Create a table with data from the loaded .json file:

    CREATE OR REPLACE TABLE customer_country AS
        SELECT * FROM read_json('customer_country.json');

  • Create a table with data from the loaded .xlsx file:

    CREATE OR REPLACE TABLE customer_country AS
        SELECT * FROM read_xlsx('customer_country.xlsx');
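
The pairing of extensions and read functions used in the examples above can be collected into a small Python helper that generates the CREATE OR REPLACE TABLE statement for a given file. This is only a sketch: the names READ_FUNCTIONS, read_function_for and create_table_sql are hypothetical, the mapping covers just the four functions mentioned in this section, and the table and file names are inserted as-is, so it is meant for trusted names only.

```python
from pathlib import PurePosixPath

# Extension-to-function pairs taken from the examples in this section.
READ_FUNCTIONS = {
    ".csv": "read_csv",
    ".json": "read_json",
    ".parquet": "read_parquet",
    ".xlsx": "read_xlsx",
}


def read_function_for(file_name: str) -> str:
    """Pick the read function that matches the file extension."""
    suffix = PurePosixPath(file_name).suffix.lower()
    if suffix not in READ_FUNCTIONS:
        raise ValueError(f"no read function known for {file_name!r}")
    return READ_FUNCTIONS[suffix]


def create_table_sql(table: str, file_name: str) -> str:
    """Build a CREATE OR REPLACE TABLE statement for a loaded file."""
    reader = read_function_for(file_name)
    return (
        f"CREATE OR REPLACE TABLE {table} AS\n"
        f"    SELECT * FROM {reader}('{file_name}');"
    )


for name in ("customer_country.csv", "customer_country.xlsx"):
    print(create_table_sql("customer_country", name))
```

Deriving the function from the file name keeps the extension and the read function in step, which avoids mismatches such as reading a .json file with read_xlsx.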