Data loading
You can upload data to Tengri in the following ways:
-
Using the upload wizard (the Upload file button).
-
Via the Python tngri module.
Uploading data from a local file using the upload wizard
Supported file extensions for uploading:
-
.csv
-
.json
-
.parquet
To load data from a file:
-
Click Upload file in the Tengri interface.
-
Select a file or drag it into the upload area that opens.
-
In the Parse file window that opens, check that the data was parsed correctly. If necessary, adjust the recognition settings, then press Next.
-
A box then appears with the code for reading the file, ready to be inserted into the notebook. For the file file_name.csv and the user user_name, it looks like this:
select * from read_csv( "user_name/<id>_file_name.csv" )
Press Add cell to add this cell to your notebook.
Once the cell is added to the notebook, the data from the file is available for work.
Loading data from a file via the Python tngri module
To load data from non-local files, you can use the Python tngri module.
Loading data from a file via URL
Sample Python code that loads data from the iris.csv file at the specified URL into Tengri:
import polars  # (1)
import tngri

df = polars.read_csv(
    "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv"  # (2)
)
tngri.upload_df(df)  # (3)
1 | Import the required Python modules. |
2 | Read the file at the specified URL into the df variable using the polars module. |
3 | Upload the data from df to Tengri using the tngri.upload_df function. |
In addition to .csv files, this method can be used for other formats such as .json and .xlsx (see the Polars data loading documentation for details).
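As a minimal sketch of how the same approach extends to other formats, the helper below picks a Polars reader from the file extension. The names POLARS_READERS, polars_reader_name, and load_frame are hypothetical and not part of tngri or Polars; the reader functions themselves (read_csv, read_json, read_excel, read_parquet) are standard Polars functions. Whether a given reader accepts a URL directly varies by reader, and read_excel may need an additional Excel engine package installed.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Polars reader function names keyed by file extension (illustrative mapping).
POLARS_READERS = {
    ".csv": "read_csv",
    ".json": "read_json",
    ".xlsx": "read_excel",
    ".parquet": "read_parquet",
}


def polars_reader_name(source: str) -> str:
    """Return the name of the Polars reader matching the source's extension."""
    # urlparse leaves plain paths intact and drops any ?query part of a URL.
    suffix = PurePosixPath(urlparse(source).path).suffix.lower()
    if suffix not in POLARS_READERS:
        raise ValueError(f"no Polars reader configured for {suffix!r}")
    return POLARS_READERS[suffix]


def load_frame(source: str):
    """Read the source into a Polars DataFrame with the matching reader."""
    import polars  # imported lazily so the lookup above works without Polars

    return getattr(polars, polars_reader_name(source))(source)
```

The resulting DataFrame can then be passed to tngri.upload_df in the same way as in the iris.csv example, for instance tngri.upload_df(load_frame(url)).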
After that, a message like this will appear in the output cell:
UploadedFile(s3_path='s3://<path>/<file_id>.parquet')
Now the data from the file is available for work. To work with it, use the read_parquet function and specify the id of the uploaded .parquet file from the previous step:
SELECT * FROM read_parquet('<file_id>.parquet');
A file uploaded this way always has the .parquet extension, regardless of the extension of the original file.
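The step from the upload message to the query can be sketched as follows. This assumes you have the s3_path value as a string (the output above shows it as a field of UploadedFile; reading it as an attribute of the returned object is an assumption). The function name parquet_query and the bucket and file id used in the usage note are made up for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def parquet_query(s3_path: str) -> str:
    """Build a read_parquet query from the s3_path shown in the upload output."""
    parsed = urlparse(s3_path)
    if parsed.scheme != "s3":
        raise ValueError(f"expected an s3:// path, got {s3_path!r}")
    # The last path segment is the '<file_id>.parquet' name used in the query.
    file_name = PurePosixPath(parsed.path).name
    if not file_name:
        raise ValueError(f"no file name found in {s3_path!r}")
    return f"SELECT * FROM read_parquet('{file_name}');"
```

For a hypothetical path such as s3://example-bucket/uploads/abc123.parquet, the function returns a query that reads abc123.parquet.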
Loading data from a file saved in S3
Sample Python code that loads data from the my_file.parquet file in your S3 bucket into Tengri:
import tngri  # (1)

tngri.upload_s3(
    object="s3://my_folders/my_file.parquet",  # (2)
    access_key="***",
    secret_key="***"
)
1 | Import the Python tngri module. |
2 | Set the parameters of your S3 bucket (file path and access keys). |
The tngri.upload_s3 function uploads the file from your S3 bucket to Tengri.
The data from the file is then available for work.
The uploaded file keeps the extension of the original file, whatever it is. Files with different extensions are read with different functions.
To work with the data from the file in our example, use the read_parquet function:
SELECT * FROM read_parquet('my_file.parquet');
If necessary, you can specify a path and name for the uploaded file inside Tengri via the optional filename parameter of the tngri.upload_s3 function:
filename="new_path/new_name.parquet"
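One way to put these parameters together without hardcoding the access keys is sketched below. The parameter names object, access_key, secret_key, and filename come from the examples above; the helper upload_s3_kwargs and the environment variable names S3_ACCESS_KEY and S3_SECRET_KEY are assumptions made for illustration.

```python
import os
from typing import Optional


def upload_s3_kwargs(s3_object: str, filename: Optional[str] = None) -> dict:
    """Collect keyword arguments for tngri.upload_s3 without hardcoding keys."""
    kwargs = {
        "object": s3_object,
        # Hypothetical variable names; use whatever your environment provides.
        "access_key": os.environ["S3_ACCESS_KEY"],
        "secret_key": os.environ["S3_SECRET_KEY"],
    }
    if filename is not None:
        # Optional path and name for the file inside Tengri.
        kwargs["filename"] = filename
    return kwargs
```

In a notebook, the result would be passed on as tngri.upload_s3(**upload_s3_kwargs("s3://my_folders/my_file.parquet", filename="new_path/new_name.parquet")).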
Working with data from uploaded files
-
Check that the uploaded .csv file is available:
SELECT * FROM read_csv('customer_country.csv');
-
Create a table with data from the uploaded .csv file:
CREATE OR REPLACE TABLE customer_country AS SELECT * FROM read_csv('customer_country.csv');
-
Load data from the .csv file into an existing table:
INSERT INTO customer_country SELECT * FROM read_csv('customer_country.csv');
Working with data from uploaded files of different extensions
To work with uploaded files of other extensions, use the corresponding read functions.
-
Create a table with data from the uploaded .parquet file:
CREATE OR REPLACE TABLE customer_country AS SELECT * FROM read_parquet('customer_country.parquet');
-
Create a table with data from the uploaded .json file:
CREATE OR REPLACE TABLE customer_country AS SELECT * FROM read_json('customer_country.json');
-
Create a table with data from the uploaded .xlsx file:
CREATE OR REPLACE TABLE customer_country AS SELECT * FROM read_xlsx('customer_country.xlsx');