The data pipeline functionality enables you to load data in the data format. Use these examples to see how you can access the structure in different ways. For an overview of loading Parquet data, see Data Pipeline Load of Parquet Data from S3.Documentation Index
Fetch the complete documentation index at: https://docs.ocient.com/llms.txt
Use this file to discover all available pages before exploring further.
Work with Parquet Selectors
The SELECT statement for a Parquet file uses the same syntax as the existing JSON selectors. This selection applies to lists, structures, and maps. The System retrieves its list of requested columns from theSELECT SQL statement. The columns correspond to the field names of the schema for the Parquet files for the load. The load retrieves required columns from the Parquet source files and leaves the other columns alone.
This example snippet shows a list of ["id", "name", "array[]", "a[].project", "my_map.key_name"] as the fields to select from the data. The statement only retrieves the columns for ["id", "name", "array", "a", "my_map"] in the row groups.
SQL
You must select a leaf element, an array, or a tuple with your selector. This is stricter than using JSON selectors, which can directly select array fields and JSON object fields.Example: The selector must be
{"a": [1,2,3], "b": {"c": 1}}You can extract with any of the selectors in JSON: $a, $a[], $b, $b.cHowever, Parquet only allows for the selectors: $a[], $b.cThis example assumes this schema:$my_list[], which includes the array syntax.Work with Complex Parquet Selectors
These examples show how to load a Parquet list and convert it to an array and how to load a map and convert it to a tuple.Load List as an Array Example
Create thetips table to store the tip inquiries a city receives. The columns are:
id— Identifier as an integercategories— Categories as a list of stringsmayor— Mayor as a stringmentions— Mentions as a list of stringsservice_name— Service name as a non-nullable stringtags— Tags as a list of stringsurl— URL as a stringusername— Username as a stringtext— Text description as a non-nullable stringvenue_name— Venue name as a non-nullable string
SQL
tips_pipeline data pipeline for loading the tip data. Specify the S3 source with bucket ocient-docs, object key metabase_samples/parquet/tips.parquet, and endpoint https://s3.us-east-1.amazonaws.com. The source format is Parquet.
SQL
categories, mentions, and tags columns as arrays.
SQL
Load Map as a Tuple Example
This example uses the file/tmp/to-load/locations.parquet that contains a MAP<STRING, DOUBLE> field named coordinates. Each value of the map field contains x and y entries for the coordinates.
Create the locations table to store the map values. The cartesian_coords column is a tuple with two elements for the x and y coordinates.
SQL
locations_pipeline data pipeline for loading the maps data. Specify the filter for the Parquet file /tmp/to-load/locations.parquet. The source format is Parquet. Select the x and y coordinates from the map.
SQL
cartesian_coords column as a tuple.
SQL

