Load Data
Data Formats for Data Pipeline...

JSON Selectors Examples in Data Pipelines

The data pipeline functionality enables you to load data in the structured JSON data format. Use these examples to see how you can access the structure in different ways. For an overview of loading JSON data, see:

Scalar Extraction from JSON

These examples load a scalar into the VARCHAR column named customer_name. Use the CREATE TABLE SQL statement to create the table with this column.

SQL


For a JSON file with the value at the top level, the selector can refer to the key directly.

Load a file with this data.

JSON


Use the selector $name.

SQL


List all top-level JSON keys used in the SELECT statement in JSON_FIELDS of the CREATE PIPELINE SQL statement.

For a JSON file that has a nested value, you need a complex selector.

JSON


In this case, the same CREATE PIPELINE statement for this file has $a.b.name instead of $name.

SQL


NULL and Empty Handling for JSON Scalars

The System handles all JSON NULL, empty, and missing values in the same way. The system loads these values as NULL. These values fail to load into non-nullable columns. Provide an explicit default in the pipeline or use the COLUMN_DEFAULT_IF_NULL option to accept the configured column default instead of attempting to load NULL values.

The JSON data contains this information.

JSON


In this case, the output table has two rows because there are two records in the JSON source file.

SQL


Output

SQL


When you load all these JSON files with the same pipeline definition, the Ocient System creates a table with the same two rows.

The middle value is empty.

JSON


The middle value is NULL.

JSON


The leaf value is NULL.

JSON


Array Extraction from JSON

Supported array extraction scenarios are:

  • One-dimensional arrays.
  • Multi-dimensional arrays.
  • Array projection is the nested application of JSON selectors over the elements of arrays for both one-dimensional and multidimensional arrays.
  • Arbitrary nesting of arrays within JSON objects, including splitting array dimensions across the JSON path.
  • Loading individual array elements, which can also be arrays.

Mapping a transformation function over array elements is not supported.

One-Dimensional Arrays

The next few examples load the same data into the my_table table both directly and using array projection.

SQL


Load data directly from a JSON array of strings. The JSON data contains this information.

JSON

SQL


Output

SQL


Array Projection

Array projection is the application of the specified selector over all elements in an array, similar to a for loop.

Load data using array projection into the same table using this JSON data.

JSON

SQL


Output

SQL


The output of both loading operations is the same.

Repeat this example with data in an array that contains nested objects. The JSON file contains this information.

JSON

SQL


Output

SQL


The output is still the same.

Multi-Dimensional Arrays

Multi-dimensional arrays work similarly to one-dimensional arrays.

These examples load the same data into the my_table table both directly, and then using array projection.

SQL


Load data directly from a JSON array of strings. The JSON file contains this information.

JSON

SQL


Load data using array projection. The JSON file contains this information.

JSON


$a[][][].b.name accesses each object such as {"b": {"name": "John", "hometown": "Chicago"}} and uses b.name to apply this nested attribute selector on all of the elements in the inner arrays.

SQL


Output

SQL


The output of both loading operations is the same.

Multi-Dimensional Arrays With Dimensions Split Across the JSON Path

To load a two-dimensional array, for example, use a one-dimensional array that contains JSON objects, each of which contains another one-dimensional array. Load the relevant parts of this JSON object into a two-dimensional array.

In this example, the column has VARCHAR[][] type and the SELECT expression is $x.a[].b.c[].name. The loaded data is a two-dimensional array of names.

The JSON file contains this information.

JSON

SQL

s


Output

SQL


The output is a two-dimensional array of names.

Extract Data from Individual Array Elements

Extract data from an element of a JSON array. That element can also be an array.

In this example, extract the first element of a, which is an array. JSON array indexes start at 1 so that all array indexing in Pipelines are consistent with the SQL standard.

The JSON file contains this information.

JSON

SQL

SQL


Output

SQL


NULL and Empty Handling for JSON Arrays

The Ocient System handles NULL, empty, and missing values the same way for arrays as for scalars. The system converts a value that is NULL, empty, or missing to NULL and loads it as NULL.

Provide an explicit default in the pipeline or use the COLUMN_DEFAULT_IF_NULL option to accept the configured column default instead of attempting to load NULL.

Tuples of Scalars

Data pipeline loading supports tuples of scalars in the JSON source.

Tuples of arrays and tuples of tuples are not supported.

The specification of tuple elements using curly braces is only supported for the basic case (i.e. $a.{b,c}). More complex selectors are not supported.

The recommended way of specifying tuples is to use an individual JSON selector for each tuple element, such as $a.b or $a.c.

You can apply functions to tuple elements.

Basic Tuple Construction

Load two strings into a tuple.

JSON

SQL

SQL


Output

SQL


Applying Functions (Transformations) To Tuple Elements

In this example, load a tuple where elements have different types, VARCHAR and TIMESTAMP, and where the second element has to be converted from a JSON string into an Ocient timestamp. The JSON file contains this information.

JSON

SQL

SQL


Output

SQL


The output table has one row of tuples.

NULL and Empty Handling for JSON Tuples

All the rules for handling NULL, empty, and missing elements that apply to scalars and arrays also apply to tuples. If any part of the selector is NULL, empty, or missing, data pipeline loading converts that value to NULL.

Additionally, because you can apply functions to tuple elements (and not array elements), you can use the NULL_IF function to convert a tuple element to NULL. For example, tuple<<char,varchar>>( $a.name, NULL_IF($a.hometown, 'N/A') ) indicates to the pipeline that the string 'N/A' signifies NULL for the hometown element but not for the name element.

Special Characters in JSON Keys

This example shows loading data where the JSON key names contain a special character.

Create a table for the load using the customer_name column.

SQL


Load a file with this data. The JSON key name contains the - special character.

JSON


Use double quotes for the selector $first-name in the CREATE PIPELINE SQL statement.

SQL


Related Links