JSON Selectors Examples in Data Pipelines
The data pipeline functionality enables you to load data in the structured JSON data format. Use these examples to see how you can access the structure in different ways. For an overview of loading JSON data, see:
These examples load a scalar into the VARCHAR column named customer_name. Use the CREATE TABLE SQL statement to create the table with this column.
For a JSON file with the value at the top level, the selector can refer to the key directly.
Load a file with this data.
Use the selector $name.
List all top-level JSON keys used in the SELECT statement in JSON_FIELDS of the CREATE PIPELINE SQL statement.
For a JSON file that has a nested value, you need a complex selector.
In this case, the same CREATE PIPELINE statement for this file has $a.b.name instead of $name.
The System handles all JSON NULL, empty, and missing values in the same way. The system loads these values as NULL. These values fail to load into non-nullable columns. Provide an explicit default in the pipeline or use the COLUMN_DEFAULT_IF_NULL option to accept the configured column default instead of attempting to load NULL values.
The JSON data contains this information.
In this case, the output table has two rows because there are two records in the JSON source file.
Output
When you load all these JSON files with the same pipeline definition, the Ocient System creates a table with the same two rows.
The middle value is empty.
The middle value is NULL.
The leaf value is NULL.
Supported array extraction scenarios are:
- One-dimensional arrays.
- Multi-dimensional arrays.
- Array projection is the nested application of JSON selectors over the elements of arrays for both one-dimensional and multidimensional arrays.
- Arbitrary nesting of arrays within JSON objects, including splitting array dimensions across the JSON path.
- Loading individual array elements, which can also be arrays.
Mapping a transformation function over array elements is not supported.
The next few examples load the same data into the my_table table both directly and using array projection.
Load data directly from a JSON array of strings. The JSON data contains this information.
Output
Array projection is the application of the specified selector over all elements in an array, similar to a for loop.
Load data using array projection into the same table using this JSON data.
Output
The output of both loading operations is the same.
Repeat this example with data in an array that contains nested objects. The JSON file contains this information.
Output
The output is still the same.
Multi-dimensional arrays work similarly to one-dimensional arrays.
These examples load the same data into the my_table table both directly, and then using array projection.
Load data directly from a JSON array of strings. The JSON file contains this information.
Load data using array projection. The JSON file contains this information.
$a[][][].b.name accesses each object such as {"b": {"name": "John", "hometown": "Chicago"}} and uses b.name to apply this nested attribute selector on all of the elements in the inner arrays.
Output
The output of both loading operations is the same.
To load a two-dimensional array, for example, use a one-dimensional array that contains JSON objects, each of which contains another one-dimensional array. Load the relevant parts of this JSON object into a two-dimensional array.
In this example, the column has VARCHAR[][] type and the SELECT expression is $x.a[].b.c[].name. The loaded data is a two-dimensional array of names.
The JSON file contains this information.
Output
The output is a two-dimensional array of names.
Extract data from an element of a JSON array. That element can also be an array.
In this example, extract the first element of a, which is an array. JSON array indexes start at 1 so that all array indexing in Pipelines are consistent with the SQL standard.
The JSON file contains this information.
Output
The Ocient System handles NULL, empty, and missing values the same way for arrays as for scalars. The system converts a value that is NULL, empty, or missing to NULL and loads it as NULL.
Provide an explicit default in the pipeline or use the COLUMN_DEFAULT_IF_NULL option to accept the configured column default instead of attempting to load NULL.
Data pipeline loading supports tuples of scalars in the JSON source.
Tuples of arrays and tuples of tuples are not supported.
The specification of tuple elements using curly braces is only supported for the basic case (i.e. $a.{b,c}). More complex selectors are not supported.
The recommended way of specifying tuples is to use an individual JSON selector for each tuple element, such as $a.b or $a.c.
You can apply functions to tuple elements.
Load two strings into a tuple.
Output
In this example, load a tuple where elements have different types, VARCHAR and TIMESTAMP, and where the second element has to be converted from a JSON string into an Ocient timestamp. The JSON file contains this information.
Output
The output table has one row of tuples.
All the rules for handling NULL, empty, and missing elements that apply to scalars and arrays also apply to tuples. If any part of the selector is NULL, empty, or missing, data pipeline loading converts that value to NULL.
Additionally, because you can apply functions to tuple elements (and not array elements), you can use the NULL_IF function to convert a tuple element to NULL. For example, tuple<<char,varchar>>( $a.name, NULL_IF($a.hometown, 'N/A') ) indicates to the pipeline that the string 'N/A' signifies NULL for the hometown element but not for the name element.
This example shows loading data where the JSON key names contain a special character.
Create a table for the load using the customer_name column.
Load a file with this data. The JSON key name contains the - special character.
Use double quotes for the selector $first-name in the CREATE PIPELINE SQL statement.