Skip to main content
The data pipeline functionality enables you to load data in the structured JSON data format. Use these examples to see how you can access the structure in different ways. For an overview of loading JSON data, see:

Scalar Extraction from JSON

These examples load a scalar into the VARCHAR column named customer_name. Use the CREATE TABLE SQL statement to create the table with this column.
SQL
CREATE TABLE my_schema.my_table (customer_name VARCHAR NOT NULL);
For a JSON file with the value at the top level, the selector can refer to the key directly. Load a file with this data.
JSON
{"name": "John"}
Use the selector $name.
SQL
CREATE PIPELINE my_pipeline
	SOURCE filesystem
		FILTER 'data.json'
	EXTRACT
		FORMAT json
	INTO my_schema.my_table
    SELECT
        $name AS customer_name;
List all top-level JSON keys used in the SELECT statement in JSON_FIELDS of the CREATE PIPELINE SQL statement. For a JSON file that has a nested value, you need a complex selector.
JSON
{"a": {"b": {"name": "John"}}}
In this case, the same CREATE PIPELINE statement for this file has $a.b.name instead of $name.
SQL
CREATE PIPELINE my_pipeline
	SOURCE filesystem
		FILTER 'data.json'
	EXTRACT
		FORMAT json
	INTO my_schema.my_table
    SELECT
        $a.b.name AS customer_name;

NULL and Empty Handling for JSON Scalars

The System handles all JSON NULL, empty, and missing values in the same way. The system loads these values as NULL. These values fail to load into non-nullable columns. Provide an explicit default in the pipeline or use the COLUMN_DEFAULT_IF_NULL option to accept the configured column default instead of attempting to load NULL values. The JSON data contains this information.
JSON
{"a": {"b": {"name": "John"}}}
{"blah": {}}
In this case, the output table has two rows because there are two records in the JSON source file.
SQL
CREATE PIPELINE my_pipeline
    SOURCE filesystem
		FILTER 'data.json'
	EXTRACT
		FORMAT json
    INTO my_schema.my_table
    SELECT
        $a.b.name AS customer_name;
Output
SQL
Ocient> select * from my_schema.my_table;
name
---------------------------------------------
John
NULL
Fetched 2 rows
When you load all these JSON files with the same pipeline definition, the Ocient System creates a table with the same two rows. The middle value is empty.
JSON
{"a": {"b": {"name": "John"}}}
{"a": {"b": {}}}
The middle value is NULL.
JSON
{"a": {"b": {"name": "John"}}}
{"a": {"b": null}
The leaf value is NULL.
JSON
{"a": {"b": {"name": "John"}}}
{"a": {"b": {"name": null}}}

Array Extraction from JSON

Supported array extraction scenarios are:
  • One-dimensional arrays.
  • Multi-dimensional arrays.
  • Array projection is the nested application of JSON selectors over the elements of arrays for both one-dimensional and multidimensional arrays.
  • Arbitrary nesting of arrays within JSON objects, including splitting array dimensions across the JSON path.
  • Loading individual array elements, which can also be arrays.
Mapping a transformation function over array elements is not supported.

One-Dimensional Arrays

The next few examples load the same data into the my_table table both directly and using array projection.
SQL
CREATE TABLE my_schema.my_table (customer_names VARCHAR[] NOT NULL);
Load data directly from a JSON array of strings. The JSON data contains this information.
JSON
{
  "a": {
    "b": {
      "names": ["John", "Bob", "Rajiv"]
    }
  }
}
SQL
CREATE PIPELINE my_pipeline
    SOURCE filesystem
		FILTER 'data.json'
 	EXTRACT
		FORMAT json
    INTO my_schema.my_table
    SELECT
        $a.b.names[] AS customer_names;
Output
SQL
Ocient> select * from my_schema.my_table;
customer_names
--------------------------------------------------------------------------------
[John, Bob, Rajiv]
Fetched 1 row

Array Projection

Array projection is the application of the specified selector over all elements in an array, similar to a for loop. Load data using array projection into the same table using this JSON data.
JSON
{ "a": [ {"name": "John", "hometown": "Chicago"},
         {"name": "Bob", "hometown": "Cucamonga"},
         {"name": "Rajiv", "hometown": "Glendale Heights"} ] }
SQL
CREATE PIPELINE my_pipeline
    SOURCE filesystem
		FILTER 'data.json'
 	EXTRACT
		FORMAT json
    INTO my_schema.my_table
    SELECT
        $a[].name AS customer_names;
Output
SQL
Ocient> select * from my_schema.my_table;
customer_names
--------------------------------------------------------------------------------
[John, Bob, Rajiv]
Fetched 1 row
The output of both loading operations is the same. Repeat this example with data in an array that contains nested objects. The JSON file contains this information.
JSON
{ "a": [ {"b": {"name": "John", "hometown": "Chicago"}},
         {"b": {"name": "Bob", "hometown": "Cucamonga"}},
         {"b": {"name": "Rajiv", "hometown": "Glendale Heights"}} ] }
SQL
CREATE PIPELINE my_pipeline
    SOURCE filesystem
		FILTER 'data.json'
 	EXTRACT
		FORMAT json
    INTO my_schema.my_table
    SELECT
        $a[].b.name AS customer_names;
Output
SQL
Ocient> select * from my_schema.my_table;
customer_names
--------------------------------------------------------------------------------
[John, Bob, Rajiv]
Fetched 1 row
The output is still the same.

Multi-Dimensional Arrays

Multi-dimensional arrays work similarly to one-dimensional arrays. These examples load the same data into the my_table table both directly, and then using array projection.
SQL
CREATE TABLE my_schema.my_table (customer_names VARCHAR[][][] NOT NULL);
Load data directly from a JSON array of strings. The JSON file contains this information.
JSON
{ "a": {
    "b": {
      "names": [
        [["John", "Bob", "Rajiv"], ["Anna", "Hanna", "Vanna", "Rosanna"]],
        [["Masha", "Natasha", "Sasha"], ["Chad", "Thad", "Brad"]]]    }
} }
SQL
CREATE PIPELINE my_pipeline
    SOURCE filesystem
		FILTER 'data.json'
 	EXTRACT
		FORMAT json
    INTO my_schema.my_table
    SELECT
        $a.b.names[][][] AS customer_names;
Load data using array projection. The JSON file contains this information.
JSON
{
    "a": [
        [
            [
                {"b": {"name": "John", "hometown": "Chicago"}},
                {"b": {"name": "Bob", "hometown": "Cucamonga"}},
                {"b": {"name": "Rajiv", "hometown": "Glendale Heights"}},
            ],
            [
                {"b": {"name": "Anna", "hometown": "Chicago"}},
                {"b": {"name": "Hanna", "hometown": "Cucamonga"}},
                {"b": {"name": "Vanna", "hometown": "Glendale Heights"}},
                {"b": {"name": "Rosanna", "hometown": "Mahwah"}},
            ],
        ],
        [
            [
                {"b": {"name": "Masha", "hometown": "Boston"}},
                {"b": {"name": "Natasha", "hometown": "Cambridge"}},
                {"b": {"name": "Sasha", "hometown": "Summerville"}},
            ],
            [
                {"b": {"name": "Chad", "hometown": "Champaign"}},
                {"b": {"name": "Thad", "hometown": "Urbana"}},
                {"b": {"name": "Brad", "hometown": "Mahomet"}},
            ],
        ],
    ]
}
$a[][][].b.name accesses each object, such as {"b": {"name": "John", "hometown": "Chicago"}}, and uses b.name to apply this nested attribute selector on all of the elements in the inner arrays.
SQL
CREATE PIPELINE my_pipeline
    SOURCE filesystem
		FILTER 'data.json'
 	EXTRACT
		FORMAT json
    INTO my_schema.my_table
    SELECT
        $a[][][].b.name AS customer_names;
Output
SQL
Ocient> select * from my_schema.my_table;
customer_names
--------------------------------------------------------------------------------
[[[John, Bob, Rajiv], [Anna, Hanna, Vanna, Rosanna]], [[Masha, Natasha, Sasha],
[Chad, Thad, Brad]]]
Fetched 1 row
The output of both loading operations is the same.

Multi-Dimensional Arrays With Dimensions Split Across the JSON Path

To load a two-dimensional array, for example, use a one-dimensional array that contains JSON objects, each of which contains another one-dimensional array. Load the relevant parts of this JSON object into a two-dimensional array. In this example, the column has VARCHAR[][] type and the SELECT expression is $x.a[].b.c[].name. The loaded data is a two-dimensional array of names. The JSON file contains this information.
JSON
{
    "x": {
        "a": [
            {
                "b": {
                    "c": [
                        {"name": "John", "hometown": "Chicago"},
                        {"name": "Bob", "hometown": "Cucamonga"},
                        {"name": "Rajiv", "hometown": "Glendale Heights"},
                    ]
                }
            },
            {
                "b": {
                    "c": [
                        {"name": "Anna", "hometown": "Chicago"},
                        {"name": "Hanna", "hometown": "Cucamonga"},
                        {"name": "Vanna", "hometown": "Glendale Heights"},
                        {"name": "Rosanna", "hometown": "Mahwah"},
                    ]
                }
            },
        ]
    }
}
SQL
CREATE TABLE my_schema.my_table (customer_names VARCHAR[][]);
CREATE PIPELINE my_pipeline
    SOURCE filesystem
		FILTER 'data.json'
 	EXTRACT
		FORMAT json
    INTO my_schema.my_table
    SELECT
        $x.a[].b.c[].name AS customer_names;
Output
SQL
Ocient> select * from my_schema.my_table;
customer_names
--------------------------------------------------------------------------------
[[John, Bob, Rajiv], [Anna, Hanna, Vanna, Rosanna]]
Fetched 1 row
The output is a two-dimensional array of names.

Extract Data from Individual Array Elements

Extract data from an element of a JSON array. That element can also be an array. In this example, extract the first element of a, which is an array. JSON array indexes start at 1 so that all array indexing in Pipelines are consistent with the SQL standard. The JSON file contains this information.
JSON
{
    "a": [
        {"b": [{"name": "John", "hometown": "Chicago"}]},
        {
            "b": [
                {"name": "Bob", "hometown": "Cucamonga"},
                {"name": "Rajiv", "hometown": "Glendale Heights"},
            ]
        },
        {"b": [{"name": "Rajiv", "hometown": "Glendale Heights"}]},
    ]
}
SQL
CREATE TABLE my_schema.my_table (customer_names VARCHAR[]);
SQL
CREATE PIPELINE my_pipeline
    SOURCE filesystem
		FILTER 'data.json'
 	EXTRACT
		FORMAT json
    INTO my_schema.my_table
    SELECT
        $a[2].b[].name AS customer_names;
Output
SQL
Ocient> select * from my_table.my_schema;
customer_names
--------------------------------------------------------------------------------
[Bob, Rajiv]
Fetched 1 row

NULL and Empty Handling for JSON Arrays

The Ocient System handles NULL, empty, and missing values the same way for arrays as for scalars. The system converts a value that is NULL, empty, or missing to NULL and loads it as NULL. Provide an explicit default in the pipeline or use the COLUMN_DEFAULT_IF_NULL option to accept the configured column default instead of attempting to load NULL.

Tuples of Scalars

Data pipeline loading supports tuples of scalars in the JSON source. The specification of tuple elements using curly braces is only supported for the basic case (i.e. $a.{b,c}). More complex selectors are not supported. The recommended way of specifying tuples is to use an individual JSON selector for each tuple element, such as $a.b or $a.c. You can apply functions to tuple elements.

Tuple Construction

This example loads two strings into a simple tuple.
JSON
{"users_hometowns": {"a": {"name": "Bob", "hometown": "Cucamonga"}}}
SQL
CREATE TABLE my_schema.my_table (users_hometowns TUPLE<<VARCHAR,VARCHAR>>);
SQL
CREATE PIPELINE my_pipeline
...
SELECT
    tuple<<char,char>>($users_hometowns.a.name, $users_hometowns.a.hometown)
    AS users_hometowns;
Output
SQL
Ocient> select * from my_schema.my_table;
users_hometowns
--------------------------------------------------------------------------------
<<Bob, Cucamonga>>
Fetched 1 row

Advanced Tuple Construction

Use these examples to explore how to construct more complex tuples with nested objects and arrays. Select Fields from a Nested Object into a Tuple When the target fields are nested inside a named object, prefix the braces with the path to that object. Selector:
Text
$a.{b, c}
Transform equivalent:
Text
tuple($a.b, $a.c)
This data pipeline constructs a TUPLE<<VARCHAR, INT>> from fields b and c inside object a.
SQL
PREVIEW PIPELINE test_pl
    SOURCE INLINE '{"a": {"b": "two", "c": 2}}'
    EXTRACT FORMAT JSON
    INTO test.tgt_tbl
    SELECT $a.{b, c} AS tgt_col4;
Output
Text
tgt_col4
--------------------------------------------------------------------------------
("two", 2)
Select an Array of Tuples from an Array of Objects Appending [] to the parent path maps the tuple selector over every element in the array, producing an array of tuples. Selector:
Text
$a[].{b, c}
Transform equivalent:
Text
zip_with($a[].b, $a[].c, tuple)
This data pipeline produces a TUPLE<<INT, VARCHAR>>[] from an array of objects, each containing fields b and c.
SQL
PREVIEW PIPELINE test_pl
    SOURCE INLINE '{"a": [{"b": 1, "c": "John"},
        {"b": 2, "c": "Bob"},
        {"b": 3, "c": "Rajiv"}]}'
    EXTRACT FORMAT JSON
    INTO test.tgt_tbl
    SELECT $a[].{b, c} AS tgt_col5;
Output
Text
tgt_col5
--------------------------------------------------------------------------------
[(1, "John"),(2, "Bob"),(3, "Rajiv")]
Select a Tuple Containing an Array Element Individual fields inside braces can have their own brackets [] to indicate that the field is an array. This selector produces a tuple with one element, an array, and the other a scalar. Selector:
Text
$a.{b[], c}
Transform equivalent:
Text
tuple($a.b[], $a.c)
This data pipeline constructs a TUPLE<<VARCHAR[], INT>> where the first element is an array of strings and the second element is an integer.
SQL
PREVIEW PIPELINE test_pl
    SOURCE INLINE '{"a": {"b": ["John", "Bob", "Rajiv"], "c": 3}}'
    EXTRACT FORMAT JSON
    INTO test.tgt_tbl
    SELECT $a.{b[], c} AS tgt_col6;
Output
Text
tgt_col6
--------------------------------------------------------------------------------
([John,Bob,Rajiv], 3)
Select an Array of Tuples with an Array Element Combining brackets [] on the parent path with [] on a child field produces an array of tuples where each tuple contains an array element and a scalar element. Selector:
Text
$a[].{b[], c}
Transform equivalent:
Text
zip_with($a[].b[], $a[].c, tuple)
This data pipeline produces a TUPLE<<VARCHAR[], INT>>[] from an array of objects that each contain a nested array.
SQL
PREVIEW PIPELINE test_pl
    SOURCE INLINE '{"a": [{"b": ["John", "Bob", "Rajiv"], "c": 4},
        {"b": ["Rajiv", "Bob", "John"], "c": 5}]}'
    EXTRACT FORMAT JSON
    INTO test.tgt_tbl
    SELECT $a[].{b[], c} AS tgt_col7;
Output
Text
tgt_col7
--------------------------------------------------------------------------------
[([John,Bob,Rajiv], 4),([Rajiv,Bob,John], 5)]
Select a Tuple with Nested Tuple Elements Nesting tuple selectors produces tuples where the elements are also tuples. Selector:
Text
$.{a.{b, c}, d.{e, f}}
Transform equivalent:
Text
tuple(tuple($a.b, $a.c), tuple($d.e, $d.f))
This data pipeline constructs a TUPLE<<TUPLE<<VARCHAR, INT>>, TUPLE<<INT, VARCHAR>>>> from two nested objects.
SQL
PREVIEW PIPELINE test_pl
    SOURCE INLINE '{"a": {"b": "John", "c": 6},
        "d": {"f": "Bob", "e": 7}}'
    EXTRACT FORMAT JSON
    INTO test.tgt_tbl
    SELECT $.{a.{b, c}, d.{e, f}} AS tgt_col8;
Output
Text
tgt_col8
--------------------------------------------------------------------------------
(("John", 6), ("Bob", 7))
Select an Array of Tuples with a Nested Tuple Element A dot-brace group preceded by . (with no field name) creates an inline tuple element within an outer tuple. When you combine such an element with brackets [] on the parent path, this selector produces an array of tuples where one element is a tuple. Selector:
Text
$a[].{b, .{c[1], c[2], c[3]}}
Transform equivalent:
Text
zip_with($a[].b, zip_with($a[].c[1], $a[].c[2], $a[].c[3], tuple), tuple)
This data pipeline produces a TUPLE<<INT, TUPLE<<VARCHAR, VARCHAR, VARCHAR>>>>[] by selecting a scalar field and an inline tuple of indexed array elements.
SQL
PREVIEW PIPELINE test_pl
    SOURCE INLINE '{"a": [{"c": ["John", "Bob", "Rajiv"], "b": 4},
        {"c": ["Rajiv", "Bob", "John"], "b": 5}]}'
    EXTRACT FORMAT JSON
    INTO test.tgt_tbl
    SELECT $a[].{b, .{c[1], c[2], c[3]}} AS tgt_col10;
Output
Text
tgt_col10
--------------------------------------------------------------------------------
[(4, ("John", "Bob", "Rajiv")),(5, ("Rajiv", "Bob", "John"))]

Applying Functions (Transformations) To Tuple Elements

In this example, load a tuple where elements have different types, VARCHAR and TIMESTAMP, and where the second element has to be converted from a JSON string into an Ocient timestamp. The JSON file contains this information.
JSON
{"event_timestamp": {"event": "my_event", "timestamp": "1980-02-03 15:16:17.12345"}}
SQL
CREATE TABLE my_schema.my_table (event_timestamp TUPLE<<VARCHAR,TIMESTAMP>> NOT NULL);
SQL
CREATE PIPELINE my_pipeline
...
SELECT
    tuple<<char,timestamp>>($event_timestamp.event,
    TO_TIMESTAMP(char($event_timestamp.timestamp), 'yyyy-MM-dd HH:mm:ss.SSSSSS'))
    AS event_timestamp;
Output
SQL
Ocient> select * from  my_schema.my_table;
event_timestamp
--------------------------------------------------------------------------------
<<my_event, 1980-02-03 15:16:17.123450000>>
Fetched 1 row
The output table has one row of tuples.

NULL and Empty Handling for JSON Tuples

All the rules for handling NULL, empty, and missing elements that apply to scalars and arrays also apply to tuples. If any part of the selector is NULL, empty, or missing, data pipeline loading converts that value to NULL. Additionally, because you can apply functions to tuple elements (and not array elements), you can use the NULL_IF function to convert a tuple element to NULL. For example, tuple<<char,varchar>>( $a.name, NULL_IF($a.hometown, 'N/A') ) indicates to the pipeline that the string 'N/A' signifies NULL for the hometown element but not for the name element.

Special Characters in JSON Keys

This example shows loading data where the JSON key names contain a special character. Create a table for the load using the customer_name column.
SQL
CREATE TABLE my_schema.my_table (customer_name VARCHAR NOT NULL);
Load a file with this data. The JSON key name contains the - special character.
JSON
{"first-name": "John"}
Use double quotes for the selector $first-name in the CREATE PIPELINE SQL statement.
SQL
CREATE PIPELINE my_pipeline
	SOURCE filesystem
		FILTER 'data.json'
	EXTRACT
		FORMAT json
	INTO my_schema.my_table
    SELECT
        $"first-name" AS customer_name;

Array Extraction Operations

Other operations exist in the source field selector syntax to flatten or compact arrays while extracting the data.

Flatten Arrays

Flattening means that the extracted array should not increase the rank, and instead, sub-arrays should be concatenated. To flatten an array, use the underscore character in the array selector (e.g., $data.array[_]). Example This JSON data contains a nested array with two sub-arrays having data [1,2,3] and [4,5,6].
JSON
{
  a: {
    b: [
      {c: [1,2,3] },
      {c: [4,5,6] }
    ]
  }
}
Flatten the nested array.
SQL
a.b[_].c[]
Output: [1,2,3,4,5,6] This selected array has rank 1, and not 2. Contrast this with the non-flattened selector a.b[].c[], which would have the output [[1,2,3],[4,5,6]]. You can also flatten arrays with the FLATTEN function. For details, see Array Data Transformation Functions. The functionality of the operator and function is equivalent.

Compact Arrays

Compaction eliminates NULL values from the output data. The exact bit pattern of NULL in the source data is source-type dependent. For JSON, null is a literal keyword that is unambiguous. For CSV or other less-defined types, the configuration determines which exact bits equate to NULL. To compact an array, use the exclamation mark character in the array selector(e.g., $data.array[!]). Example This JSON data contains a nested array with four NULL values.
JSON
{
  a: {
    b: {
      c: [ 1,null,2,null,3,null,null]
    }
  }
}
Compact the nested array.
SQL
a.b.c[!]
Output: [1,2,3] You can also compact arrays with the ARRAY_COMPACT function. For details, see Array Data Transformation Functions. The functionality of the operator and function is equivalent. Data Pipeline Load of Parquet Data from S3 Data Pipeline Load of Parquet Data from S3 Data Types for Data Pipelines Data Formats for Data Pipelines Monitor Data Pipelines
Last modified on May 27, 2026