JSON and DELIMITED, the Ocient System performs no preemptive casting on the data when using a source field selector.
For example, with the JSON string { "my_field": 1234 }, the selector $my_field returns the string "1234" not the integer 1234.
When you use transformation functions, keep in mind that the Ocient System treats all data in the JSON and DELIMITED formats as text data.
While the Ocient System sends data you select to a final target column, the system automatically casts the data in the final step to ensure that the data is compatible with the target column type. See Data Types for Data Pipelines for supported automatic conversion rules.
Format-specific differences also appear in the pipelines.
Load ASN.1 Data
You can load data in ASN.1 (Abstract Syntax Notation One) format from binary-encoded ASN.1 files using DER-encoded or BER-encoded files. ASN.1 provides a flexible, schema-driven format commonly used in telecommunications, security, and standardized protocols. This format allows you to extract structured records and map them to relational tables using SQL. The Ocient System requires.ber and .der files to contain one or more concatenated DER-encoded or BER-encoded values with the specified record type. The system decodes each record independently and maps it into a record.
ASN.1 Type Mapping
The system automatically converts all decoded ASN.1 fields to their JSON-equivalent representations.The ASN.1 schema must consistently use implicit or explicit tagging. You must specify clear tagging so the system can resolve field names during extraction.
| ASN.1 Type | JSON Mapping | Description |
|---|---|---|
INTEGER | Number | The number, specified as an integer. Example: 5 |
BOOLEAN | Boolean | The logical value, specified as a Boolean. Example: true |
VisibleString | String | The ASN.1 type assumes UTF-8 encoding. Example: "Data Analyst" |
IA5String | String | The International Alphabet number 5 (IA5) includes most of the ASCII alphabet but can contain other characters. Example: "ABCD" |
UTF8String | String | The ASN.1 type assumes UTF-8 encoding. Example: "abcd" |
OCTET STRING | Hexadecimal string | The hexadecimal string has the prefix 0x. Example: "0x4fa2" |
BIT STRING | Hexadecimal string | The hexadecimal string with preserved bit alignment has the prefix 0x. Example: "011110" |
OBJECT IDENTIFIER | String | The unique identifier for an object using dotted representation. Example: "1.2.840.113549" |
ENUMERATED | Number or symbolic string | The enumerated string, if the data resolves into a list of named items. Example: "red" |
ANY | Hexadecimal string | The hexadecimal string with raw encoded bytes of the embedded value. Example: "0x616263" |
SEQUENCE, SET | JSON object | Object with fields that are accessible by their component names. Example: WeatherRecord ::= SEQUENCE { temperature INTEGER, humidity INTEGER, cloudy BOOLEAN } |
SEQUENCE OF, SET OF | JSON array | Array of objects where the elements are fully decoded. Example: WeatherEvents ::= SEQUENCE OF WeatherRecord |
CHOICE | JSON object | An object with fields of any type. Identify the active alternative by the key. Example: WeatherReport ::= CHOICE { sunny SEQUENCE { uv\_index INTEGER }, rainy SEQUENCE { precipitation INTEGER } } |
DATE | String | Date in the format "yyyyMMdd". You must explicitly declare the date, or the system must infer it using tagging. Example: "20240518" |
TIME | String | Time in the format "HHmmss". Example: "113056" |
DATE-TIME | String | Date and time in the format "yyMMddHHmmss'Z'". This format supports only UTC. You must include 'Z' as a literal. Example: "250920101508'Z'" |
$sequence.fieldName. For arrays (such as SEQUENCE OF), use bracket notation: $sequenceOf[].fieldElement.
Default values and optional fields follow standard ASN.1 rules. If you omit a field, the system evaluates it as NULL.
ASN.1 Loading Example
Assume you have thepersonnel_records.asn file in the ASN.1 data format. The file contains the definition of a personnel record Example.PersonnelRecord. The ASN.1 file contains data for the personnel record, child, personnel name, employee number of the personnel, and the date.
Text
first_name— First nameinitial— Initial of the middle namefamily_name— Last nametitle— Job titlenumber— Employee numberdate_of_hire— Hire date
SQL
personnel_pipeline to load the personnel record into the personnel_records table using an S3 bucket. Specify the bucket, endpoint, access key identifier, secret access key, and filter options to find the ASN.1 DER-encoded file personnel_records.der in the specified file path. Use the URL file path http://cos/filepath/asn1/personnel_records.asn and record type Example.PersonnelRecord. Access the name sequence using dot notation for the first name, middle initial, and last name fields. The pipeline definition transforms the hire date to the 'yyyyMMdd' format.
SQL
Load Avro Data
The Ocient System enables you to load data in the format. You can use a streaming source with a file-based source only. Load an inline schema definition or use a schema configuration. Use a schema inference from files with embedded schemas. Field selectors in Avro follow the same format as selectors in JSON and formats.The Ocient System treats Avro selectors as lowercase. To use case-sensitive selectors, you must enclose the selector in double quotation marks. For example,
$"testSelector".Inline Schema
Specify a JSON string in the Avro schema format in theINLINE option of the schema definition of the CREATE PIPELINE SQL statement.
Inline schemas assume that all records follow the defined schema exactly. These schemas do not support schema evolution.
Schema Inference from a File
The system can infer the schema from a file that has embedded schemas. The file is a named object container file. Use theINFER_FROM option in the CREATE PIPELINE SQL statement to specify sampling one file.
Schema Evolution
When you create a data pipeline, the pipeline has a fixed target schema (specified by theSELECT clause). Individual files might have different schemas. The target schema must be backward-transitive compatible with the other schemas. Individual files might have different schemas.
If the other schemas change, the system automatically attempts to fit data into the target schema. In this case, the other schemas must be forward compatible with the target schema. The system ignores changes to any unused fields from the target schema.
Multiple schemas impact the performance of the data pipeline execution. For best performance, use a single schema for all data.
Avro Type Mapping
The Ocient System converts these Avro data types to Ocient SQL types. This table shows the respective conversions.| Avro Type | Ocient SQL Type |
|---|---|
null | NULL |
boolean | BOOLEAN |
int | INT |
long | BIGINT |
float | FLOAT |
double | DOUBLE |
bytes | VARBINARY |
string | VARCHAR |
record | Select subfields of the record using the standard pipeline selector syntax, e.g., $myrecord.subfield. |
enum | VARCHAR |
array | ARRAY |
map | Select entries of the map using the key with the standard pipeline selector syntax, e.g., $mymap.key. |
fixed | HASH |
decimal | DECIMAL |
big-decimal | DECIMAL |
uuid | UUID |
date | DATE |
time-* | TIME |
timestamp-* | TIMESTAMP |
local-timestamp-* | TIMESTAMP |
duration | The system converts duration to a record with integer subfields for months, days, and milliseconds. |
union | The system converts a union between NULL and another type to a nullable version of that type. In contrast, the system converts a union containing multiple non-NULL types to a string. |
Avro Loading Examples
Create theusers table with these columns:
id— Universally Unique IDentifier (UUID) of the userfirstname— First name of the userlastname— Last name of the userbirthyear— Year of birthgroups— List of groups where the user belongs
SQL
/data/users directory. Create the users_pipeline data pipeline for the Avro files *.avro containing user data. The schema configuration instructs the system to infer from one file using the INFER_FROM option. Access the array of strings for the groups column.
SQL
/data/users directory. Create the users_pipeline data pipeline for the Avro files *.avro containing user data. The schema configuration instructs the system to use an inline schema definition with the INLINE option. Access the array of strings for the groups column.
SQL
Load Binary Data
The Ocient System loads the binary data format using a fixed record length to split a binary stream into chunks that represent records. Each record is available in theSELECT portion of a pipeline definition using a special binary extract syntax $"[5,8]". This operates similarly to a substring function, beginning at byte 5 and taking 8 bytes from that location. The starting index is a 1-based offset, consistent with other SQL arrays and offsets. You can use this syntax to select specific bytes within a record to parse together as a unit.
Binary Selector
| Selector | Description | Examples |
|---|---|---|
$"[<start_index>,<length_in_bytes>]" | Extract Bytes — Extracts the bytes beginning at the <start_index> and returning <length_in_bytes> total bytes to extract from the start index. <start_index> is 1-based. | abc123.45xyz $"[4,6]" -> "123.45" This example illustrates the operation using ascii characters, but the Binary Selector returns BINARY data, byte arrays, that may be used in special ways by fixed width processing functions described in Binary Data Transformations. |
11 is the 1-based offset into the byte array.
BINARY data, not VARCHAR.
Special BINARY transformation functions can operate on this BINARY data. However, if you cast data to the VARCHAR type by using CHAR(), then functions like INT operate on this data as VARCHAR data, not binary data.
When you load binary data into VARCHAR columns, the Ocient System automatically converts from binary to character data using the configured CHARSET_NAME before final loading.
The Ocient System supports special transformation functions that operate uniquely on binary data. With these functions, you can convert binary representations from mainframe systems such as packed decimals, zoned decimals, big and little endian integers (signed and unsigned), and floating point values. For more details, see Binary Data Transformation Functions.
For a complete list of supported options for DELIMITED and CSV data formats, see Binary Extract Options.
Binary Loading Example
If each record in your fixed-width binary schema includes these fields, you can use theSUBSTRING function and the transforms shown in this example.
| Column | Start Index | Record Length | Source Data Type | Ocient Target Data Type |
|---|---|---|---|---|
| first_name | 1 | 20 | Character | VARCHAR |
| last_name | 21 | 20 | Character | VARCHAR |
| age | 41 | 4 | Big Endian signed 4-byte integer | INT |
| total_spent | 45 | 10 | Packed Decimal | DECIMAL(10,2) |
| user_id | 55 | 8 | Little Endian unsigned 8-byte long | BIGINT |
RECORD_LENGTH is 62. The encoding of this file is CP500 instead of the default IBM1047 code page. The CREATE PIPELINE SQL statement specifies this encoding.
SQL
- Uses the
BINARY SELECTORto extract names and load them into the respective columns. The Ocient System automatically decodes the values usingcp500and loads them into aVARCHARcolumn. An explicit cast such asCHAR($"[1, 20]") as first_nameworks equivalently. - Indicates the extraction of four bytes that represent age from bytes 41-44. The statement instructs the casting of these bytes as an integer
INT. This function uses the default endianness (big) and treats the bytes as signed. Unsigned values can overflow target columns because integral types are all signed. - Extracts the 10 bytes for
total_spentusing theBINARY SELECTOR, and converts the values using thepackeddecimal option for theDECIMALcast. The casting requires specifying the number of decimal points in the source data. In this case, there are 2 decimal points, which match the number in the target column. - Extracts the 8 bytes that represent
user_idusing theBINARY SELECTORand casts these bytes to aBIGINTwhile interpreting the bytes as unsigned with the little endian representation.
Load Delimited and CSV Data
When you load data from delimited or CSV files, the Ocient System tokenizes the data during loading. The system detects records and fields in the input data during pipeline execution. You can reference fields and use them in combination with transformation functions before the system stores values in the column of a target table. Files must be located in these allowed directories:/tmpdirectory- The temporary directory you configure using the
streamloader.extractorEngineParameters.tempDirconfiguration option (default path is/var/opt/ocient/tmp). - The directory list specified by the
streamloader.extractorEngineParameters.configurationOption.filesystem.access.directoriesconfiguration option (default is an empty list).
/etc/bin/sbin/lib/lib64/usr/boot/proc/sys/run/root/var/lib/var/log/var/run/var/cache
$. To maintain consistency with SQL array semantics, the field indexes start at 1.
Reference the first field of tokenized source records for the DELIMITED and CSV formats as $1.
For a complete list of supported options for DELIMITED and CSV data formats, see Delimited and CSV Extract Options.
Delimited Loading Example
Use this example delimited data.Text
| Field Reference | Value |
|---|---|
| $1 | iphone |
| $2 | 60607 |
| $3 | viewed |
| $4 | 502 |
| $5 | 293.99 |
| $6\[] | \[‘shopping’,‘news’] |
DELIMITED data format, this CREATE PIPELINE statement specifies the | character for the field delimiter. This statement loads data into AWS S3. The SELECT statement uses fields 1, 2, 3, 5, and 6 of the source data. The statement specifies that the system should not load field 4 to the target table. Field 6 is an array of data matching the default array settings for delimited data. You can indicate this with the array brackets like $6[] to load into a CHAR[] typed column.
The outer casting functions in this example are optional and shown for completeness. If they are omitted, the pipeline automatically casts the source fields to the target column type.
SQL
Load JSON Data
The data pipeline syntax enables the load of JSON data, including nested scalars, arrays, and pointsST_POINT.
Supported JSON Selectors
JSON selectors consist of$ followed by a dot-separated list of JSON keys. If a key refers to an array, it is followed by a set of brackets [] to correspond to its dimensionality. If the square brackets contain an index, like [1], then the selector refers to an array element.
The Ocient System treats JSON selectors as lowercase. To use case-sensitive selectors, you must enclose the selector in double quotation marks. For example,
$"testSelector". With case-sensitive selectors having multiple JSON keys, each key needs double quotation marks. For example, $"testData"."Responses"."SuccessResponse".SELECT), you must enclose such selectors in double quotation marks. For example, if you have a JSON document { "test-field": 123 }, then the selector for the query should be $"test-field".
The Ocient System does not support identifiers with a backslash as the last character in the key name.
This table shows the selector and provides its description. The cells in the last column of the table show an example for each selector. First, the cell shows example data in JSON format. Then, the cell shows the use of the selector and its output after the arrow.
| Selector | Description | Examples |
|---|---|---|
$a | Scalar — Refers to key a at the top of a JSON document, where the value of a is a scalar. | { "first_name": "Chris" } $first_name -> "Chris" |
$a[] | Array — Refers to key a at the top of a JSON document, where the value of a is a one-dimensional array. | { "names": ["Chris", "Joe"] } $names[] -> ["Chris","Joe"] |
$a[1] | Array Element Selection — Selects a single element from the array a at the top of a JSON document, where the value of a is a one-dimensional array. Also works on multi-dimensional arrays. Indexes start at 1. | { "names": ["Chris", "Joe"] } $names[2] -> "Joe" |
$a[][] | Multi-dimensional Array — Refers to key a at the top of a JSON document, where the value of a is a two-dimensional array. | { "coordinates": [ [87.31, 18.25],[65.22, 19.41]] } $coordinates[][] -> [[87.31,18.25],[65.22,19.41]] |
$a.b.c | Nested Selector — Drills into the value at key a, then value at key b, and refers to the value at key c, where none of the values are arrays. | { "user": { "name": "Chris" } } $user.name -> "Chris" |
$a[].b | Array Projection — A projection applies a JSON selector on all elements in an array, returning an array. Array brackets in the notation indicate which items are the arrays. The a[] is an array. The b selector is applied to each element in the array a. | { "users": [ { "name": "Chris", "orders": [ { "order_id": 1, "subtotal": 19.29 }, { "order_id": 2, "subtotal": 16.71 } ] }, { "name": "Joe", "orders": [ { "order_id": 3, "subtotal": 17.29 }, { "order_id": 4, "subtotal": 22.49 } ] } ] } $users[].name -> ["Chris", "Joe"] |
$a.b[].c.d[] | Multi-level Array Projection — Applies selectors to each element in the specified arrays and allows multiple levels of objects and arrays for the selection of an N-dimensional array of values. The values of b and d are arrays, resulting in two-dimensional arrays. The selector c.d[] is projected into the elements in the array b. | { "users": [ { "name": "Chris", "orders": [ { "order_id": 1, "subtotal": 19.29 }, { "order_id": 2, "subtotal": 16.71 } ] }, { "name": "Joe", "orders": [ { "order_id": 3, "subtotal": 17.29 }, { "order_id": 4, "subtotal": 22.49 } ] } ] } $users[].orders[].order_id -> [[1,2],[3,4]] |
$a.b[1].c.d[] | Array Slice — Selects the first element from the array b and drills into c and then the array d. JSON array selectors have an index that starts at 0. | { "users": [ { "name": "Chris", "orders": [ { "order_id": 1, "subtotal": 19.29 }, { "order_id": 2, "subtotal": 16.71 } ] }, { "name": "Joe", "orders": [ { "order_id": 3, "subtotal": 17.29 }, { "order_id": 4, "subtotal": 22.49 } ] } ] } $users[1].orders[].subtotal -> [17.29,22.49] |
$a.{b,c} | Tuple Selector — Selects the values from an object as a tuple. You must cast the values to an appropriate TUPLE data type. | { "user": { "first_name": "John", "last_name": "Doe" }} $user.{first_name,last_name} -> <<"John","Doe">> |
$a[].{b,c} | Array of Tuples Selector — Selects the values from an array of objects as an array of tuples. You must cast the values to an appropriate TUPLE[] data type. | { "users": [ { "first_name": "John", "last_name": "Doe" }, { "first_name": "Steve", "last_name": "Smith" } ]} $users[].{first_name,last_name} -> [<<"John","Doe">>,<<"Steve","Smith">>] |
$a[_] | Flatten Array Selector — Selects the values of an array and flattens the dimension indicated by the underscore by one level. | { a: { b: [ {c: [1,2,3] }, {c: [4,5,6] } ] } } $a.b[_].c -> [1,2,3,4,5,6] |
$a[!] | Compact Array Selector — Selects the values of an array and removes any NULL values from the array at this level in the array. | { a: { b: { c: [1,null,2,null,3] } } } $a.b.c[!] -> [1,2,3] |
NULL and Empty Handling for JSON Scalars
The Ocient System handles all JSON NULL, empty, and missing values in the same way. The system loads these values asNULL. These values fail to load into non-nullable columns.
Provide an explicit default in the pipeline using IF_NULL or COALESCE or use the COLUMN_DEFAULT_IF_NULL option to accept the configured column default instead of attempting to load NULL values.
NULL and Empty Handling for JSON Arrays
The Ocient System handles NULL, empty, and missing values the same way for arrays as for scalars. The system converts a value that is NULL, empty, or missing to NULL and loads it asNULL.
Provide an explicit default in the pipeline or use the COLUMN_DEFAULT_IF_NULL option to accept the configured column default instead of attempting to load NULL.
NULL and Empty Handling for JSON Tuples
All the rules for handling NULL, empty, and missing elements that apply to scalars and arrays also apply to tuples. If any part of the selector is NULL, empty, or missing, data pipeline loading converts that value toNULL.
Additionally, because you can apply functions to tuple elements (and not array elements), you can use the NULL_IF function to convert a tuple element to NULL. For example, tuple<<char,varchar>>($a.name, NULL_IF($a.hometown, 'N/A') ) indicates to the pipeline that the string 'N/A' signifies NULL for the hometown element but not for the name element.
Load Parquet Data
The data pipeline functionality enables loading Parquet files with this configuration. File Configuration- Files should have row groups of less than 128 MB. Larger row groups can impact memory usage during loading, and row groups of 512 MB can cause loading failures on 1 TB or more data sets.
- Encoding fields in a Parquet file reduces the space of the file on disk but can impact memory usage during loading. Enable encoding on fields that you expect to have less than 256 unique values and for fields that contain short strings. You do not have to encode other fields.
- You can load row groups of multiple Parquet files in parallel. For large data sets, load the data set as multiple files.
- Loading files with differing schemas is not supported.
You must select a leaf element, an array, or a tuple with your selector. This is stricter than using JSON selectors, which can directly select array fields and JSON object fields.Example: The selector must be
{"a": [1,2,3], "b": {"c": 1}}You can extract with any of the selectors in JSON: $a, $a[], $b, $b.cHowever, Parquet only allows for the selectors: $a[], $b.cThis example assumes this schema:$my_list[], which includes the array syntax.Auto-casting in Parquet does not support the automatic conversion to
VARCHAR columns. You must explicitly cast data to the CHAR data type when you convert Parquet data that is not string data to a VARCHAR column or VARCHAR function argument.The Ocient System treats Parquet selectors as lowercase. To use case-sensitive selectors, you must enclose the selector in double quotation marks. For example,
$"testSelector".Schema Evolution
The Ocient System supports schema evolution when you load a set of Parquet files. Specifically, if the pipeline selects a set of Parquet files where an individual file might have more or fewer columns than another, the system attempts to merge those schemas together to support loading without requiring you to create the pipeline again. For example, thetest_table table has three columns.
SQL
Text
EXTRACT SQL statement. You can choose to sample the first file only for its schema or sample the entire data set to merge the schemas together. This DDL statement samples on one file.
SQL
CREATE PIPELINE and START PIPELINE SQL statements. If you know that all of the Parquet files have the same schema, use this syntax.
The Ocient System does not support the case where a column within the schema changes type. For example, if col_a is an INT type in one file and a VARCHAR type in another.
The default behavior of schema evolution infers the schema from one file. Use this syntax to infer from one file.
SQL
Parquet Type Mapping
Parquet data types are separated into primitive and logical types. The Ocient System converts these types to Ocient SQL types. See these tables for the respective conversions.| Parquet Primitive Type | Ocient SQL Type |
|---|---|
BOOLEAN | BOOLEAN |
INT32 | INT |
INT64 | BIGINT |
INT96 | TIMESTAMP |
FLOAT | FLOAT |
DOUBLE | DOUBLE |
BYTE_ARRAY | VARCHAR |
FIXED_LEN_BYTE_ARRAY | VARCHAR |
| Parquet Logical Types | Ocient SQL Type |
|---|---|
STRING | VARCHAR |
UTF8 | VARCHAR |
ENUM | VARCHAR |
UUID | UUID |
INT8 | TINYINT |
INT16 | SMALLINT |
INT32 | INT |
INT64 | BIGINT |
UINT8 | SMALLINT |
UINT16 | INT |
UINT32 | BIGINT |
UINT64 | BIGINT |
DECIMAL | DECIMAL |
DATE | DATE |
TIME | TIME |
TIME_MILLIS | TIME |
TIME_MICROS | TIME |
TIMESTAMP | TIMESTAMP |
TIMESTAMP_MILLIS | TIMESTAMP |
TIMESTAMP_MICROS | TIMESTAMP |
DURATION | BIGINT |
JSON | VARCHAR |
BSON | VARCHAR |
- The
INTERVALdata type is not supported. - The
UINT64data type can overflow theBIGINTconversion. - The
DURATIONdata type conversion toBIGINTpreserves the underlying units. For example, the number of microseconds stays as microseconds in theBIGINTdata type.
| Parquet Nested Types | Ocient SQL Type |
|---|---|
LIST | TYPE[] |
TUPLE | TUPLE |
Parquet Loading Example
Create a data pipeline that loads Parquet files using an AWS S3 bucket. Specify the bucket, endpoint, access key identifier, secret access key, and filter options to find all Parquet files in the specified file path. Use theparquet_base_table table to store the loaded data. Retrieve the integer, text, floating point, double, integer, JSON, and BSON fields.
SQL
Parquet File-Partitioned Data
With Parquet, you can load file-partitioned data from Parquet files using the file path structure. Use the filter set in the file path using the naming standards. Assume files with these file paths.Shell
METADATA function with the Hive partition syntax and the specified partition key dt from the file paths.
SQL
Load XML Data
You can load data in XML format into the Ocient System. The system supports XML tags, basic elements, nested elements, and CDATA, but it does not support XML arrays and attributes.Supported XML Selectors
Like JSON selectors, you can select XML data using the$ symbol followed by a list of dot-separated JSON keys.
The system treats JSON selectors as lowercase. To use case-sensitive selectors, you must enclose the selector in double quotation marks, such as
$"testSelector". With case-sensitive selectors having multiple JSON keys, each key needs double quotation marks, such as $"testData"."Responses"."SuccessResponse".XML Loading Example
Assume an XML file with this data.XML
name— Namecity— Name of the cityzip— Zip codepersonal_ip— IP address
SQL
xml_pipeline that loads the XML file test.xml using an AWS S3 bucket. Specify the bucket, endpoint, access key identifier, secret access key, and filter options to find the XML file in the specified path. Use the example_xml table to store the loaded data. Use JSON selectors to parse the file and data in each XML tag.
The system parses the CDATA section in the note tag as the literal string Personal IP <b>127.0.0.1</b>. Use the SUBSTRING function to extract the IP address and then transform it into an IPV4 type with the IPV4 function. For details, see the SUBSTRING and IPV4 functions.
SQL

