Data Types for Data Pipelines
When you load data using a data pipeline, you reference extracted data using a special syntax named a source field reference. These source field references can behave differently depending on the data format. It is important to understand how to reference source field data and also how assigns the type of data you reference.
You reference source fields in a pipeline using a named reference such as $my_field or a numeric field reference such as $1. For some data formats like JSON and , Ocient supports nested data access such as $my_field.sub_field.
Learn more about what source field references are supported for each data format in Data Formats for Data Pipelines.
The Ocient System treats references to source fields (e.g., $my_field) differently depending on the data format FORMAT specified in the data pipeline.
Text-Based Data Formats
For text-based formats such as DELIMITED and JSON, the Ocient System treats source fields as VARCHAR. For JSON data, the system treats source fields strictly as VARCHAR data even if the field value is represented logically in JSON as Boolean, integer, NULL, or double. Learn more about text-based formats in Load JSON Data and Load Delimited and CSV Data.
Binary-Based Data Format
For the binary-based format BINARY, Ocient treats source fields as the BINARY data type. Learn more about binary formats in Load Binary Data.
Parquet-Based Data Format
For the Parquet-based format PARQUET, Ocient encodes information about the data type of the fields during the load.
The data pipeline functionality supports all column data types except:
- Tuples of Tuples
- Tuples of Arrays
ST_POINT is not supported in a 3-coordinate format.
When you load data into a column type, in many cases, Ocient automatically casts the source data to the target column. However, in some cases, you might need to cast data to the target column type explicitly. Learn more about supported casting functions in Scalar Transformation Functions and Casting.
Loading pipelines apply automatic casting to the final values of your transformations and source fields where the data type does not match the target column type. The casting function that the Ocient System applies varies based on the SQL type of the data that results from your transformation and the SQL type of the target column.
The system applies casting automatically in cases to resolve conflicts between data types:
- Promotion of a type to expand to a wider data type (e.g., SMALLINT to BIGINT)
- Implicit casting from one family of data type to another (e.g., VARCHAR to TIMESTAMP)
- Ocient supports implicit casting of all simple data types to VARCHAR.
Auto-casting in Parquet does not support the automatic conversion to VARCHAR columns. You must explicitly cast data to the CHAR data type when you convert Parquet data that is not string data to a VARCHAR column or VARCHAR function argument.
Automatic casting does not apply in some situations where data loss might occur:
- Implicit downcasting to a narrower data type (e.g., BIGINT to SMALLINT)
Where automatic casting is not supported, you can explicitly cast data or use built-in transformation functions according to your requirements. See Transform Data in Data Pipelines for more details.
VARCHAR Source Data
The Ocient System treats the source field reference from JSON and DELIMITED formatted data as VARCHAR regardless of the logical type in the source.
Ocient supports the implicit casting of VARCHAR to all simple data types. These types are not supported:
- Geospatial types
- Arrays
- Tuples
- Matrixes
BINARY Source Data
The system treats the source field reference from BINARY formatted data as raw bytes with automatic casting functions tailored to the BINARY data type.
Ocient supports implicit casting of BINARY to:
- TINYINT, SMALLINT, INT, BIGINT
- FLOAT, DOUBLE
- TIMESTAMP, DATE, TIME (as VARCHAR data)
For BINARY or VARCHAR to BOOL:
- You can represent true as: 't', 'true', 'yes', '1', 'y', 'on'
- You can represent false as: 'f', 'false', 'no', '0', 'n', 'off'
For BINARY or VARCHAR to TIMESTAMP, DATE, and TIME, the automatic casting assumes that the character data has the format:
- TIMESTAMP — YYYY-MM-DD HH:mm:ss[.SSSSSSSSS]
- DATE — YYYY-MM-DD
- TIME — HH:mm:ss
Other Implicit Casts or Downcasts
The system also supports some other implicit casts that are not type promotions:
- TIMESTAMP data implicitly downcasts to DATE and TIME columns, truncating the unused portion.
- ℹ️This option is available in version 25 or later.
- All simple types implicitly cast to VARCHAR columns except:
- HASH or BINARY
- DECIMAL
- IP or IPV4
- ARRAY, TUPLE, MATRIX
For geospatial data types, see Load Geospatial Data in Data Pipelines.
The data type promotion list describes how the Ocient System can automatically promote the specified type to a target column type. The system achieves the promotion using standard casting functions (e.g., BIGINT(SMALLINT)). See Implicit Casting Support List for implicit casting, for example casting to VARCHAR.
Source Data Type | Automatic Casting Precedence for Target Column Type |
---|---|
BYTE or TINYINT | BYTE TINYINT SMALLINT INT BIGINT FLOAT DOUBLE |
SMALLINT | SMALLINT INT BIGINT FLOAT DOUBLE |
INT | INT BIGINT FLOAT DOUBLE |
BIGINT | BIGINT DOUBLE TIMESTAMP TIME |
FLOAT | FLOAT DOUBLE |
DOUBLE | DOUBLE |
DECIMAL | DECIMAL |
BOOL | BOOLEAN |
UUID | UUID |
DATE | DATE TIMESTAMP |
TIME | TIME |
TIMESTAMP | TIMESTAMP See Implicit Casting Support List for supported downcasting. |
IPV4 | IPV4 |
IP | IP |
BINARY | BINARY See Implicit Casting Support List for supported cross-casting. |
ARRAY | ARRAY |
TUPLE | TUPLE |
VARCHAR | VARCHAR See Implicit Casting Support List for supported cross-casting. |
ST_POINT | ST_POINT |
ST_LINESTRING | ST_LINESTRING |
ST_POLYGON | ST_POLYGON |
- BIGINT to TIME automatic casting assumes a time value in milliseconds.
- BIGINT to TIMESTAMP automatic casting assumes milliseconds after the epoch. (January 1, 1970 at midnight UTC)
- For complex types, the system applies the type precedence for automatic casting to the elements in the type.
- Automatic casting of integral types to DOUBLE and FLOAT are potentially lossy due to limits in precision of floating point representations.
- ℹ️This option is available in version 25 or later.
The system treats BYTE and TINYINT aliases identically.
Load Data