Load Data

Data Types for Data Pipelines

Source Fields

When you load data using a data pipeline, you reference extracted data using a special syntax named a source field reference. These source field references can behave differently depending on the data format. It is important to understand how to reference source field data and also how assigns the type of data you reference.

Source Field References

You reference source fields in a pipeline using a named reference such as $my_field or a numeric field reference such as $1. For some data formats like JSON and , Ocient supports nested data access such as $my_field.sub_field.

Learn more about what source field references are supported for each data format in Data Formats for Data Pipelines.

Source Field Data Types

The Ocient System treats references to source fields (e.g., $my_field) differently depending on the data format FORMAT specified in the data pipeline.

Text-Based Data Formats

For text-based formats such as DELIMITED and JSON, the Ocient System treats source fields as VARCHAR. For JSON data, the system treats source fields strictly as VARCHAR data even if the field value is represented logically in JSON as Boolean, integer, NULL, or double. Learn more about text-based formats in Load JSON Data and Load Delimited and CSV Data.

Binary-Based Data Format

For the binary-based format BINARY, Ocient treats source fields as the BINARY data type. Learn more about binary formats in Load Binary Data.

Parquet-Based Data Format

For the Parquet-based format PARQUET, Ocient encodes information about the data type of the fields during the load.

Supported Data Types

The data pipeline functionality supports all column data types except:

  • Tuples of Tuples
  • Tuples of Arrays

ST_POINT is not supported in a 3-coordinate format.

When you load data into a column type, in many cases, Ocient automatically casts the source data to the target column. However, in some cases, you might need to cast data to the target column type explicitly. Learn more about supported casting functions in Scalar Transformation Functions and Casting.

Automatic Conversion and Casting

Loading pipelines apply automatic casting to the final values of your transformations and source fields where the data type does not match the target column type. The casting function that the Ocient System applies varies based on the SQL type of the data that results from your transformation and the SQL type of the target column.

The system applies casting automatically in cases to resolve conflicts between data types:

  • Promotion of a type to expand to a wider data type (e.g., SMALLINT to BIGINT)
  • Implicit casting from one family of data type to another (e.g., VARCHAR to TIMESTAMP)
    • Ocient supports implicit casting of all simple data types to VARCHAR.

Auto-casting in Parquet does not support the automatic conversion to VARCHAR columns. You must explicitly cast data to the CHAR data type when you convert Parquet data that is not string data to a VARCHAR column or VARCHAR function argument.

Automatic casting does not apply in some situations where data loss might occur:

  • Implicit downcasting to a narrower data type (e.g., BIGINT to SMALLINT)

Where automatic casting is not supported, you can explicitly cast data or use built-in transformation functions according to your requirements. See Transform Data in Data Pipelines for more details.

Implicit Casting Support List

VARCHAR Source Data

The Ocient System treats the source field reference from JSON and DELIMITED formatted data as VARCHAR regardless of the logical type in the source.

Ocient supports the implicit casting of VARCHAR to all simple data types. These types are not supported:

  • Geospatial types
  • Arrays
  • Tuples
  • Matrixes

BINARY Source Data

The system treats the source field reference from BINARY formatted data as raw bytes with automatic casting functions tailored to the BINARY data type.

Ocient supports implicit casting of BINARY to:

  • TINYINT, SMALLINT, INT, BIGINT
  • FLOAT, DOUBLE
  • TIMESTAMP, DATE, TIME (as VARCHAR data)

For BINARY or VARCHAR to BOOL:

  • You can represent true as: 't', 'true', 'yes', '1', 'y', 'on'
  • You can represent false as: 'f', 'false', 'no', '0', 'n', 'off'

For BINARY or VARCHAR to TIMESTAMP, DATE, and TIME, the automatic casting assumes that the character data has the format:

  • TIMESTAMPYYYY-MM-DD HH:mm:ss[.SSSSSSSSS]
  • DATEYYYY-MM-DD
  • TIMEHH:mm:ss

Other Implicit Casts or Downcasts

The system also supports some other implicit casts that are not type promotions:

  • TIMESTAMP data implicitly downcasts to DATE and TIME columns, truncating the unused portion.
    • ℹ️This option is available in version 25 or later.
  • All simple types implicitly cast to VARCHAR columns except:
    • HASH or BINARY
    • DECIMAL
    • IP or IPV4
    • ARRAY, TUPLE, MATRIX

For geospatial data types, see Load Geospatial Data in Data Pipelines.

Data Type Promotion List

The data type promotion list describes how the Ocient System can automatically promote the specified type to a target column type. The system achieves the promotion using standard casting functions (e.g., BIGINT(SMALLINT)). See Implicit Casting Support List for implicit casting, for example casting to VARCHAR.

Source Data Type

Automatic Casting Precedence for Target Column Type

BYTE or TINYINT

BYTE TINYINT SMALLINT INT BIGINT

FLOAT DOUBLE

SMALLINT

SMALLINT INT BIGINT

FLOAT DOUBLE

INT

INT BIGINT

FLOAT DOUBLE

BIGINT

BIGINT

DOUBLE

TIMESTAMP TIME

FLOAT

FLOAT DOUBLE

DOUBLE

DOUBLE

DECIMAL

DECIMAL

BOOL

BOOLEAN

UUID

UUID

DATE

DATE TIMESTAMP

TIME

TIME

TIMESTAMP

TIMESTAMP

See Implicit Casting Support List for supported downcasting.

IPV4

IPV4

IP

IP

BINARY

BINARY

See Implicit Casting Support List for supported cross-casting.

ARRAY

ARRAY

TUPLE

TUPLE

VARCHAR

VARCHAR

See Implicit Casting Support List for supported cross-casting.

ST_POINT

ST_POINT

ST_LINESTRING

ST_LINESTRING

ST_POLYGON

ST_POLYGON

  • BIGINT to TIME automatic casting assumes a time value in milliseconds.
  • BIGINT to TIMESTAMP automatic casting assumes milliseconds after the epoch. (January 1, 1970 at midnight UTC)
  • For complex types, the system applies the type precedence for automatic casting to the elements in the type.
  • Automatic casting of integral types to DOUBLE and FLOAT are potentially lossy due to limits in precision of floating point representations.
    • ℹ️This option is available in version 25 or later.

The system treats BYTE and TINYINT aliases identically.

Related Links

Load Data