Data Types in Loading
After extracting from source files or topics, data can be transformed and mapped to target columns in tables.
LAT automatically binds many primitive source data types to Ocient column data types and automatically converts some source types to the target column type. Different source data types automatically bind to target columns using different conventions.
For example, a string in JSON such as '123' that loads into an INT column properly binds to the corresponding integer value 123. Similarly, an integer like 456 can automatically bind to an INT column as 456, a FLOAT column as 456.0, and a VARCHAR column as '456'. Finally, a string containing the Well-Known Text (WKT) representation of a POINT automatically binds to an ST_POINT column.
When you attempt to load an empty string into a target column with LAT, most data types raise an error and the record does not load into the target table. The exceptions are varchar, decimal, binary, and array column types. In those cases, the data warehouse loads these values:
- varchar: An empty string
- decimal: 0.0 (with the appropriate precision)
- binary: An empty binary blob
- array: An empty array
When you extract data from text-based formats like delimited data, the data warehouse loads empty data as an empty string unless you use the empty_as_null setting.
If you do not provide a value for a column in the target table, this table explains what value the data warehouse stores in the column. For the purposes of default value handling, these are equivalent:
- Pipeline is missing a column.
- Pipeline includes a column but the value in the JSON or the source data evaluates to null.
- Pipeline includes a column but the field is missing on the source record (e.g., a.b does not exist in a JSON record).
To transform a NULL or missing value into a default in the pipeline, use the transformation functions. For example: not_null(a.b, 'my_default')
| Ocient Column Definition | Database Behavior When Column Is Missing or Column Value Evaluates to NULL |
| --- | --- |
| NOT NULL with default | Default |
| NOT NULL with no default | Record fails |
| Nullable with default | Default |
| Nullable with no default | NULL |
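The default-handling rules above can be modeled as a small decision function. This is an illustrative Python sketch, not part of the LAT; the names `resolve_column_value` and `RecordFails` are hypothetical:

```python
class RecordFails(Exception):
    """Raised when a NOT NULL column with no default receives NULL."""

def resolve_column_value(value, nullable, default):
    # value is None when the column is missing from the pipeline, the source
    # field is absent, or the extracted value evaluates to NULL -- the three
    # equivalent cases described above.
    if value is not None:
        return value
    if default is not None:
        return default  # column has a default -> use the default
    if nullable:
        return None     # nullable with no default -> NULL
    raise RecordFails("NOT NULL column with no default")
```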
Refer to Data Types for detailed information about each data type. The following tables describe the automatic binding conventions for each data type.
BIGINT

| Valid JSON Bind Types: | Number, String |
| --- | --- |
| Note: | Number and String can be either integer or floating point values and are converted to 64-bit signed integers. Decimals are dropped. '+','-' can be used at the beginning of a string to indicate sign. String overflows result in errors, while numeric overflows are silent. |
| Example: | { "number_bigint": 1.0, "string_bigint": "1" } -> 1, 1 |
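A rough Python model of these integer binding rules. The `bind_bigint` helper is hypothetical, and the exact behavior of silent numeric overflow is an assumption here, shown as 64-bit wraparound:

```python
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

def bind_bigint(value):
    # hypothetical model of BIGINT binding; not a LAT function
    if isinstance(value, str):
        # '+'/'-' may lead the string; any decimal part is dropped
        n = int(float(value)) if any(c in value for c in ".eE") else int(value)
        if not INT64_MIN <= n <= INT64_MAX:
            raise OverflowError("string overflow is an error")
        return n
    # numeric overflow is silent; 64-bit wraparound is an assumption here
    n = int(value)
    return ((n - INT64_MIN) % 2**64) + INT64_MIN
```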
BINARY

| Valid JSON Bind Types: | String |
| --- | --- |
| Note: | Converted to binary as UTF-8: each character in the string takes its UTF-8 byte representation. |
| Example: | { "string_binary": "hello" } -> BINARY(01101000 01100101 01101100 01101100 01101111) |
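In Python terms, this conversion is equivalent to UTF-8 encoding the string:

```python
# encode the string as UTF-8 and render each byte as its bit pattern
data = "hello".encode("utf-8")
bits = " ".join(f"{b:08b}" for b in data)
# 'hello' -> 01101000 01100101 01101100 01101100 01101111
```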
BOOLEAN

| Valid JSON Bind Types: | Boolean, String |
| --- | --- |
| Note: | For a value of true, a string must be one of (case insensitive): true, t, yes, on, y, 1. For a value of false, a string must be one of (case insensitive): false, f, no, off, n, 0. |
| Example: | { "boolean": true, "string_0": "yes", "string_1": "FALSE" } -> true, true, false |
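A minimal sketch of this boolean matching in Python (the `bind_boolean` name is illustrative, not a LAT function):

```python
TRUE_STRINGS = {"true", "t", "yes", "on", "y", "1"}
FALSE_STRINGS = {"false", "f", "no", "off", "n", "0"}

def bind_boolean(value):
    if isinstance(value, bool):
        return value
    s = value.lower()  # matching is case insensitive
    if s in TRUE_STRINGS:
        return True
    if s in FALSE_STRINGS:
        return False
    raise ValueError(f"not a valid boolean string: {value!r}")
```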
DATE

| Valid JSON Bind Types: | String, Number |
| --- | --- |
| Note: | If String, the value must be in the format "YYYY-MM-DD". If Number, the value must be an EPOCH_DAY. Use the built-in TO_DATE function to convert a string with a different date format to an EPOCH_DAY and bind. |
| Example: | { "string_date": "2021-11-04", "epoch_date": 0 } -> DATE(2021-11-04), DATE(1970-01-01) |
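A Python sketch of both accepted forms (`bind_date` is a hypothetical helper, not a LAT function):

```python
from datetime import date, timedelta

def bind_date(value):
    # String in "YYYY-MM-DD" form, or Number interpreted as an EPOCH_DAY
    if isinstance(value, str):
        return date.fromisoformat(value)
    return date(1970, 1, 1) + timedelta(days=value)
```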
DECIMAL

| Valid JSON Bind Types: | Number, String |
| --- | --- |
| Valid Logical Types: | Decimal |
| Note: | Number and String can be either integer or floating point values. Precision and scale for the loaded data are defined by the database table definition. The target column must support enough digits both before and after the decimal point to accommodate decimal values extracted from Fixed Width Binary or other record formats that support decimal data. If insufficient precision or scale exists in the target column, the data will fail to load. |
| Example: | { "number_decimal": 1.0, "string_decimal": "1.1" } -> DECIMAL(1.0), DECIMAL(1.1) |
DOUBLE

| Valid JSON Bind Types: | Number, String |
| --- | --- |
| Note: | Number and String can be either integer or floating point values. |
| Example: | { "number_double": 1.0, "string_double": "1" } -> 1.0, 1.0 |
HASH

| Valid JSON Bind Types: | String |
| --- | --- |
| Note: | Converted as hexadecimal to binary: each character in the string must be a valid hexadecimal digit /[0-9a-f]+/i. |
| Example: | { "hex_string": "0a1b" } -> HASH(0000 1010 0001 1011) |
INT

| Valid JSON Bind Types: | Number, String |
| --- | --- |
| Note: | Number and String can be either integer or floating point values and are converted to 32-bit signed integers. Decimals are dropped. '+','-' can be used at the beginning of a string to indicate sign. String overflows result in errors, while numeric overflows are silent. |
| Example: | { "number_int": 1.0, "string_int": "1" } -> 1, 1 |
IPV4

| Valid JSON Bind Types: | String |
| --- | --- |
| Note: | Source data must be in the form: [0-255].[0-255].[0-255].[0-255] |
| Example: | { "string_ipv4": "192.168.0.1" } -> IPV4(192.168.0.1) |
IP

| Valid JSON Bind Types: | String |
| --- | --- |
| Note: | Source string can be in IPV4 or IPV6 format. IPV6 can appear in full form as shown in this example or in condensed form. Condensed form can use :: notation (e.g., 2001:cdba::3257:9652) and leading zeros in IPV6 hextets can also be omitted (e.g., 2001:0db8:0000:0000:0000:0000:3257:9652 → 2001:db8:0:0:0:0:3257:9652). IPV6 can also appear as a string of 32 hex characters without colons (e.g., 2001cdba000000000000000032579652). |
| Example: | { "string_ip_ipv4": "192.168.0.1", "string_ip_ipv6": "2001:cdba:0000:0000:0000:0000:3257:9652" } -> IP(192.168.0.1), IP(2001:cdba:0000:0000:0000:0000:3257:9652) |
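Python's standard ipaddress module can demonstrate that the three accepted IPV6 string forms denote the same address; in this sketch, the colonless 32-hex-character form is converted through its integer value:

```python
import ipaddress

full      = ipaddress.ip_address("2001:cdba:0000:0000:0000:0000:3257:9652")
condensed = ipaddress.ip_address("2001:cdba::3257:9652")
# the colonless form: parse the 32 hex characters as a 128-bit integer
no_colons = ipaddress.ip_address(int("2001cdba000000000000000032579652", 16))

assert full == condensed == no_colons
```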
ST_LINESTRING

| Valid JSON Bind Types: | String |
| --- | --- |
| Note: | Must be in the WKT format "LINESTRING(LONG LAT, … )" or "LINESTRING EMPTY" |
| Example: | { "string_st_linestring": "LINESTRING(-71.064544 42.28787, -90.444444 89.562993)" } -> ST_LINESTRING(-71.064544 42.28787, -90.444444 89.562993) |
MATRIX

| Valid JSON Bind Types: | Array[Array], String |
| --- | --- |
| Note: | Arrays must be two-dimensional with the same number of rows and columns as specified in the table. String must be in JSON 2D array format. Each element is cast according to the element type's rules in this section. |
| Example: | { "int_matrix": [[0, 1, 2],[3, 4, 5],[6, 7, 8]], "string_matrix": "[[0,1],[2,3]]" } -> MATRIX[[0,1,2],[3,4,5],[6,7,8]], MATRIX[[0,1],[2,3]] |
ST_POINT

| Valid JSON Bind Types: | String |
| --- | --- |
| Note: | Must be in the format "POINT(LONG LAT)" |
| Example: | { "string_st_point": "POINT(-71.064544 42.28787)" } -> ST_POINT(-71.064544, 42.28787) |
Ensure that polygons have your intended orientation when you load them, because polygon orientation matters in Ocient semantics. If you intend all polygons to be oriented counterclockwise, use the st_forcepolygonccw transformation function to enforce counterclockwise semantics.
ST_POLYGON

| Valid JSON Bind Types: | String |
| --- | --- |
| Note: | Must be in the WKT format "POLYGON((LONG LAT, … ))" or "POLYGON((LONG LAT, … ), (LONG LAT, … ))" or "POLYGON EMPTY" |
| Example: | { "string_st_polygon": "POLYGON((-87.62 41.87, -89.40 43.07, -87.906 43.04, -87.62 41.87))" } -> ST_POLYGON((-87.62 41.87, -89.40 43.07, -87.906 43.04, -87.62 41.87)) |
FLOAT

| Valid JSON Bind Types: | Number, String |
| --- | --- |
| Note: | Number and String can be either integer or floating point values. Precision and scale are inherited from the table definition. |
| Example: | { "number_float": 1.0, "string_float": "1" } -> 1.0, 1.0 |
SMALLINT

| Valid JSON Bind Types: | Number, String |
| --- | --- |
| Note: | Number and String can be either integer or floating point values and are converted to 16-bit signed integers. Decimals are dropped. '+','-' can be used at the beginning of a string to indicate sign. String overflows result in errors, while numeric overflows are silent. |
| Example: | { "number_smallint": 1.0, "string_smallint": "1" } -> 1, 1 |
TIME

| Valid JSON Bind Types: | Number, String |
| --- | --- |
| Note: | String must be in the format: HH:MM:SS[.SSSSSSSSS]. Number must represent nanoseconds since midnight. Use the TO_TIME transformation function to convert an arbitrary string format to nanoseconds and bind. |
| Example: | { "number_time": 1000000000, "string_time": "07:33:10.0001" } -> TIME(00:00:01.0000), TIME(07:33:10.0001) |
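A Python sketch of the numeric form (nanoseconds since midnight). Note that Python's time type only stores microseconds, so this model drops sub-microsecond digits; `bind_time` is a hypothetical helper:

```python
from datetime import time

def bind_time(nanos):
    # nanoseconds since midnight -> TIME
    secs, ns = divmod(nanos, 1_000_000_000)
    hours, rem = divmod(secs, 3600)
    minutes, seconds = divmod(rem, 60)
    return time(hours, minutes, seconds, ns // 1000)
```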
TIMESTAMP

| Valid JSON Bind Types: | Number, String |
| --- | --- |
| Note: | String must be in the format: YYYY-MM-DD HH:MM:SS[.SSSSSSSSS]. Number must represent nanoseconds since epoch. Use the TO_TIMESTAMP transformation function to convert an arbitrary string format to nanoseconds and bind. |
| Example: | { "number_timestamp": 1000000000, "string_timestamp": "2021-11-04 07:33:10.0001" } -> TIMESTAMP(1970-01-01 00:00:01.0000), TIMESTAMP(2021-11-04 07:33:10.0001) |
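Similarly, a Python sketch of the numeric timestamp form. Python's datetime holds microseconds, so the last three digits of nanosecond precision are dropped in this model; `bind_timestamp` is a hypothetical helper:

```python
from datetime import datetime, timedelta

def bind_timestamp(nanos):
    # nanoseconds since the Unix epoch -> TIMESTAMP
    return datetime(1970, 1, 1) + timedelta(microseconds=nanos // 1000)
```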
TINYINT

| Valid JSON Bind Types: | Number, String |
| --- | --- |
| Note: | Number and String can be either integer or floating point values and are converted to 8-bit signed integers. Decimals are dropped. '+','-' can be used at the beginning of a string to indicate sign. String overflows result in errors, while numeric overflows are silent. |
| Example: | { "number_tinyint": 1.0, "string_tinyint": "1" } -> 1, 1 |
TUPLE

| Valid JSON Bind Types: | Array, String |
| --- | --- |
| Note: | String must be in JSON array format "[e0,e1,e2]". Each element is cast to the tuple element type at that element's index in the Ocient table according to the rules for that element type in this section. |
| Example: | { "double_int": [1.0, 1], "string_array": ["Ocient", [0, 1]], "string_tuple": "[1.0, 1]", "tuple_tuple": [[1.0, 1],["Ocient", 2]] } -> TUPLE(DOUBLE, INT)(1.0, 1), TUPLE(VARCHAR, INT[])("Ocient", INT[0, 1]), TUPLE(DOUBLE, INT)(1.0, 1), TUPLE(TUPLE(DOUBLE, INT), TUPLE(VARCHAR, INT))((1.0, 1), ("Ocient", 2)) |
ARRAY

| Valid JSON Bind Types: | Array, String |
| --- | --- |
| Note: | String must be in JSON array format "[e0,e1,e2]". Each element is cast to the array element type from the Ocient table according to the rules for that element type in this section. |
| Example: | { "int_array": [0,1,2], "2d_array": [[0, 1, 2],[3, 4]], "string_array": "[0,1,2]", "tuple_array": [[0, 1.0, "Ocient"],[1, 2.0, "rocks"]] } -> INT[0,1,2], INT[][INT[0,1,2],INT[3,4]], INT[0,1,2], TUPLE(INT, FLOAT, VARCHAR)[(0, 1.0, "Ocient"), (1, 2.0, "rocks")] |
UUID

| Valid JSON Bind Types: | String |
| --- | --- |
| Note: | Source data must be a valid hyphen-separated UUID. |
| Example: | { "string_uuid": "40e6215d-b5c6-4896-987c-f30f3678f608" } -> UUID(40e6215d-b5c6-4896-987c-f30f3678f608) |
VARCHAR

| Valid JSON Bind Types: | String, Number, Boolean |
| --- | --- |
| Note: | Number and Boolean types are converted to their string representations. |
| Example: | { "number_string": 1000, "boolean_string": true, "string_string": "string" } -> "1000", "true", "string" |
Geospatial data can be loaded into Ocient from a variety of formats. Loading geospatial data requires special considerations, including:
- Coordinate System
- Supported Geospatial Loading Formats
- Geospatial Type Conversion
- POINT Normalization
- POLYGON Normalization
Ocient uses Spatial Reference System ID (SRID) 4326, defined in the WGS84 standard, for all internal geospatial data types. Before you load the source data, ensure all data uses the SRID 4326 coordinate system. The loading operation ignores any SRID information contained in the source data.
The primary data format for loading geospatial data in Ocient is the Well-Known Text (WKT) format. This string format is a portable format that can represent points, lines, and polygons.
String data in WKT format can automatically load into POINT, LINESTRING, and POLYGON Ocient column types. In addition, arrays of strings in WKT format can automatically load into array columns of a suitable geospatial data type. The Ocient System assumes raw strings are in WKT format when the system loads these strings into geospatial column types.
Each LINESTRING or POLYGON value can be up to a maximum of 512 MB in size. This means a LINESTRING or POLYGON can contain approximately 32 million point values.
- WKT strings are whitespace insensitive, so POINT(10 20) is equivalent to POINT (10 20).
- WKT strings are case insensitive.
- Extended Well-Known Text (EWKT) is not supported. To load EWKT data, load the substring after the EWKT semicolon as the WKT format.
The other data formats that can load into Ocient are Well-Known Binary (WKB) and Extended Well-Known Binary (EWKB) formats. The Ocient System requires an explicit transformation function for both WKB and EWKB formats when you load data and convert it into the target column type.
You can use the function st_geomfromewkb to transform WKB and EWKB data.
In both cases, the Ocient System represents the WKB or EWKB string data as hexadecimal string data. The hexadecimal string can be one of these formats:
- Only hexadecimal digits: 000000000140000000000000004010000000000000
- Prefixed with 0x: 0x000000000140000000000000004010000000000000
- Prefixed with \x: \x000000000140000000000000004010000000000000
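A small sketch showing that the three hexadecimal forms decode to the same bytes before any geometry parsing happens (the `decode_hex_geometry` helper is illustrative, not a LAT function):

```python
def decode_hex_geometry(s):
    # accept bare hex, 0x-prefixed, or \x-prefixed hexadecimal strings
    if s.startswith(("0x", "\\x")):
        s = s[2:]
    return bytes.fromhex(s)

wkb = "000000000140000000000000004010000000000000"
assert decode_hex_geometry(wkb) == decode_hex_geometry("0x" + wkb) == decode_hex_geometry("\\x" + wkb)
```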
Ocient provides the automatic conversion of geometries from simpler types to more complex types. The conversion allows the load of a LINESTRING column with a POINT, or a POLYGON column with a POINT or LINESTRING. This capability also allows the load of arrays of POLYGON data from the source arrays of a combination of POINT, LINESTRING, and POLYGON data, which represents a geometry collection.
During loading, Ocient automatically performs normalization of POINT data into a regular format used within Ocient. The Ocient System performs the following operations on POINT data during the load:
- Constrain longitude to [-180, 180) and latitude to [-90, 90]; out-of-range coordinates wrap around using correct geographical handling.
- Snap points near a pole to the pole.
- Set the longitude of points on a pole to 0.
- Remove signed zeros from coordinates, so -0 becomes 0.
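A simplified Python sketch of this normalization. It assumes latitude is already in range and omits the full geographical wrap-around handling; `normalize_point` is a hypothetical helper, not Ocient's implementation:

```python
def normalize_point(lon, lat):
    # constrain longitude to [-180, 180)
    lon = ((lon + 180.0) % 360.0) - 180.0
    # points on a pole get longitude 0
    if abs(lat) == 90.0:
        lon = 0.0
    # adding +0.0 turns a signed zero (-0.0) into 0.0
    return (lon + 0.0, lat + 0.0)
```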
Ocient follows a counterclockwise rotation convention for the outer ring of a POLYGON. Inner rings follow a clockwise rotation. If the Ocient System loads a POLYGON with a clockwise outer ring, the system interprets the POLYGON as representing the area outside of the given ring.
In some cases, the POLYGON load can lead to unexpected results when the source data follows a different polygon rotation convention.
To account for this, the LAT provides the st_forcepolygonccw function that forces a counterclockwise rotation of the outer polygon ring and a clockwise rotation of the inner polygon ring when applicable.
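To check whether a ring already follows the counterclockwise convention, a planar shoelace test can be sketched. This is an approximation in the longitude/latitude plane, not how Ocient's spherical semantics are implemented, and it can disagree near the poles and the antimeridian:

```python
def is_ccw(ring):
    # signed shoelace area over consecutive (lon, lat) vertex pairs;
    # a positive sum means the ring winds counterclockwise in the plane
    area2 = sum(x1 * y2 - x2 * y1
                for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return area2 > 0
```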
The LAT has a select set of transformation functions to construct geospatial types and manipulate them during the load. You can find the supported geospatial transformation functions in Transformation Functions.