Transformation Functions
LAT supports many transformation functions to modify data during loading. The functions include standard JMESPath Functions and -specific transformation functions.
Transformation functions in LAT are case sensitive. The standard syntax for transformation function calls is all lower case characters.
The following standard JMESPath Functions are supported in LAT Transformation expressions. For details about using these functions, see JMESPath Functions.
JMESPath Function | Function Signature | Example |
---|---|---|
abs | number abs(number $value) | abs(num_1) |
avg | number avg(array[number] $elements) | avg(array_1) |
contains | boolean contains(array|string $subject, any $search) | contains(array_1, 'test') |
ceil | number ceil(number $value) | ceil(num_1) |
ends_with | boolean ends_with(string $subject, string $prefix) | ends_with(str_1, 'ing') |
floor | number floor(number $value) | floor(num_1) |
join | string join(string $glue, array[string] $stringsarray) | join(',', array_1) |
keys | array keys(object $obj) | keys(obj_1) |
length | number length(string|array|object $subject) | length(array_1) |
map | array[any] map(expression→any→any expr, array[any] elements) | map(&to_string(@), array_1) |
max | number max(array[number]|array[string] $collection) | max(array_1) |
max_by | max_by(array elements, expression→number|expression→string expr) | max_by(array_1, &age) |
merge | object merge([object *argument, [, object $…]]) | merge(obj_1, obj_2, obj_3) |
min | number min(array[number]|array[string] $collection) | min(array_1) |
min_by | min_by(array elements, expression→number|expression→string expr) | min_by(array_1, &age) |
not_null | any not_null([any $argument [, any $…]]) | not_null(val_1, val_2, val_3) |
reverse | array reverse(string|array $argument) | reverse(str_1) |
sort | array sort(array[number]|array[string] $list) | sort(array_1) |
sort_by | sort_by(array elements, expression→number|expression→string expr) | sort_by(array_1, &age) |
starts_with | boolean starts_with(string $subject, string $prefix) | starts_with(str_1, 'chi') |
sum | number sum(array[number] $collection) | sum(array_1) |
to_array | array to_array(any $arg) | to_array(val_1) |
to_string | string to_string(any $arg) | to_string(val_1) |
type | string type(array|object|string|number|boolean|null $subject) | type(val_1) |
values | array values(object $obj) | values(obj_1) |
The following transformation functions are also available in LAT transformation expressions.
Returns the sum of the two arguments.
Signature: | number add(number $a, number $b) |
---|---|
Arguments: | - a: first argument - b: second argument |
Example: | add(int_col_0, int_col_1) |
Caps the length of an array to a maximum number of elements.
Signature: | array array_cap(Array[any] $array_1, number $max_elements) |
---|---|
Arguments: | - array_1: the array to cap - max_elements: number of elements to cap the array at |
Example: | array_cap(array_1, `1000`) |
Concatenate two arrays, returning an array. Accepts an array or null value, returning an empty array if both parameters are null.
Signature: | array array_cap(array[any] $array_1, array[any] $array_2) |
---|---|
Arguments: | - array_1: first array - array_2: second array concatenated to first |
Example: | array_cat(array_1, array_2) |
Concatenates two strings or converts arguments into strings and concatenates them into a string.
Signature: | string concat(any $a, any $b) |
---|---|
Arguments: | - a: first argument (automatically converted to string when not specified as a string) - b: second argument (automatically converted to string when not specified as a string) |
Example: | concat("ocient", `10`) |
Returns the floating-point division of the two arguments.
Signature: | number divide(number $a, number $b) |
---|---|
Arguments: | - a: first argument - b: second argument |
Example: | divide(int_col_0, int_col_1) |
Returns an integer hash code for a string, number, or Boolean.
Signature: | number hash_code(string|number|boolean $val) |
---|---|
Arguments: | val: the value to hash |
Example: | hash_code(null) → null hash_code("Chicago") → -1884315574 hash_code("test123") → -1422501792 hash_code(`12345`) -> 12345 hash_code(`123.45`) -> -1936584703 hash_code(`true`) -> 3 hash_code(`false`) -> 1 |
Returns one of the other arguments based on the truth value of the first. When the first argument is true, return the second argument. When false, return the third.
A NULL input into the first argument will be evaluated as false. Both branches are eagerly evaluated.
Signature: | any if(boolean $b, any $if_true, any $if_false) |
---|---|
Arguments: | - b: boolean value that determines which of the options to return - if_true: value returned if b is true - if_false: value returned if b is false |
Example: | if(value > `5`, "North America", "Europe") |
Parses a string into a JSON value for use in further JMESPath operations. All values other than string are returned unmodified. Strings containing null, boolean, number, array, and object types are supported.
Signature: | object json(string $val) |
---|---|
Arguments: | val: string to parse |
Example: | json("{ \"key\": 10 }") |
Returns one of the other arguments based on the truth value of the first. When the first argument is true, return the second argument. When false, return the third. The second and third arguments are both expressions. Branches are lazily evaluated.
Signature: | any lazy_if(boolean $b, expression $if_true, expression $if_false, any data) |
---|---|
Arguments: | - b: boolean value that determines which of the options to return - if_true: expression evaluated and returned if true - if_false: expression evaluated and returned if false - data: information used to evaluate either expression |
Example: | if(value > 5, &"North America", &abs(@), 1.0) |
Returns the num_chars leftmost characters of the string column. If null, null is returned. If the index is greater than the length of the string, the entire string is returned.
Signature: | string left(string $val, number $num_chars) |
---|---|
Arguments: | - a: string column - num_chars: number of chars to use from the left of a string column. |
Example: | left("ocient", `3`) |
Returns lower case version of a string.
Signature: | string lower_case(string $a) |
---|---|
Arguments: | a: string to lower case |
Example: | lower_case("OCIENT") |
Pads an input string to a specified length with a padding string added to the left side.
Signature: | string lpad(string $val, number $length, string $padding) |
---|---|
Arguments: | - val: the input string. This cannot be null. - length: the length you want after padding. If the input string is longer than this length, it is truncated to length characters - padding: the padding string. If no padding string is provided, the input string is padded with spaces. |
Example: | lpad(string_1, `3`, '_') |
Removes leading contiguous instances of a set of characters in a given string.
Signature: | string ltrim(string $val, string $strip_chars) |
---|---|
Arguments: | - val: the input string to trim - strip_chars: set of characters that should be removed (default is ' ' if the value is null or not specified) |
Example: | ltrim(null, ' ') → null ltrim(' Chicago ') → 'Chicago ' ltrim(' Chicago ', ' ') → 'Chicago ' ltrim(' Chicago ', null) → 'Chicago ' ltrim('123test123', '321') → 'test123' |
An override of JmesPath’s built in map which maps a given expression into elements of an array. This override makes the small adjustment that if the array itself is null, then a null is returned.
Signature: | array map(expression $expr, array[any] array_1) |
---|---|
Arguments: | - expr: expression the expression to map - array_1: array[any] the array to which the expression will be mapped |
Example: | map(&if(@ == 1, true, false), [1,2,1,1,2]) → [true, false, true, true, false] |
Returns Boolean for whether this input matches the pattern.
Flags can be:
- s → Pattern.DOTALL
- m → Pattern.MULTILINE
- i → Pattern.CASE_INSENSITIVE
- x → Pattern.COMMENTS
- q → Pattern.LITERAL
Signature: | boolean matches(string $val, string $pattern, string $flags) |
---|---|
Arguments: | - val: value to match - pattern: regex to match against - args: regex flags |
Example: | matches(col_1, ".*mytest.*", 'i') |
Converts an integer number of milliseconds since the Epoch to an Ocient timestamp column.
Signature: | number millis_to_timestamp(number $t) |
---|---|
Arguments: | t: value to convert to an Ocient timestamp |
Example: | millis_to_timestamp(my_millis) |
Returns the floating-point multiplication of the two arguments.
Signature: | number multiply(number $a, number $b) |
---|---|
Arguments: | - a: first argument - b: second argument |
Example: | multiply(int_col_0, int_col_1) |
Converts an integer number of nanoseconds since the Epoch to an Ocient timestamp column.
Signature: | number nanos_to_timestamp(number $t) |
---|---|
Arguments: | - t: value to convert to an Ocient timestamp |
Example: | nanos_to_timestamp(my_nanosecs) |
Removes leading and trailing whitespace and replaces connected whitespace with a single space.
Signature: | string normalize_space(string $val) |
---|---|
Arguments: | val: string to normalize |
Example: | normalize_space(col_1) |
Returns the current timestamp in nanos. This will correctly load into an Ocient timestamp column
Signature: | number now() | number current_timestamp() |
---|---|
Arguments: | None |
Example: | now(), current_timestamp() |
Returns null if the first argument matches any of the following arguments. Can be applied to multiple columns or to compare a column with a constant. Can also be used with more than two arguments in which case returns null if a matches any of the subsequent arguments. Returns the value the first argument if it does not match any other arguments.
Signature: | any null_if(any $val, any $c0, [, any $…]) |
---|---|
Arguments: | - val: value to compare - c0: required value to compare against - … : optional multiple values to compare against |
Example: | null_if(col_a, "NULL", "NONE") |
Given a string, parses it into an array of strings. Handles nested arrays as well.
Signature: | array parse_array(string $array_string, string $bracket_char, string $delim_char, int $dimensions) |
---|---|
Arguments: | - array_string: the string of the array to be parsed - bracket_char: the opening bracket character to use - delim_char: the delimiter character to use - dimensions: the dimension of this array |
Example: | parse_array("[ 1, 2, 3]", "[", ",", `1`) ⇒ [1, 2, 3] parse_array("[[1,2],[3], []]", '[', ',', `2`) ⇒ [[1,2], [3], []] parse_array("{ \"cat\", \"hat\" }", "{", ",", `1`) ⇒ ["cat", "hat"] |
Returns string that represents a unique identifier for the record in a given pipeline for a given file_group or topic. This return value will be consistent over multiple calls in the pipeline's transformation configuration for the same source record. Can be used as a surrogate key when loading multiple tables from the same source record in a pipeline to facilitate joins when no unique identifier exists on the source record.
Signature: | string record_uuid() |
---|---|
Arguments: | None |
Example: | record_uuid() -> "6064fc40-9961-4f5e-b74d-090458e4a609" |
Returns string with text matching the regex pattern replaced by the replacement text.
Flags can be:
- s → Pattern.DOTALL
- m → Pattern.MULTILINE
- i → Pattern.CASE_INSENSITIVE
- x → Pattern.COMMENTS
- q → Pattern.LITERAL
Signature: | string replace(string $val, string $pattern, string $replacement_text, string $flags) |
---|---|
Arguments: | - val: value to search for replacement - pattern: section to replace - replacement_text: replacement string - args: regex flags |
Example: | replace(col_1, '[oO]ld', 'new', 'q') |
Returns the num_chars rightmost characters of the string column. If null, null is returned. If the index is greater than the length of the string, the entire string is returned.
Signature: | string right(string $a, number $num_chars) |
---|---|
Arguments: | - a: string column - num_chars: number of chars to use from the right of a string column. |
Example: | right("ocient", `3`) |
Pads an input string to a specified length with a padding string added to the right side.
Signature: | string rpad(string $val, number $length, string $padding) |
---|---|
Arguments: | - val: the input string. This cannot be null. - length: the length you want after padding. If the input string is longer than this length, it is truncated to length characters - padding: the padding string. If no padding string is provided, the input string is padded with spaces. |
Example: | rpad(string_1, `3`, '_') |
Removes trailing contiguous instances of a set of characters in a given string.
Signature: | string rtrim(string $val, string $strip_chars) |
---|---|
Arguments: | - val: the input string to trim - strip_chars: set of characters that should be removed (default is ' ' if the value is null or not specified) |
Example: | rtrim(null, ' ') → null rtrim(' Chicago ') → ' Chicago' rtrim(' Chicago ', ' ') → ' Chicago' rtrim(' Chicago ', null) → ' Chicago' rtrim('123test123', '321') → '123test' |
Returns true or false based on the given sample rate. The sample rate must be a number. Commonly used inside of IF statements. An exception will be thrown if the sample rate is outside of [0.0, 1.0].
Signature: | boolean sample(number $rate) |
---|---|
Arguments: | rate: rate to return true. Must be between [0.0, 1.0]. |
Example: | sample(`0.5`) |
Forces a polygon to have a counterclockwise rotation of the outer polygon ring and a clockwise rotation of the inner polygon ring. You can use this function to convert polygons extracted from systems that interpret polygon rotation differently.
Signature: | geometry st_forcepolygonccw(string $wkt_geometry) |
---|---|
Arguments: | wkt_geometry: WKT string representing the polygon to rotate. Also accepts WKT of POINT or LINESTRING and returns those types unmodified. |
Example: | st_forcepolygonccw('POLYGON(…)') |
Returns a geometry from an Extended Well-Known Binary (EWKB) and Well-Known Binary (WKB) representations of a geometry. You can load the result of this function into POINT, LINESTRING, and POLYGON column types.
Signature: | geometry st_geomfromewkb(string ewkb) |
---|---|
Arguments: | wkb: hex encoded string Extended WKB or WKB representation of a geometry. EWKB or WKB can be standalone hexadecimal characters or can be prefixed with either 0x or \x. |
Example: | st_geomfromewkb('0103…') st_geomfromewkb('0x0103…') st_geomfromewkb('\x0103…') |
Constructs an POINT from numeric coordinates that represent the longitude and latitude of the point.
Signature: | geometry st_point(float longitude, float latitude) |
---|---|
Arguments: | - longitude: floating point longitude of the point to construct - latitude: floating point latitude of the point to construct |
Example: | st_point(float_col_1, float_col_2) |
Returns the substring beginning at start_index (with indexes beginning at 1) to the end of the string, or to num_chars characters.
Signature: | string substring(string $val, number $start_index, [number $num_chars]) |
---|---|
Arguments: | val: string to substring start_index: index to start at. First character is index 1. num_chars: optional. The maximum number of chars to return |
Example: | substring(str_col, `10`) substring(str_col, `10`, `5`) substring('Chicago Bears', `3`) → 'icago Bears' substring('Chicago Bears', `3`, `4`) → 'icag' |
Returns the string after the first occurrence of the needle.
Signature: | string substring_after(string $val, string $needle) |
---|---|
Arguments: | - val: string to substring - needle: delimiter to use to designate where to return subsequent substring. |
Example: | substring_after('my_text', '_') → 'text' |
Returns the string before the first occurrence of the needle. Both parameters are literal strings.
Signature: | string substring_before(string $val, string $needle) |
---|---|
Arguments: | - val: string to substring - needle: delimiter to use to designate where to return prior substring. |
Example: | substring_before('my_text', '_') → 'my' |
Returns the difference of the two arguments.
Signature: | number subtract(number $a, number $b) |
---|---|
Arguments: | - a: first argument - b: second argument |
Example: | subtract(int_col_0, int_col_1) |
Turns any JSON object into an array with N copies of that object.
Signature: | array[any] to_array_length(any $val, number $n) |
---|---|
Arguments: | - val: value to copy into an array - n: number of copies to make |
Example: | to_array_length(value_col, `3`) |
Converts a string to binary data for loading into an Ocient binary or hash column.
If mode is 'hex', data is parsed as a sequence of hexadecimal digits. Note that the sequence of digits must not begin with 0x.
Otherwise, mode must be the name of a character encodings listed here, and data is encoded in that encoding.
Signature: | binary to_binary(string $data, string $mode) |
---|---|
Arguments: | - data: data to convert to binary - mode: either 'hex' or the name of a standard Java® character encoding |
Example: | to_binary(str_col, 'utf-16') |
Converts a string to a date for loading into an Ocient date column. Format string follows these formatting rules.
Signature: | number to_date(string $date, string $format_string) |
---|---|
Arguments: | - date: date string to convert - format_string: format string to use for conversion |
Example: | to_date(str_col, 'yyyy-MM-dd') |
Converts a string into a time of day for loading into an Ocient time column. Format string follows these formatting rules.
Signature: | number to_time(string $time, string $format_string) |
---|---|
Arguments: | - time: time string to convert - format_string: format string to use for conversion |
Example: | to_time(str_col, 'HH:mm:ss.nnnnnnnnn') |
Converts a string to timestamp for loading into an Ocient timestamp column. Format string follows these formatting rules.
Signature: | number to_timestamp(string $timestamp, string $format_string) |
---|---|
Arguments: | - timestamp: timestamp string to convert - format_string: format string to use for conversion |
Example: | to_timestamp(str_col, 'yyyy-MM-dd HH:mm:ss.SSS') |
Works on tuples that get tokenized at the first nesting level by a given delimiter. Nested tuples, arrays or objects are considered entirely. For each tokenized tuple element, a transformation function can be specified that is applied to the tuple element. Tokenized elements are always treated as strings, requiring additional transformation when being treated as numbers.
Signature: | array to_tuple(string $tuple_string, string $bracket_char, string $delim_char [, expression $transformation_function …]) |
---|---|
Arguments: | - tuple_string: the string of the tuple to be parsed. Must be encapsulated in the given bracket character and its closing peer. Empty tuple values are treated as null values. If the whole argument is NULL, the function returns NULL ignoring all given expressions in transformation_function… - bracket_char: the bracket character to use for parsing and result generation - delim_char: the delimiter character to use for parsing and result generation. The supported delimiters are: '(', '[' and '{' - avg(…) - keys(…) - map(…) - sum(…) - value(…) - type(…) - partially for boolean and object types |
Example: |
|
Returns array of strings after splitting on the regex pattern.
Flags can be:
- s → Pattern.DOTALL
- m → Pattern.MULTILINE
- i → Pattern.CASE_INSENSITIVE
- x → Pattern.COMMENTS
- q → Pattern.LITERAL
Signature: | array[string] tokenize(string $val, string $pattern, string $flags) |
---|---|
Arguments: | - val: string to tokenize - pattern: token regex pattern - flags: regex flags |
Example: | tokenize(col_1, '$', 'i') |
Replaces characters according to a map of characters. If the to_chars string is shorter than the from_chars string, the function removes the characters that are not mapped.
Signature: | string translate(string $val, string $from_chars, string $to_chars) |
---|---|
Arguments: | - val: string to translate - from_char: from character map - to_chars: to character map |
Example: | translate('Lorem Ipsum', 'mnop', '1234') → 'L3re1 I4su1' |
Removes leading and trailing contiguous instances of a set of characters in a given string.
Signature: | string trim(string $val, string $strip_chars) |
---|---|
Arguments: | - val: the input string to trim - strip_chars: set of characters that should be removed (default is ' ' if the value is null or not specified) |
Example: | trim(null, ' ') → null trim(' Chicago ') → 'Chicago' trim(' Chicago ', ' ') → 'Chicago' trim(' Chicago ', null) → 'Chicago' trim('123test123', '321') → 'test' |
Truncates a decimal number to have a specified number of digits after its decimal point.
If decimal is null, this function returns null.
During the transformation stage, the LAT stores numbers as integers or floating point numbers by default. However, floating point numbers are inexact. For this reason, this function might return unexpected results. For example, truncate(`0.29`, `2`) evaluates to 0.28, not 0.29 (because 0.29 is really 0.28999999...).
Signature: | number truncate(number $decimal, number $decimal_places) |
---|---|
Arguments: | - decimal: the number to truncate - decimal_places: the number of digits to keep after the decimal point |
Example: | truncate(decimal_col, `2`) |
Works on an element of a given array and applies the specified expression to that element. The function returns an array or null if specified.
Signature: | array tuple_element_transformation(array $array_values, number $array_index, expression $transformation_function) |
---|---|
Arguments: | - array_values: an array of values. If the whole argument is NULL, the function returns NULL. - array_index: a number pointing to an element in the array. The first index of the array is 0 - transformation_function: a transformation functions to apply to array element specified by array_index. |
Example: | tuple_element_transformation([-1.0,2,3, '0', abs(@)) → [1.0,2,3] ] |
Return all unique values in an array.
Signature: | array[any] unique(array[any] $vals) |
---|---|
Arguments: | vals: array to return unique values from, removing all duplicates. |
Example: | unique(vals_col) |
Returns upper case version of string.
Signature: | string upper_case(string $a) |
---|---|
Arguments: | a: string to upper case |
Example: | upper_case("ocient") |
Returns the bucket number of the queried value in a histogram starting at min, ending at max, and consisting of num_buckets count of buckets.
- The values are inclusive on the lower bound and exclusive of the upper bound so that all buckets are the same width.
- Also note that requesting 100 buckets will actually get you 102 possible buckets. Bucket 0 through 101. Where 0 captures all values below the minimum and 101 captures all values at the maximum and beyond.
Signature: | number width_bucket(number $val, number $min, number max, number $num_buckets) |
---|---|
Arguments: | - val: value to determine the bucket number for - min: min value of histogram - max: max value of histogram - num_buckets: number of buckets in histogram |
Example: | width_bucket(my_col, `0`, `100`, `20`) |
Zips N 1D arrays into a single 2D array with each inner array containing N elements.
The resulting outer array will have a length equal to the max length of any input array. For input arrays smaller than the max length, null elements will be placed in the inner arrays.
Signature: | array[any] zip(array[any] $a0, array[any] $a1, [, array[any] $…]) |
---|---|
Arguments: | - a0: array to zip - a1: array to zip - … : Optional additional arrays to zip |
Example: | zip(array_col0, array_col1, array_col2) |
Performs an explosion on all indicated columns, producing one row per value in the exploded array. This can be applied to multiple columns in the same table. Non-exploded columns are held constant and exploded columns are zipped together in array order. If multiple exploded columns have different array lengths, the record is exploded into N records where N is the max length. Columns with array lengths less than N will have null values after each of their values are exploded.
You can explode a maximum of 8,192 array elements for each source record.
EXPLODE(array[any] arr) can be used like any other JMESPath function on a JSON Array type with a few limitations:
- EXPLODE can only be used as the outermost function in a JMESPath expression
- EXPLODE only works on a single dimension. Multidimensional explode is not supported.
Single-Column EXPLODE
Transform Configuration:
Input:
Output:
Multi-Column EXPLODE
Transform Configuration:
Input:
Output:
Note that in the 4th record col1 has a null value because array1 only has 3 values while array0 has 4.
Load Data