LAT supports many transformation functions to modify data during loading. The functions include standard JMESPath Functions and Ocient-specific transformation functions.Documentation Index
Fetch the complete documentation index at: https://docs.ocient.com/llms.txt
Use this file to discover all available pages before exploring further.
Transformation functions in LAT are case sensitive. The standard syntax for transformation function calls is all lower case characters.
Standard JMESPath Functions
The following standard JMESPath Functions are supported in LAT Transformation expressions. For details about using these functions, see JMESPath Functions.| JMESPath Function | Function Signature | Example |
|---|---|---|
| abs | number abs(number $value) | abs(num_1) |
| avg | number avg(array[number] $elements) | avg(array_1) |
| contains | boolean contains(array|string search) | contains(array_1, ‘test’) |
| ceil | number ceil(number $value) | ceil(num_1) |
| ends_with | boolean ends_with(string prefix) | ends_with(str_1, ‘ing’) |
| floor | number floor(number $value) | floor(num_1) |
| join | string join(string stringsarray) | join(’,’, array_1) |
| keys | array keys(object $obj) | keys(obj_1) |
| length | number length(string|array|object $subject) | length(array_1) |
| map | array[any] map(expression→any→any expr, array[any] elements) | map(&to_string(@), array_1) |
| max | number max(array[number]|array[string] $collection) | max(array_1) |
| max_by | max_by(array elements, expression→number|expression→string expr) | max_by(array_1, &age) |
| merge | object merge([object *argument, [, object $…]]) | merge(obj_1, obj_2, obj_3) |
| min | number min(array[number]|array[string] $collection) | min(array_1) |
| min_by | min_by(array elements, expression→number|expression→string expr) | min_by(array_1, &age) |
| not_null | any not_null([any …]]) | not_null(val_1, val_2, val_3) |
| reverse | array reverse(string|array $argument) | reverse(str_1) |
| sort | array sort(array[number]|array[string] $list) | sort(array_1) |
| sort_by | sort_by(array elements, expression→number|expression→string expr) | sort_by(array_1, &age) |
| starts_with | boolean starts_with(string prefix) | starts_with(str_1, ‘chi’) |
| sum | number sum(array[number] $collection) | sum(array_1) |
| to_array | array to_array(any $arg) | to_array(val_1) |
| to_string | string to_string(any $arg) | to_string(val_1) |
| type | string type(array|object|string|number|boolean|null $subject) | type(val_1) |
| values | array values(object $obj) | values(obj_1) |
Other Transformation Functions
The following transformation functions are also available in LAT transformation expressions.add
Returns the sum of the two arguments.
| Signature: | number add(number $a, number $b) |
|---|---|
| Arguments: | - a: first argument- b: second argument |
| Example: | add(int_col_0, int_col_1) |
array_cap
Caps the length of an array to a maximum number of elements.
| Signature: | array array_cap(Array[any] $array_1, number $max_elements) |
|---|---|
| Arguments: | - array_1: the array to cap- max_elements: number of elements to cap the array at |
| Example: | array_cap(array_1, `1000`) |
array_cat
Concatenate two arrays, returning an array. Accepts an array or null value, returning an empty array if both parameters are null.
| Signature: | array array_cap(array[any] $array_1, array[any] $array_2) |
|---|---|
| Arguments: | - array_1: first array- array_2: second array concatenated to first |
| Example: | array_cat(array_1, array_2) |
concat
Concatenates two strings or converts arguments into strings and concatenates them into a string.
| Signature: | string concat(any $a, any $b) |
|---|---|
| Arguments: | - a: first argument (automatically converted to string when not specified as a string)- b: second argument (automatically converted to string when not specified as a string) |
| Example: | concat("ocient", `10`) |
divide
Returns the floating-point division of the two arguments.
| Signature: | number divide(number $a, number $b) |
|---|---|
| Arguments: | - a: first argument- b: second argument |
| Example: | divide(int_col_0, int_col_1) |
hash_code
Returns an integer hash code for a string, number, or Boolean.
| Signature: | number hash_code(string|number|boolean $val) |
|---|---|
| Arguments: | val: the value to hash |
| Example: | hash_code(null) → nullhash_code("Chicago") → -1884315574hash_code("test123") → -1422501792hash_code(`12345`) -> 12345@@HTML_TABLE_TOKEN_3@@hash_code(`123.45`) -> -1936584703@@HTML_TABLE_TOKEN_4@@hash_code(`true`) -> 3@@HTML_TABLE_TOKEN_5@@hash_code(`false`) -> 1 |
if
Returns one of the other arguments based on the truth value of the first. When the first argument is true, return the second argument. When false, return the third.
A NULL input into the first argument will be evaluated as false. Both branches are eagerly evaluated.
| Signature: | any if(boolean $b, any $if_true, any $if_false) |
|---|---|
| Arguments: | - b: boolean value that determines which of the options to return- if_true: value returned if b is true- if_false: value returned if b is false |
| Example: | if(value > `5`, "North America", "Europe") |
json
Parses a string into a JSON value for use in further JMESPath operations. All values other than string are returned unmodified. Strings containing null, boolean, number, array, and object types are supported.
| Signature: | object json(string $val) |
|---|---|
| Arguments: | val: string to parse |
| Example: | json("{ \"key\": 10 }") |
lazy_if
Returns one of the other arguments based on the truth value of the first. When the first argument is true, return the second argument. When false, return the third. The second and third arguments are both expressions. Branches are lazily evaluated.
| Signature: | any lazy_if(boolean $b, expression $if_true, expression $if_false, any data) |
|---|---|
| Arguments: | - b: boolean value that determines which of the options to return- if_true: expression evaluated and returned if true- if_false: expression evaluated and returned if false- data: information used to evaluate either expression |
| Example: | if(value > 5, &"North America", &abs(@), 1.0) |
left
Returns the num_chars leftmost characters of the string column. If null, null is returned. If the index is greater than the length of the string, the entire string is returned.
| Signature: | string left(string $val, number $num_chars) |
|---|---|
| Arguments: | - a: string column- num_chars: number of chars to use from the left of a string column. |
| Example: | left("ocient", `3`) |
lower_case
Returns lower case version of a string.
| Signature: | string lower_case(string $a) |
|---|---|
| Arguments: | a: string to lower case |
| Example: | lower_case("OCIENT") |
lpad
Pads an input string to a specified length with a padding string added to the left side.
| Signature: | string lpad(string $val, number $length, string $padding) |
|---|---|
| Arguments: | - val: the input string. This cannot be null.- length: the length you want after padding. If the input string is longer than this length, it is truncated to length characters- padding: the padding string. If no padding string is provided, the input string is padded with spaces. |
| Example: | lpad(string_1, `3`, '_') |
ltrim
Removes leading contiguous instances of a set of characters in a given string.
| Signature: | string ltrim(string $val, string $strip_chars) |
|---|---|
| Arguments: | - val: the input string to trim- strip_chars: set of characters that should be removed (default is ’ ’ if the value is null or not specified) |
| Example: | ltrim(null, ' ') → nullltrim(' Chicago ') → 'Chicago 'ltrim(' Chicago ', ' ') → 'Chicago 'ltrim(' Chicago ', null) → 'Chicago 'ltrim('123test123', '321') → 'test123' |
map
An override of JmesPath’s built in map which maps a given expression into elements of an array. This override makes the small adjustment that if the array itself is null, then a null is returned.
| Signature: | array map(expression $expr, array[any] array_1) |
|---|---|
| Arguments: | - expr: expression the expression to map- array_1: array[any] the array to which the expression will be mapped |
| Example: | map(&if(@ == 1, true, false), [1,2,1,1,2]) → [true, false, true, true, false] |
matches
Returns Boolean for whether this input matches the pattern.
Flags can be:
s→ Pattern.DOTALLm→ Pattern.MULTILINEi→ Pattern.CASE_INSENSITIVEx→ Pattern.COMMENTSq→ Pattern.LITERAL
| Signature: | boolean matches(string $val, string $pattern, string $flags) |
|---|---|
| Arguments: | - val: value to match- pattern: regex to match against- args: regex flags |
| Example: | matches(col_1, ".*mytest.*", 'i') |
millis_to_timestamp
Converts an integer number of milliseconds since the Epoch to an Ocient timestamp column.
| Signature: | number millis_to_timestamp(number $t) |
|---|---|
| Arguments: | t: value to convert to an Ocient timestamp |
| Example: | millis_to_timestamp(my_millis) |
multiply
Returns the floating-point multiplication of the two arguments.
| Signature: | number multiply(number $a, number $b) |
|---|---|
| Arguments: | - a: first argument- b: second argument |
| Example: | multiply(int_col_0, int_col_1) |
nanos_to_timestamp
Converts an integer number of nanoseconds since the Epoch to an Ocient timestamp column.
| Signature: | number nanos_to_timestamp(number $t) |
|---|---|
| Arguments: | - t: value to convert to an Ocient timestamp |
| Example: | nanos_to_timestamp(my_nanosecs) |
normalize_space
Removes leading and trailing whitespace and replaces connected whitespace with a single space.
| Signature: | string normalize_space(string $val) |
|---|---|
| Arguments: | val: string to normalize |
| Example: | normalize_space(col_1) |
now, current_timestamp
Returns the current timestamp in nanos. This will correctly load into an Ocient timestamp column
| Signature: | number now() | number current_timestamp() |
|---|---|
| Arguments: | None |
| Example: | now(), current_timestamp() |
null_if, nullif
Returns null if the first argument matches any of the following arguments. Can be applied to multiple columns or to compare a column with a constant. Can also be used with more than two arguments in which case returns null if a matches any of the subsequent arguments. Returns the value the first argument if it does not match any other arguments.
| Signature: | any null_if(any $val, any $c0, [, any $…]) |
|---|---|
| Arguments: | - val: value to compare- c0: required value to compare against- … : optional multiple values to compare against |
| Example: | null_if(col_a, "NULL", "NONE") |
parse_array
Given a string, parses it into an array of strings. Handles nested arrays as well.
| Signature: | array parse_array(string $array_string, string $bracket_char, string $delim_char, int $dimensions) |
|---|---|
| Arguments: | - array_string: the string of the array to be parsed- bracket_char: the opening bracket character to use- delim_char: the delimiter character to use- dimensions: the dimension of this array |
| Example: | parse_array("[ 1, 2, 3]", "[", ",", `1`) ⇒ [1, 2, 3]@@HTML_TABLE_TOKEN_0@@parse_array("[[1,2],[3], []]", '[', ',', `2`) ⇒ [[1,2], [3], []]@@HTML_TABLE_TOKEN_1@@parse_array("{ \"cat\", \"hat\" }", "{", ",", `1`) ⇒ ["cat", "hat"] |
record_uuid
Returns string that represents a unique identifier for the record in a given pipeline for a given file_group or topic. This return value will be consistent over multiple calls in the pipeline’s transformation configuration for the same source record. Can be used as a surrogate key when loading multiple tables from the same source record in a pipeline to facilitate joins when no unique identifier exists on the source record.
| Signature: | string record_uuid() |
|---|---|
| Arguments: | None |
| Example: | record_uuid() -> "6064fc40-9961-4f5e-b74d-090458e4a609" |
replace
Returns string with text matching the regex pattern replaced by the replacement text.
Flags can be:
s→ Pattern.DOTALLm→ Pattern.MULTILINEi→ Pattern.CASE_INSENSITIVEx→ Pattern.COMMENTSq→ Pattern.LITERAL
| Signature: | string replace(string $val, string $pattern, string $replacement_text, string $flags) |
|---|---|
| Arguments: | - val: value to search for replacement- pattern: section to replace- replacement_text: replacement string- args: regex flags |
| Example: | replace(col_1, '[oO]ld', 'new', 'q') |
right
Returns the num_chars rightmost characters of the string column. If null, null is returned. If the index is greater than the length of the string, the entire string is returned.
| Signature: | string right(string $a, number $num_chars) |
|---|---|
| Arguments: | - a: string column- num_chars: number of chars to use from the right of a string column. |
| Example: | right("ocient", `3`) |
rpad
Pads an input string to a specified length with a padding string added to the right side.
| Signature: | string rpad(string $val, number $length, string $padding) |
|---|---|
| Arguments: | - val: the input string. This cannot be null.- length: the length you want after padding. If the input string is longer than this length, it is truncated to length characters- padding: the padding string. If no padding string is provided, the input string is padded with spaces. |
| Example: | rpad(string_1, `3`, '_') |
rtrim
Removes trailing contiguous instances of a set of characters in a given string.
| Signature: | string rtrim(string $val, string $strip_chars) |
|---|---|
| Arguments: | - val: the input string to trim- strip_chars: set of characters that should be removed (default is ’ ’ if the value is null or not specified) |
| Example: | rtrim(null, ' ') → nullrtrim(' Chicago ') → ' Chicago'rtrim(' Chicago ', ' ') → ' Chicago'rtrim(' Chicago ', null) → ' Chicago'rtrim('123test123', '321') → '123test' |
sample
Returns true or false based on the given sample rate. The sample rate must be a number. Commonly used inside of IF statements. An exception will be thrown if the sample rate is outside of [0.0, 1.0].
| Signature: | boolean sample(number $rate) |
|---|---|
| Arguments: | rate: rate to return true. Must be between [0.0, 1.0]. |
| Example: | sample(`0.5`) |
st_forcepolygonccw
Forces a polygon to have a counterclockwise rotation of the outer polygon ring and a clockwise rotation of the inner polygon ring. You can use this function to convert polygons extracted from systems that interpret polygon rotation differently.
| Signature: | geometry st_forcepolygonccw(string $wkt_geometry) |
|---|---|
| Arguments: | wkt_geometry: WKT string representing the polygon to rotate. Also accepts WKT of POINT or LINESTRING and returns those types unmodified. |
| Example: | st_forcepolygonccw('POLYGON(…)') |
st_geomfromewkb
Returns a geometry from an Extended Well-Known Binary (EWKB) and Well-Known Binary (WKB) representations of a geometry. You can load the result of this function into POINT, LINESTRING, and POLYGON column types.
| Signature: | geometry st_geomfromewkb(string ewkb) |
|---|---|
| Arguments: | wkb: hex encoded string Extended WKB or WKB representation of a geometry. EWKB or WKB can be standalone hexadecimal characters or can be prefixed with either 0x or \x. |
| Example: | st_geomfromewkb('0103…')st_geomfromewkb('0x0103…')st_geomfromewkb('\x0103…') |
st_point
Constructs an POINT from numeric coordinates that represent the longitude and latitude of the point.
| Signature: | geometry st_point(float longitude, float latitude) |
|---|---|
| Arguments: | - longitude: floating point longitude of the point to construct- latitude: floating point latitude of the point to construct |
| Example: | st_point(float_col_1, float_col_2) |
substring
Returns the substring beginning at start_index (with indexes beginning at 1) to the end of the string, or to num_chars characters.
| Signature: | string substring(string $val, number $start_index, [number $num_chars]) |
|---|---|
| Arguments: | val: string to substringstart_index: index to start at. First character is index 1.num_chars: optional. The maximum number of chars to return |
| Example: | substring(str_col, `10`)@@HTML_TABLE_TOKEN_0@@substring(str_col, `10`, `5`)@@HTML_TABLE_TOKEN_1@@substring('Chicago Bears', `3`) → 'icago Bears'@@HTML_TABLE_TOKEN_2@@substring('Chicago Bears', `3`, `4`) → 'icag' |
substring_after
Returns the string after the first occurrence of the needle.
| Signature: | string substring_after(string $val, string $needle) |
|---|---|
| Arguments: | - val: string to substring- needle: delimiter to use to designate where to return subsequent substring. |
| Example: | substring_after('my_text', '_') → 'text' |
substring_before
Returns the string before the first occurrence of the needle. Both parameters are literal strings.
| Signature: | string substring_before(string $val, string $needle) |
|---|---|
| Arguments: | - val: string to substring- needle: delimiter to use to designate where to return prior substring. |
| Example: | substring_before('my_text', '_') → 'my' |
subtract
Returns the difference of the two arguments.
| Signature: | number subtract(number $a, number $b) |
|---|---|
| Arguments: | - a: first argument- b: second argument |
| Example: | subtract(int_col_0, int_col_1) |
to_array_length
Turns any JSON object into an array with N copies of that object.
| Signature: | array[any] to_array_length(any $val, number $n) |
|---|---|
| Arguments: | - val: value to copy into an array- n: number of copies to make |
| Example: | to_array_length(value_col, `3`) |
to_binary
Converts a string to binary data for loading into an Ocient binary or hash column.
If mode is 'hex', data is parsed as a sequence of hexadecimal digits. Note that the sequence of digits must not begin with 0x.
Otherwise, mode must be the name of a character encodings listed here, and data is encoded in that encoding.
| Signature: | binary to_binary(string $data, string $mode) |
|---|---|
| Arguments: | - data: data to convert to binary- mode: either ‘hex’ or the name of a standard character encoding |
| Example: | to_binary(str_col, 'utf-16') |
to_date
Converts a string to a date for loading into an Ocient date column. Format string follows these formatting rules.
| Signature: | number to_date(string $date, string $format_string) |
|---|---|
| Arguments: | - date: date string to convert- format_string: format string to use for conversion |
| Example: | to_date(str_col, 'yyyy-MM-dd') |
to_time
Converts a string into a time of day for loading into an Ocient time column. Format string follows these formatting rules.
| Signature: | number to_time(string $time, string $format_string) |
|---|---|
| Arguments: | - time: time string to convert- format_string: format string to use for conversion |
| Example: | to_time(str_col, 'HH:mm:ss.nnnnnnnnn') |
to_timestamp
Converts a string to timestamp for loading into an Ocient timestamp column. Format string follows these formatting rules.
| Signature: | number to_timestamp(string $timestamp, string $format_string) |
|---|---|
| Arguments: | - timestamp: timestamp string to convert- format_string: format string to use for conversion |
| Example: | to_timestamp(str_col, 'yyyy-MM-dd HH:mm:ss.SSS') |
to_tuple
Works on tuples that get tokenized at the first nesting level by a given delimiter. Nested tuples, arrays or objects are considered entirely. For each tokenized tuple element, a transformation function can be specified that is applied to the tuple element. Tokenized elements are always treated as strings, requiring additional transformation when being treated as numbers.
| Signature: | array to_tuple(string $tuple_string, string $bracket_char, string $delim_char [, expression $transformation_function …]) |
|---|---|
| Arguments: | - tuple_string: the string of the tuple to be parsed.Must be encapsulated in the given bracket character and its closing peer. Empty tuple values are treated as null values. If the whole argument is NULL, the function returns NULL ignoring all given expressions in transformation_function… - bracket_char: the bracket character to use for parsing and result generation- delim_char: the delimiter character to use for parsing and result generation.The supported delimiters are: ’(’, ’[’ and ’{’ - avg(…)- keys(…)- map(…)- sum(…)- value(…)- type(…) - partially for boolean and object types |
| Example: | - to_tuple((-1,2), '(', ',', &abs(to_number(@)), &@) → '(1.0,2)'- to_tuple((,'8',2), '(', ',', &if(@ != null, @, 'missing'), &if(length(@) > `0`, @, 'missing'), &to_number(@) ) → '("missing","'8'",2)' |
tokenize
Returns array of strings after splitting on the regex pattern.
Flags can be:
s→ Pattern.DOTALLm→ Pattern.MULTILINEi→ Pattern.CASE_INSENSITIVEx→ Pattern.COMMENTSq→ Pattern.LITERAL
| Signature: | array[string] tokenize(string $val, string $pattern, string $flags) |
|---|---|
| Arguments: | - val: string to tokenize- pattern: token regex pattern- flags: regex flags |
| Example: | tokenize(col_1, ''), 'i’) |
translate
Replaces characters according to a map of characters. If the to_chars string is shorter than the from_chars string, the function removes the characters that are not mapped.
| Signature: | string translate(string $val, string $from_chars, string $to_chars) |
|---|---|
| Arguments: | - val: string to translate- from_char: from character map- to_chars: to character map |
| Example: | translate('Lorem Ipsum', 'mnop', '1234') → 'L3re1 I4su1' |
trim
Removes leading and trailing contiguous instances of a set of characters in a given string.
| Signature: | string trim(string $val, string $strip_chars) |
|---|---|
| Arguments: | - val: the input string to trim- strip_chars: set of characters that should be removed (default is ’ ’ if the value is null or not specified) |
| Example: | trim(null, ' ') → nulltrim(' Chicago ') → 'Chicago'trim(' Chicago ', ' ') → 'Chicago'trim(' Chicago ', null) → 'Chicago'trim('123test123', '321') → 'test' |
truncate
Truncates a decimal number to have a specified number of digits after its decimal point.
If decimal is null, this function returns null.
| Signature: | number truncate(number $decimal, number $decimal_places) |
|---|---|
| Arguments: | - decimal: the number to truncate- decimal_places: the number of digits to keep after the decimal point |
| Example: | truncate(decimal_col, `2`) |
tuple_element_transformation
Works on an element of a given array and applies the specified expression to that element. The function returns an array or null if specified.
| Signature: | array tuple_element_transformation(array $array_values, number $array_index, expression $transformation_function) |
|---|---|
| Arguments: | - array_values: an array of values.If the whole argument is NULL, the function returns NULL. - array_index: a number pointing to an element in the array. The first index of the array is 0- transformation_function: a transformation functions to apply to array element specified by array_index. |
| Example: | tuple_element_transformation([-1.0,2,3, '0', abs(@)) → [1.0,2,3] ] |
unique
Return all unique values in an array.
| Signature: | array[any] unique(array[any] $vals) |
|---|---|
| Arguments: | vals: array to return unique values from, removing all duplicates. |
| Example: | unique(vals_col) |
upper_case
Returns upper case version of string.
| Signature: | string upper_case(string $a) |
|---|---|
| Arguments: | a: string to upper case |
| Example: | upper_case("ocient") |
width_bucket
Returns the bucket number of the queried value in a histogram starting at min, ending at max, and consisting of num_buckets count of buckets.
- The values are inclusive on the lower bound and exclusive of the upper bound so that all buckets are the same width.
- Also note that requesting 100 buckets will actually get you 102 possible buckets. Bucket 0 through 101. Where 0 captures all values below the minimum and 101 captures all values at the maximum and beyond.
| Signature: | number width_bucket(number $val, number $min, number max, number $num_buckets) |
|---|---|
| Arguments: | - val: value to determine the bucket number for- min: min value of histogram- max: max value of histogram- num_buckets: number of buckets in histogram |
| Example: | width_bucket(my_col, `0`, `100`, `20`) |
zip
Zips N one-dimensional arrays into a single two-dimensional array with each inner array containing N elements.
The resulting outer array will have a length equal to the max length of any input array. For input arrays smaller than the max length, null elements will be placed in the inner arrays.
| Signature: | array[any] zip(array[any] $a0, array[any] $a1, [, array[any] $…]) |
|---|---|
| Arguments: | - a0: array to zip- a1: array to zip- … : Optional additional arrays to zip |
| Example: | zip(array_col0, array_col1, array_col2) |
Special Functions
EXPLODE
Performs an explosion on all indicated columns, producing one row per value in the exploded array. This can be applied to multiple columns in the same table. Non-exploded columns are held constant and exploded columns are zipped together in array order. If multiple exploded columns have different array lengths, the record is exploded into N records where N is the max length. Columns with array lengths less than N will havenull values after each of their values are exploded.
You can explode a maximum of 8,192 array elements for each source record.
Usage
EXPLODE(array[any] arr) can be used like any other JMESPath function on a JSON Array type with a few limitations:
EXPLODEcan only be used as the outermost function in a JMESPath expressionEXPLODEonly works on a single dimension. Multidimensional explode is not supported.
Examples
Single-Column EXPLODE Transform Configuration:JSON
JSON
JSON
JSON
JSON
JSON
col1 has a null value because array1 only has 3 values while array0 has 4.

