Loading and Transformation Ref...

Data Transformation with JMESP...

Transformation Functions

LAT supports many transformation functions to modify data during loading. The functions include standard JMESPath Functions and -specific transformation functions.
Transformation functions in LAT are case sensitive. The standard syntax for transformation function calls is all lower case characters.
Standard JMESPath Functions
The following standard JMESPath Functions are supported in LAT Transformation expressions. For details about using these functions, see JMESPath Functions.
JMESPath Function
Function Signature
Example
abs
number abs(number $value)
abs(num_1)
avg
number avg(array[number] $elements)
avg(array_1)
contains
boolean contains(array|string $subject, any $search)
contains(array_1, 'test')
ceil
number ceil(number $value)
ceil(num_1)
ends_with
boolean ends_with(string $subject, string $prefix)
ends_with(str_1, 'ing')
floor
number floor(number $value)
floor(num_1)
join
string join(string $glue, array[string] $stringsarray)
join(',', array_1)
keys
array keys(object $obj)
keys(obj_1)
length
number length(string|array|object $subject)
length(array_1)
map
array[any] map(expression→any→any expr, array[any] elements)
map(&to_string(@), array_1)
max
number max(array[number]|array[string] $collection)
max(array_1)
max_by
max_by(array elements, expression→number|expression→string expr)
max_by(array_1, &age)
merge
object merge([object *argument, [, object $…​]])
merge(obj_1, obj_2, obj_3)
min
number min(array[number]|array[string] $collection)
min(array_1)
min_by
min_by(array elements, expression→number|expression→string expr)
min_by(array_1, &age)
not_null
any not_null([any $argument [, any $…​]])
not_null(val_1, val_2, val_3)
reverse
array reverse(string|array $argument)
reverse(str_1)
sort
array sort(array[number]|array[string] $list)
sort(array_1)
sort_by
sort_by(array elements, expression→number|expression→string expr)
sort_by(array_1, &age)
starts_with
boolean starts_with(string $subject, string $prefix)
starts_with(str_1, 'chi')
sum
number sum(array[number] $collection)
sum(array_1)
to_array
array to_array(any $arg)
to_array(val_1)
to_string
string to_string(any $arg)
to_string(val_1)
type
string type(array|object|string|number|boolean|null $subject)
type(val_1)
values
array values(object $obj)
values(obj_1)
Other Transformation Functions
The following transformation functions are also available in LAT transformation expressions.
add
Returns the sum of the two arguments.
Signature:
number add(number $a, number $b)
Arguments:
- a: first argument
- b: second argument
Example:
add(int_col_0, int_col_1)
array_cap
Caps the length of an array to a maximum number of elements.
Signature:
array array_cap(Array[any] $array_1, number $max_elements)
Arguments:
- array_1: the array to cap
- max_elements: number of elements to cap the array at
Example:
array_cap(array_1, `1000`)
array_cat
Concatenate two arrays, returning an array. Accepts an array or null value, returning an empty array if both parameters are null.
Signature:
array array_cap(array[any] $array_1, array[any] $array_2)
Arguments:
- array_1: first array
- array_2: second array concatenated to first
Example:
array_cat(array_1, array_2)
concat
Concatenates two strings or converts arguments into strings and concatenates them into a string.
Signature:
string concat(any $a, any $b)
Arguments:
- a: first argument (automatically converted to string when not specified as a string)
- b: second argument (automatically converted to string when not specified as a string)
Example:
concat("ocient", `10`)
divide
Returns the floating-point division of the two arguments.
Signature:
number divide(number $a, number $b)
Arguments:
- a: first argument
- b: second argument
Example:
divide(int_col_0, int_col_1)
hash_code
Returns an integer hash code for a string, number, or Boolean.
Signature:
number hash_code(string|number|boolean $val)
Arguments:
val: the value to hash
Example:
hash_code(null) → null
hash_code("Chicago") → -1884315574
hash_code("test123") → -1422501792
hash_code(`12345`) -> 12345
hash_code(`123.45`) -> -1936584703
hash_code(`true`) -> 3
hash_code(`false`) -> 1
if
Returns one of the other arguments based on the truth value of the first. When the first argument is true, return the second argument. When false, return the third. 
A NULL input into the first argument will be evaluated as false. Both branches are eagerly evaluated.
Signature:
any if(boolean $b, any $if_true, any $if_false)
Arguments:
- b: boolean value that determines which of the options to return
- if_true: value returned if b is true
- if_false: value returned if b is false
Example:
if(value > `5`, "North America", "Europe")
json
Parses a string into a JSON value for use in further JMESPath operations. All values other than string are returned unmodified. Strings containing null, boolean, number, array, and object types are supported.
Signature:
object json(string $val)
Arguments:
val: string to parse
Example:
json("{ \"key\": 10 }")
lazy_if
Returns one of the other arguments based on the truth value of the first. When the first argument is true, return the second argument. When false, return the third. The second and third arguments are both expressions. Branches are lazily evaluated.
Signature:
any lazy_if(boolean $b, expression $if_true, expression $if_false, any data)
Arguments:
- b: boolean value that determines which of the options to return
- if_true: expression evaluated and returned if true
- if_false: expression evaluated and returned if false
- data: information used to evaluate either expression
Example:
if(value > 5, &"North America", &abs(@), 1.0)
left
Returns the num_chars leftmost characters of the string column. If null, null is returned. If the index is greater than the length of the string, the entire string is returned.
Signature:
string left(string $val, number $num_chars)
Arguments:
- a: string column
- num_chars: number of chars to use from the left of a string column. 
Example:
left("ocient", `3`)
lower_case
Returns lower case version of a string.
Signature:
string lower_case(string $a)
Arguments:
a: string to lower case
Example:
lower_case("OCIENT")
lpad
Pads an input string to a specified length with a padding string added to the left side.
Signature:
string lpad(string $val, number $length, string $padding)
Arguments:
- val: the input string. This cannot be null.
- length: the length you want after padding. If the input string is longer than this length, it is truncated to length characters
- padding: the padding string. If no padding string is provided, the input string is padded with spaces.
Example:
lpad(string_1, `3`, '_')
ltrim
Removes leading contiguous instances of a set of characters in a given string.
Signature:
string ltrim(string $val, string $strip_chars)
Arguments:
- val: the input string to trim
- strip_chars: set of characters that should be removed (default is ' ' if the value is null or not specified)
Example:
ltrim(null, ' ') → null
ltrim(' Chicago ') → 'Chicago '
ltrim(' Chicago ', ' ') → 'Chicago '
ltrim(' Chicago ', null) → 'Chicago '
ltrim('123test123', '321') → 'test123'
map
An override of JmesPath’s built in map which maps a given expression into elements of an array. This override makes the small adjustment that if the array itself is null, then a null is returned.
Signature:
array map(expression $expr, array[any] array_1)
Arguments:
- expr: expression the expression to map
- array_1: array[any] the array to which the expression will be mapped
Example:
map(&if(@ == 1, true, false), [1,2,1,1,2]) → [true, false, true, true, false]
matches
Returns Boolean for whether this input matches the pattern.
Flags can be:
s → Pattern.DOTALL
m → Pattern.MULTILINE
i → Pattern.CASE_INSENSITIVE
x → Pattern.COMMENTS
q → Pattern.LITERAL
Signature:
boolean matches(string $val, string $pattern, string $flags)
Arguments:
- val: value to match
- pattern: regex to match against
- args: regex flags
Example:
matches(col_1, ".*mytest.*", 'i')
millis_to_timestamp
Converts an integer number of milliseconds since the Epoch to an Ocient timestamp column.
Signature:
number millis_to_timestamp(number $t)
Arguments:
t: value to convert to an Ocient timestamp
Example:
millis_to_timestamp(my_millis)
multiply
Returns the floating-point multiplication of the two arguments.
Signature:
number multiply(number $a, number $b)
Arguments:
- a: first argument
- b: second argument
Example:
multiply(int_col_0, int_col_1)
nanos_to_timestamp
Converts an integer number of nanoseconds since the Epoch to an Ocient timestamp column.
Signature:
number nanos_to_timestamp(number $t)
Arguments:
- t: value to convert to an Ocient timestamp
Example:
nanos_to_timestamp(my_nanosecs)
normalize_space
Removes leading and trailing whitespace and replaces connected whitespace with a single space.
Signature:
string normalize_space(string $val)
Arguments:
val: string to normalize
Example:
normalize_space(col_1)
now, current_timestamp
Returns the current timestamp in nanos. This will correctly load into an Ocient timestamp column
Signature:
number now() | number current_timestamp()
Arguments:
None
Example:
now(), current_timestamp()
null_if, nullif
Returns null if the first argument matches any of the following arguments. Can be applied to multiple columns or to compare a column with a constant. Can also be used with more than two arguments in which case returns null if a matches any of the subsequent arguments. Returns the value the first argument if it does not match any other arguments.
Signature:
any null_if(any $val, any $c0, [, any $…​])
Arguments:
- val: value to compare
- c0: required value to compare against
- …​ : optional multiple values to compare against
Example:
null_if(col_a, "NULL", "NONE")
parse_array
Given a string, parses it into an array of strings. Handles nested arrays as well.
Signature:
array parse_array(string $array_string, string $bracket_char, string $delim_char, int $dimensions)
Arguments:
- array_string: the string of the array to be parsed
- bracket_char: the opening bracket character to use
- delim_char: the delimiter character to use
- dimensions: the dimension of this array
Example:
parse_array("[ 1, 2, 3]", "[", ",", `1`) ⇒ [1, 2, 3]
parse_array("[[1,2],[3], []]", '[', ',', `2`) ⇒ [[1,2], [3], []]
parse_array("{ \"cat\", \"hat\" }", "{", ",", `1`) ⇒ ["cat", "hat"]
record_uuid
Returns string that represents a unique identifier for the record in a given pipeline for a given file_group or topic. This return value will be consistent over multiple calls in the pipeline's transformation configuration for the same source record. Can be used as a surrogate key when loading multiple tables from the same source record in a pipeline to facilitate joins when no unique identifier exists on the source record. 
Signature:
string record_uuid()
Arguments:
None
Example:
record_uuid() -> "6064fc40-9961-4f5e-b74d-090458e4a609"
replace
Returns string with text matching the regex pattern replaced by the replacement text.
Flags can be:
s → Pattern.DOTALL
m → Pattern.MULTILINE
i → Pattern.CASE_INSENSITIVE
x → Pattern.COMMENTS
q → Pattern.LITERAL
Signature:
string replace(string $val, string $pattern, string $replacement_text, string $flags)
Arguments:
- val: value to search for replacement
- pattern: section to replace
- replacement_text: replacement string
- args: regex flags
Example:
replace(col_1, '[oO]ld', 'new', 'q')
right
Returns the num_chars rightmost characters of the string column. If null, null is returned. If the index is greater than the length of the string, the entire string is returned.
Signature:
string right(string $a, number $num_chars)
Arguments:
- a: string column
- num_chars: number of chars to use from the right of a string column. 
Example:
right("ocient", `3`)
rpad
Pads an input string to a specified length with a padding string added to the right side.
Signature:
string rpad(string $val, number $length, string $padding)
Arguments:
- val: the input string. This cannot be null.
- length: the length you want after padding. If the input string is longer than this length, it is truncated to length characters
- padding: the padding string. If no padding string is provided, the input string is padded with spaces.
Example:
rpad(string_1, `3`, '_')
rtrim
Removes trailing contiguous instances of a set of characters in a given string.
Signature:
string rtrim(string $val, string $strip_chars)
Arguments:
- val: the input string to trim
- strip_chars: set of characters that should be removed (default is ' ' if the value is null or not specified)
Example:
rtrim(null, ' ') → null
rtrim(' Chicago ') → ' Chicago'
rtrim(' Chicago ', ' ') → ' Chicago'
rtrim(' Chicago ', null) → ' Chicago'
rtrim('123test123', '321') → '123test'
sample
Returns true or false based on the given sample rate. The sample rate must be a number. Commonly used inside of IF statements. An exception will be thrown if the sample rate is outside of [0.0, 1.0].
Signature:
boolean sample(number $rate)
Arguments:
rate: rate to return true. Must be between [0.0, 1.0].
Example:
sample(`0.5`)
st_forcepolygonccw
Forces a polygon to have a counterclockwise rotation of the outer polygon ring and a clockwise rotation of the inner polygon ring. You can use this function to convert polygons extracted from systems that interpret polygon rotation differently.
Signature:
geometry st_forcepolygonccw(string $wkt_geometry)
Arguments:
wkt_geometry: WKT string representing the polygon to rotate. Also accepts WKT of POINT or LINESTRING and returns those types unmodified.
Example:
st_forcepolygonccw('POLYGON(…​)')
st_geomfromewkb
Returns a geometry from an Extended Well-Known Binary (EWKB) and Well-Known Binary (WKB) representations of a geometry. You can load the result of this function into POINT, LINESTRING, and POLYGON column types.
Signature:
geometry st_geomfromewkb(string ewkb)
Arguments:
wkb: hex encoded string Extended WKB or WKB representation of a geometry. EWKB or WKB can be standalone hexadecimal characters or can be prefixed with either 0x or \x.
Example:
st_geomfromewkb('0103…​')
st_geomfromewkb('0x0103…​')
st_geomfromewkb('\x0103…​')
st_point
Constructs an POINT from numeric coordinates that represent the longitude and latitude of the point. 
Signature:
geometry st_point(float longitude, float latitude)
Arguments:
- longitude: floating point longitude of the point to construct
- latitude: floating point latitude of the point to construct
Example:
st_point(float_col_1, float_col_2)
substring
Returns the substring beginning at start_index (with indexes beginning at 1) to the end of the string, or to num_chars characters.
Signature:
string substring(string $val, number $start_index, [number $num_chars])
Arguments:
val: string to substring
start_index: index to start at. First character is index 1.
num_chars: optional. The maximum number of chars to return
Example:
substring(str_col, `10`)
substring(str_col, `10`, `5`)
substring('Chicago Bears', `3`) → 'icago Bears'
substring('Chicago Bears', `3`, `4`) → 'icag'
substring_after
Returns the string after the first occurrence of the needle.
Signature:
string substring_after(string $val, string $needle)
Arguments:
- val: string to substring
- needle: delimiter to use to designate where to return subsequent substring. 
Example:
substring_after('my_text', '_') → 'text'
substring_before
Returns the string before the first occurrence of the needle. Both parameters are literal strings.
Signature:
string substring_before(string $val, string $needle)
Arguments:
- val: string to substring
- needle: delimiter to use to designate where to return prior substring. 
Example:
substring_before('my_text', '_') → 'my'
subtract
Returns the difference of the two arguments.
Signature:
number subtract(number $a, number $b)
Arguments:
- a: first argument
- b: second argument
Example:
subtract(int_col_0, int_col_1)
to_array_length
Turns any JSON object into an array with N copies of that object.
Signature:
array[any] to_array_length(any $val, number $n)
Arguments:
- val: value to copy into an array
- n: number of copies to make
Example:
to_array_length(value_col, `3`)
to_binary
Converts a string to binary data for loading into an Ocient binary or hash column.
If mode is 'hex', data is parsed as a sequence of hexadecimal digits. Note that the sequence of digits must not begin with 0x.
Otherwise, mode must be the name of a character encodings listed here, and data is encoded in that encoding.
Signature:
binary to_binary(string $data, string $mode)
Arguments:
- data: data to convert to binary
- mode: either 'hex' or the name of a standard Java® character encoding
Example:
to_binary(str_col, 'utf-16')
to_date
Converts a string to a date for loading into an Ocient date column. Format string follows these formatting rules.
Signature:
number to_date(string $date, string $format_string)
Arguments:
- date: date string to convert
- format_string: format string to use for conversion
Example:
to_date(str_col, 'yyyy-MM-dd')
to_time
Converts a string into a time of day for loading into an Ocient time column. Format string follows these formatting rules.
Signature:
number to_time(string $time, string $format_string)
Arguments:
- time: time string to convert
- format_string: format string to use for conversion
Example:
to_time(str_col, 'HH:mm:ss.nnnnnnnnn')
to_timestamp
Converts a string to timestamp for loading into an Ocient timestamp column. Format string follows these formatting rules.
Signature:
number to_timestamp(string $timestamp, string $format_string)
Arguments:
- timestamp: timestamp string to convert
- format_string: format string to use for conversion
Example:
to_timestamp(str_col, 'yyyy-MM-dd HH:mm:ss.SSS')
to_tuple
Works on tuples that get tokenized at the first nesting level by a given delimiter. Nested tuples, arrays or objects are considered entirely. For each tokenized tuple element, a transformation function can be specified that is applied to the tuple element. Tokenized elements are always treated as strings, requiring additional transformation when being treated as numbers.
Signature:
array to_tuple(string $tuple_string, string $bracket_char, string $delim_char [, expression $transformation_function …​])
Arguments:
- tuple_string: the string of the tuple to be parsed.
Must be encapsulated in the given bracket character and its closing peer. Empty tuple values are treated as null values.
If the whole argument is NULL, the function returns NULL ignoring all given expressions in transformation_function…​
- bracket_char: the bracket character to use for parsing and result generation
- delim_char: the delimiter character to use for parsing and result generation.

The supported delimiters are: '(', '[' and '{'
﻿
- avg(…​)
- keys(…​)
- map(…​)
- sum(…​)
- value(…​)
- type(…​) - partially for boolean and object types
Example:
to_tuple((-1,2), '(', ',', &abs(to_number(@)), &@) → '(1.0,2)'
to_tuple((,'8',2), '(', ',', &if(@ != null, @, 'missing'), &if(length(@) > `0`, @, 'missing'), &to_number(@) ) → '("missing","'8'",2)'
tokenize
Returns array of strings after splitting on the regex pattern.
Flags can be:
s → Pattern.DOTALL
m → Pattern.MULTILINE
i → Pattern.CASE_INSENSITIVE
x → Pattern.COMMENTS
q → Pattern.LITERAL
Signature:
array[string] tokenize(string $val, string $pattern, string $flags)
Arguments:
- val: string to tokenize
- pattern: token regex pattern
- flags: regex flags
Example:
tokenize(col_1, '$', 'i')
translate
Replaces characters according to a map of characters. If the to_chars string is shorter than the from_chars string, the function removes the characters that are not mapped.
Signature:
string translate(string $val, string $from_chars, string $to_chars)
Arguments:
- val: string to translate
- from_char: from character map
- to_chars: to character map
Example:
translate('Lorem Ipsum', 'mnop', '1234') → 'L3re1 I4su1'
trim
Removes leading and trailing contiguous instances of a set of characters in a given string.
Signature:
string trim(string $val, string $strip_chars)
Arguments:
- val: the input string to trim
- strip_chars: set of characters that should be removed (default is ' ' if the value is null or not specified)
Example:
trim(null, ' ') → null
trim(' Chicago ') → 'Chicago'
trim(' Chicago ', ' ') → 'Chicago'
trim(' Chicago ', null) → 'Chicago'
trim('123test123', '321') → 'test'
truncate
Truncates a decimal number to have a specified number of digits after its decimal point.
If decimal is null, this function returns null.
During the transformation stage, the LAT stores numbers as integers or floating point numbers by default. However, floating point numbers are inexact. For this reason, this function might return unexpected results. For example, truncate(`0.29`, `2`) evaluates to 0.28, not 0.29 (because 0.29 is really 0.28999999...).
Signature:
number truncate(number $decimal, number $decimal_places)
Arguments:
- decimal: the number to truncate
- decimal_places: the number of digits to keep after the decimal point
Example:
truncate(decimal_col, `2`)
tuple_element_transformation
Works on an element of a given array and applies the specified expression to that element. The function returns an array or null if specified.
Signature:
array tuple_element_transformation(array $array_values, number $array_index, expression $transformation_function)
Arguments:
- array_values: an array of values.
If the whole argument is NULL, the function returns NULL.
- array_index: a number pointing to an element in the array. The first index of the array is 0
- transformation_function: a transformation functions to apply to array element specified by array_index.
Example:
tuple_element_transformation([-1.0,2,3, '0', abs(@)) → [1.0,2,3] ]
unique
Return all unique values in an array.
Signature:
array[any] unique(array[any] $vals)
Arguments:
vals: array to return unique values from, removing all duplicates.
Example:
unique(vals_col)
upper_case
Returns upper case version of string.
Signature:
string upper_case(string $a)
Arguments:
a: string to upper case
Example:
upper_case("ocient")
width_bucket
Returns the bucket number of the queried value in a histogram starting at min, ending at max, and consisting of num_buckets count of buckets.
The values are inclusive on the lower bound and exclusive of the upper bound so that all buckets are the same width.
Also note that requesting 100 buckets will actually get you 102 possible buckets. Bucket 0 through 101. Where 0 captures all values below the minimum and 101 captures all values at the maximum and beyond.
Signature:
number width_bucket(number $val, number $min, number max, number $num_buckets)
Arguments:
- val: value to determine the bucket number for
- min: min value of histogram
- max: max value of histogram
- num_buckets: number of buckets in histogram
Example:
width_bucket(my_col, `0`, `100`, `20`)
zip
Zips N 1D arrays into a single 2D array with each inner array containing N elements.
The resulting outer array will have a length equal to the max length of any input array. For input arrays smaller than the max length, null elements will be placed in the inner arrays.
Signature:
array[any] zip(array[any] $a0, array[any] $a1, [, array[any] $…​])
Arguments:
- a0: array to zip
- a1: array to zip
- …​ : Optional additional arrays to zip
Example:
zip(array_col0, array_col1, array_col2)
Special Functions
EXPLODE
Performs an explosion on all indicated columns, producing one row per value in the exploded array. This can be applied to multiple columns in the same table. Non-exploded columns are held constant and exploded columns are zipped together in array order. If multiple exploded columns have different array lengths, the record is exploded into N records where N is the max length. Columns with array lengths less than N will have null values after each of their values are exploded.
You can explode a maximum of 8,192 array elements for each source record.
Usage
EXPLODE(array[any] arr) can be used like any other JMESPath function on a JSON Array type with a few limitations:
EXPLODE can only be used as the outermost function in a JMESPath expression
EXPLODE only works on a single dimension. Multidimensional explode is not supported.
Examples
Single-Column EXPLODE
Transform Configuration:
JSON
{
    "transform": {
        "topics" : {
            "topic0" : {
                "tables" : {
                    "table0" : {
                        "columns" {
                            "col0": "EXPLODE(array0)",
                            "col1": "integer0"
                        }
                    }
                }
            }
        }
    }
}
{
    "transform": {
        "topics" : {
            "topic0" : {
                "tables" : {
                    "table0" : {
                        "columns" {
                            "col0": "EXPLODE(array0)",
                            "col1": "integer0"
                        }
                    }
                }
            }
        }
    }
}
﻿
Input:
JSON
{
    "array0": [0, 1, 2, 3],
    "integer0": 4
}
{
    "array0": [0, 1, 2, 3],
    "integer0": 4
}
﻿
Output:
JSON
[
    {
        "col0": 0,
        "col1": 4
    },
    {
        "col0": 1,
        "col1": 4
    },
    {
        "col0": 2,
        "col1": 4
    },
    {
        "col0": 3,
        "col1": 4
    }
]
[
    {
        "col0": 0,
        "col1": 4
    },
    {
        "col0": 1,
        "col1": 4
    },
    {
        "col0": 2,
        "col1": 4
    },
    {
        "col0": 3,
        "col1": 4
    }
]
﻿
Multi-Column EXPLODE
Transform Configuration:
JSON
{
    "transform": {
        "topics" : {
            "topic0" : {
                "tables" : {
                    "table0" : {
                        "columns" : {
                            "col0": "EXPLODE(array0)",
                            "col1": "EXPLODE(array1)",
                            "col2": "integer0"
                        }
                    }
                }
            }
        }
    }
}
{
    "transform": {
        "topics" : {
            "topic0" : {
                "tables" : {
                    "table0" : {
                        "columns" : {
                            "col0": "EXPLODE(array0)",
                            "col1": "EXPLODE(array1)",
                            "col2": "integer0"
                        }
                    }
                }
            }
        }
    }
}
﻿
Input:
JSON
{
    "array0": [0, 1, 2, 3],
    "array1": [0, 1, 2],
    "integer0": 4
}
{
    "array0": [0, 1, 2, 3],
    "array1": [0, 1, 2],
    "integer0": 4
}
﻿
Output:
JSON
[
    {
        "col0": 0,
        "col1": 0,
        "col2": 4
    },
    {
        "col0": 1,
        "col1": 1,
        "col2": 4
    },
    {
        "col0": 2,
        "col1": 2,
        "col2": 4
    },
    {
        "col0": 3,
        "col1": null,
        "col2": 4
    }
]
[
    {
        "col0": 0,
        "col1": 0,
        "col2": 4
    },
    {
        "col0": 1,
        "col1": 1,
        "col2": 4
    },
    {
        "col0": 2,
        "col1": 2,
        "col2": 4
    },
    {
        "col0": 3,
        "col1": null,
        "col2": 4
    }
]
﻿
Note that in the 4th record col1 has a null value because array1 only has 3 values while array0 has 4.
Related Links
﻿Load Data﻿﻿

JMESPath Function	Function Signature	Example
abs	number abs(number $value)	abs(num_1)
avg	number avg(array[number] $elements)	avg(array_1)
contains	boolean contains(array\|string $subject, any $search)	contains(array_1, 'test')
ceil	number ceil(number $value)	ceil(num_1)
ends_with	boolean ends_with(string $subject, string $prefix)	ends_with(str_1, 'ing')
floor	number floor(number $value)	floor(num_1)
join	string join(string $glue, array[string] $stringsarray)	join(',', array_1)
keys	array keys(object $obj)	keys(obj_1)
length	number length(string\|array\|object $subject)	length(array_1)
map	array[any] map(expression→any→any expr, array[any] elements)	map(&to_string(@), array_1)
max	number max(array[number]\|array[string] $collection)	max(array_1)
max_by	max_by(array elements, expression→number\|expression→string expr)	max_by(array_1, &age)
merge	object merge([object *argument, [, object $…]])	merge(obj_1, obj_2, obj_3)
min	number min(array[number]\|array[string] $collection)	min(array_1)
min_by	min_by(array elements, expression→number\|expression→string expr)	min_by(array_1, &age)
not_null	any not_null([any $argument [, any $…]])	not_null(val_1, val_2, val_3)
reverse	array reverse(string\|array $argument)	reverse(str_1)
sort	array sort(array[number]\|array[string] $list)	sort(array_1)
sort_by	sort_by(array elements, expression→number\|expression→string expr)	sort_by(array_1, &age)
starts_with	boolean starts_with(string $subject, string $prefix)	starts_with(str_1, 'chi')
sum	number sum(array[number] $collection)	sum(array_1)
to_array	array to_array(any $arg)	to_array(val_1)
to_string	string to_string(any $arg)	to_string(val_1)
type	string type(array\|object\|string\|number\|boolean\|null $subject)	type(val_1)
values	array values(object $obj)	values(obj_1)

Signature:	number add(number $a, number $b)
Arguments:	- a: first argument - b: second argument
Example:	add(int_col_0, int_col_1)

Signature:	array array_cap(Array[any] $array_1, number $max_elements)
Arguments:	- array_1: the array to cap - max_elements: number of elements to cap the array at
Example:	array_cap(array_1, `1000`)

Signature:	array array_cap(array[any] $array_1, array[any] $array_2)
Arguments:	- array_1: first array - array_2: second array concatenated to first
Example:	array_cat(array_1, array_2)

Signature:	string concat(any $a, any $b)
Arguments:	- a: first argument (automatically converted to string when not specified as a string) - b: second argument (automatically converted to string when not specified as a string)
Example:	concat("ocient", `10`)

Signature:	number divide(number $a, number $b)
Arguments:	- a: first argument - b: second argument
Example:	divide(int_col_0, int_col_1)

Signature:	number hash_code(string\|number\|boolean $val)
Arguments:	val: the value to hash
Example:	hash_code(null) → null hash_code("Chicago") → -1884315574 hash_code("test123") → -1422501792 hash_code(`12345`) -> 12345 hash_code(`123.45`) -> -1936584703 hash_code(`true`) -> 3 hash_code(`false`) -> 1

Signature:	any if(boolean $b, any $if_true, any $if_false)
Arguments:	- b: boolean value that determines which of the options to return - if_true: value returned if b is true - if_false: value returned if b is false
Example:	if(value > `5`, "North America", "Europe")

Signature:	object json(string $val)
Arguments:	val: string to parse
Example:	json("{ \"key\": 10 }")

Signature:	any lazy_if(boolean $b, expression $if_true, expression $if_false, any data)
Arguments:	- b: boolean value that determines which of the options to return - if_true: expression evaluated and returned if true - if_false: expression evaluated and returned if false - data: information used to evaluate either expression
Example:	if(value > 5, &"North America", &abs(@), 1.0)

Signature:	string left(string $val, number $num_chars)
Arguments:	- a: string column - num_chars: number of chars to use from the left of a string column.
Example:	left("ocient", `3`)

Signature:	string lower_case(string $a)
Arguments:	a: string to lower case
Example:	lower_case("OCIENT")

Signature:	string lpad(string $val, number $length, string $padding)
Arguments:	- val: the input string. This cannot be null. - length: the length you want after padding. If the input string is longer than this length, it is truncated to length characters - padding: the padding string. If no padding string is provided, the input string is padded with spaces.
Example:	lpad(string_1, `3`, '_')

Signature:	string ltrim(string $val, string $strip_chars)
Arguments:	- val: the input string to trim - strip_chars: set of characters that should be removed (default is ' ' if the value is null or not specified)
Example:	ltrim(null, ' ') → null ltrim(' Chicago ') → 'Chicago ' ltrim(' Chicago ', ' ') → 'Chicago ' ltrim(' Chicago ', null) → 'Chicago ' ltrim('123test123', '321') → 'test123'

Signature:	array map(expression $expr, array[any] array_1)
Arguments:	- expr: expression the expression to map - array_1: array[any] the array to which the expression will be mapped
Example:	map(&if(@ == 1, true, false), [1,2,1,1,2]) → [true, false, true, true, false]

Signature:	boolean matches(string $val, string $pattern, string $flags)
Arguments:	- val: value to match - pattern: regex to match against - args: regex flags
Example:	matches(col_1, ".mytest.", 'i')

Signature:	number millis_to_timestamp(number $t)
Arguments:	t: value to convert to an Ocient timestamp
Example:	millis_to_timestamp(my_millis)

Signature:	number multiply(number $a, number $b)
Arguments:	- a: first argument - b: second argument
Example:	multiply(int_col_0, int_col_1)

Signature:	number nanos_to_timestamp(number $t)
Arguments:	- t: value to convert to an Ocient timestamp
Example:	nanos_to_timestamp(my_nanosecs)

Signature:	string normalize_space(string $val)
Arguments:	val: string to normalize
Example:	normalize_space(col_1)

Signature:	number now() \| number current_timestamp()
Arguments:	None
Example:	now(), current_timestamp()

Signature:	any null_if(any $val, any $c0, [, any $…])
Arguments:	- val: value to compare - c0: required value to compare against - … : optional multiple values to compare against
Example:	null_if(col_a, "NULL", "NONE")

Signature:	array parse_array(string $array_string, string $bracket_char, string $delim_char, int $dimensions)
Arguments:	- array_string: the string of the array to be parsed - bracket_char: the opening bracket character to use - delim_char: the delimiter character to use - dimensions: the dimension of this array
Example:	parse_array("[ 1, 2, 3]", "[", ",", `1`) ⇒ [1, 2, 3] parse_array("[[1,2],[3], []]", '[', ',', `2`) ⇒ [[1,2], [3], []] parse_array("{ \"cat\", \"hat\" }", "{", ",", `1`) ⇒ ["cat", "hat"]

Signature:	string record_uuid()
Arguments:	None
Example:	record_uuid() -> "6064fc40-9961-4f5e-b74d-090458e4a609"

Signature:	string replace(string $val, string $pattern, string $replacement_text, string $flags)
Arguments:	- val: value to search for replacement - pattern: section to replace - replacement_text: replacement string - args: regex flags
Example:	replace(col_1, '[oO]ld', 'new', 'q')

Signature:	string right(string $a, number $num_chars)
Arguments:	- a: string column - num_chars: number of chars to use from the right of a string column.
Example:	right("ocient", `3`)

Signature:	string rpad(string $val, number $length, string $padding)
Arguments:	- val: the input string. This cannot be null. - length: the length you want after padding. If the input string is longer than this length, it is truncated to length characters - padding: the padding string. If no padding string is provided, the input string is padded with spaces.
Example:	rpad(string_1, `3`, '_')

Signature:	string rtrim(string $val, string $strip_chars)
Arguments:	- val: the input string to trim - strip_chars: set of characters that should be removed (default is ' ' if the value is null or not specified)
Example:	rtrim(null, ' ') → null rtrim(' Chicago ') → ' Chicago' rtrim(' Chicago ', ' ') → ' Chicago' rtrim(' Chicago ', null) → ' Chicago' rtrim('123test123', '321') → '123test'

Signature:	boolean sample(number $rate)
Arguments:	rate: rate to return true. Must be between [0.0, 1.0].
Example:	sample(`0.5`)

Signature:	geometry st_forcepolygonccw(string $wkt_geometry)
Arguments:	wkt_geometry: WKT string representing the polygon to rotate. Also accepts WKT of POINT or LINESTRING and returns those types unmodified.
Example:	st_forcepolygonccw('POLYGON(…)')

Signature:	geometry st_geomfromewkb(string ewkb)
Arguments:	wkb: hex encoded string Extended WKB or WKB representation of a geometry. EWKB or WKB can be standalone hexadecimal characters or can be prefixed with either 0x or \x.
Example:	st_geomfromewkb('0103…') st_geomfromewkb('0x0103…') st_geomfromewkb('\x0103…')

Signature:	geometry st_point(float longitude, float latitude)
Arguments:	- longitude: floating point longitude of the point to construct - latitude: floating point latitude of the point to construct
Example:	st_point(float_col_1, float_col_2)

Signature:	string substring(string $val, number $start_index, [number $num_chars])
Arguments:	val: string to substring start_index: index to start at. First character is index 1. num_chars: optional. The maximum number of chars to return
Example:	substring(str_col, `10`) substring(str_col, `10`, `5`) substring('Chicago Bears', `3`) → 'icago Bears' substring('Chicago Bears', `3`, `4`) → 'icag'

Signature:	string substring_after(string $val, string $needle)
Arguments:	- val: string to substring - needle: delimiter to use to designate where to return subsequent substring.
Example:	substring_after('my_text', '_') → 'text'

Signature:	string substring_before(string $val, string $needle)
Arguments:	- val: string to substring - needle: delimiter to use to designate where to return prior substring.
Example:	substring_before('my_text', '_') → 'my'

Signature:	number subtract(number $a, number $b)
Arguments:	- a: first argument - b: second argument
Example:	subtract(int_col_0, int_col_1)

Signature:	array[any] to_array_length(any $val, number $n)
Arguments:	- val: value to copy into an array - n: number of copies to make
Example:	to_array_length(value_col, `3`)

Signature:	binary to_binary(string $data, string $mode)
Arguments:	- data: data to convert to binary - mode: either 'hex' or the name of a standard Java® character encoding
Example:	to_binary(str_col, 'utf-16')

Signature:	number to_date(string $date, string $format_string)
Arguments:	- date: date string to convert - format_string: format string to use for conversion
Example:	to_date(str_col, 'yyyy-MM-dd')

Signature:	number to_time(string $time, string $format_string)
Arguments:	- time: time string to convert - format_string: format string to use for conversion
Example:	to_time(str_col, 'HH:mm:ss.nnnnnnnnn')

Signature:	number to_timestamp(string $timestamp, string $format_string)
Arguments:	- timestamp: timestamp string to convert - format_string: format string to use for conversion
Example:	to_timestamp(str_col, 'yyyy-MM-dd HH:mm:ss.SSS')

Signature:	array to_tuple(string $tuple_string, string $bracket_char, string $delim_char [, expression $transformation_function …])
Arguments:	- tuple_string: the string of the tuple to be parsed. Must be encapsulated in the given bracket character and its closing peer. Empty tuple values are treated as null values. If the whole argument is NULL, the function returns NULL ignoring all given expressions in transformation_function… - bracket_char: the bracket character to use for parsing and result generation - delim_char: the delimiter character to use for parsing and result generation. The supported delimiters are: '(', '[' and '{' - avg(…) - keys(…) - map(…) - sum(…) - value(…) - type(…) - partially for boolean and object types
Example:	to_tuple((-1,2), '(', ',', &abs(to_number(@)), &@) → '(1.0,2)' to_tuple((,'8',2), '(', ',', &if(@ != null, @, 'missing'), &if(length(@) > `0`, @, 'missing'), &to_number(@) ) → '("missing","'8'",2)'

Signature:	array[string] tokenize(string $val, string $pattern, string $flags)
Arguments:	- val: string to tokenize - pattern: token regex pattern - flags: regex flags
Example:	tokenize(col_1, '$', 'i')

Signature:	string translate(string $val, string $from_chars, string $to_chars)
Arguments:	- val: string to translate - from_char: from character map - to_chars: to character map
Example:	translate('Lorem Ipsum', 'mnop', '1234') → 'L3re1 I4su1'

Signature:	string trim(string $val, string $strip_chars)
Arguments:	- val: the input string to trim - strip_chars: set of characters that should be removed (default is ' ' if the value is null or not specified)
Example:	trim(null, ' ') → null trim(' Chicago ') → 'Chicago' trim(' Chicago ', ' ') → 'Chicago' trim(' Chicago ', null) → 'Chicago' trim('123test123', '321') → 'test'

Signature:	number truncate(number $decimal, number $decimal_places)
Arguments:	- decimal: the number to truncate - decimal_places: the number of digits to keep after the decimal point
Example:	truncate(decimal_col, `2`)

Signature:	array tuple_element_transformation(array $array_values, number $array_index, expression $transformation_function)
Arguments:	- array_values: an array of values. If the whole argument is NULL, the function returns NULL. - array_index: a number pointing to an element in the array. The first index of the array is 0 - transformation_function: a transformation functions to apply to array element specified by array_index.
Example:	tuple_element_transformation([-1.0,2,3, '0', abs(@)) → [1.0,2,3] ]

Signature:	array[any] unique(array[any] $vals)
Arguments:	vals: array to return unique values from, removing all duplicates.
Example:	unique(vals_col)

Signature:	string upper_case(string $a)
Arguments:	a: string to upper case
Example:	upper_case("ocient")

Signature:	number width_bucket(number $val, number $min, number max, number $num_buckets)
Arguments:	- val: value to determine the bucket number for - min: min value of histogram - max: max value of histogram - num_buckets: number of buckets in histogram
Example:	width_bucket(my_col, `0`, `100`, `20`)

Updated 15 Nov 2024

Did this page help you?

Yes

Data Transformation with JMESPath

User-Defined Transformations

Signature:	array[any] zip(array[any] $a0, array[any] $a1, [, array[any] $…])
Arguments:	- a0: array to zip - a1: array to zip - … : Optional additional arrays to zip
Example:	zip(array_col0, array_col1, array_col2)