LAT Transformation Functions

data pipelines are now the preferred method for loading data into the ocient system for details, see docid\ xq0tg7yph vn62uwufibu lat supports many transformation functions to modify data during loading the functions include standard jmespath functions and {{ocient}} specific transformation functions transformation functions in lat are case sensitive the standard syntax for transformation function calls is all lower case characters standard jmespath functions the following standard jmespath functions are supported in lat transformation expressions for details about using these functions, see https //jmespath org/specification html#functions expressions jmespath function function signature example abs number abs(number $value) abs(num 1) avg number avg(array\[number] $elements) avg(array 1) contains boolean contains(array|string $subject, any $search) contains(array 1, 'test') ceil number ceil(number $value) ceil(num 1) ends with boolean ends with(string $subject, string $prefix) ends with(str 1, 'ing') floor number floor(number $value) floor(num 1) join string join(string $glue, array\[string] $stringsarray) join(',', array 1) keys array keys(object $obj) keys(obj 1) length number length(string|array|object $subject) length(array 1) map array\[any] map(expression→any→any expr, array\[any] elements) map(\&to string(@), array 1) max number max(array\[number]|array\[string] $collection) max(array 1) max by max by(array elements, expression→number|expression→string expr) max by(array 1, \&age) merge object merge(\[object argument, \[, object $…]]) merge(obj 1, obj 2, obj 3) min number min(array\[number]|array\[string] $collection) min(array 1) min by min by(array elements, expression→number|expression→string expr) min by(array 1, \&age) not null any not null(\[any $argument \[, any $…]]) not null(val 1, val 2, val 3) reverse array reverse(string|array $argument) reverse(str 1) sort array sort(array\[number]|array\[string] $list) sort(array 1) sort by sort by(array elements, expression→number|expression→string expr) sort by(array 1, \&age) starts with boolean starts with(string $subject, string $prefix) starts with(str 1, 'chi') sum number sum(array\[number] $collection) sum(array 1) to array array to array(any $arg) to array(val 1) to string string to string(any $arg) to string(val 1) type string type(array|object|string|number|boolean|null $subject) type(val 1) values array values(object $obj) values(obj 1) other transformation functions the following transformation functions are also available in lat transformation expressions add returns the sum of the two arguments signature number add(number $a, number $b) arguments a first argument b second argument example add(int col 0, int col 1) array cap caps the length of an array to a maximum number of elements signature array array cap(array\[any] $array 1, number $max elements) arguments array 1 the array to cap max elements number of elements to cap the array at example array cap(array 1, `1000`) array cat concatenate two arrays, returning an array accepts an array or null value, returning an empty array if both parameters are null signature array array cap(array\[any] $array 1, array\[any] $array 2) arguments array 1 first array array 2 second array concatenated to first example array cat(array 1, array 2) concat concatenates two strings or converts arguments into strings and concatenates them into a string signature string concat(any $a, any $b) arguments a first argument (automatically converted to string when not specified as a string) b second argument (automatically converted to string when not specified as a string) example concat("ocient", `10`) divide returns the floating point division of the two arguments signature number divide(number $a, number $b) arguments a first argument b second argument example divide(int col 0, int col 1) hash code returns an integer hash code for a string, number, or boolean signature number hash code(string|number|boolean $val) arguments val the value to hash example hash code(null) → null hash code("chicago") → 1884315574 hash code("test123") → 1422501792 hash code(`12345`) > 12345 hash code(`123 45`) > 1936584703 hash code(`true`) > 3 hash code(`false`) > 1 if returns one of the other arguments based on the truth value of the first when the first argument is true, return the second argument when false, return the third a null input into the first argument will be evaluated as false both branches are eagerly evaluated signature any if(boolean $b, any $if true, any $if false) arguments b boolean value that determines which of the options to return if true value returned if b is true if false value returned if b is false example if(value > `5`, "north america", "europe") json parses a string into a json value for use in further jmespath operations all values other than string are returned unmodified strings containing null, boolean, number, array, and object types are supported signature object json(string $val) arguments val string to parse example json("{ \\"key\\" 10 }") lazy if returns one of the other arguments based on the truth value of the first when the first argument is true, return the second argument when false, return the third the second and third arguments are both expressions branches are lazily evaluated signature any lazy if(boolean $b, expression $if true, expression $if false, any data) arguments b boolean value that determines which of the options to return if true expression evaluated and returned if true if false expression evaluated and returned if false data information used to evaluate either expression example if(value > 5, &"north america", \&abs(@), 1 0) left returns the num chars leftmost characters of the string column if null, null is returned if the index is greater than the length of the string, the entire string is returned signature string left(string $val, number $num chars) arguments a string column num chars number of chars to use from the left of a string column example left("ocient", `3`) lower case returns lower case version of a string signature string lower case(string $a) arguments a string to lower case example lower case("ocient") lpad pads an input string to a specified length with a padding string added to the left side signature string lpad(string $val, number $length, string $padding) arguments val the input string this cannot be null length the length you want after padding if the input string is longer than this length, it is truncated to length characters padding the padding string if no padding string is provided, the input string is padded with spaces example lpad(string 1, `3`, ' ') ltrim removes leading contiguous instances of a set of characters in a given string signature string ltrim(string $val, string $strip chars) arguments val the input string to trim strip chars set of characters that should be removed (default is ' ' if the value is null or not specified) example ltrim(null, ' ') → null ltrim(' chicago ') → 'chicago ' ltrim(' chicago ', ' ') → 'chicago ' ltrim(' chicago ', null) → 'chicago ' ltrim('123test123', '321') → 'test123' map an override of jmespath’s built in map which maps a given expression into elements of an array this override makes the small adjustment that if the array itself is null, then a null is returned signature array map(expression $expr, array\[any] array 1) arguments expr expression the expression to map array 1 array\[any] the array to which the expression will be mapped example map(\&if(@ == 1, true, false), \[1,2,1,1,2]) → \[true, false, true, true, false] matches returns boolean for whether this input matches the pattern flags can be s → pattern dotall m → pattern multiline i → pattern case insensitive x → pattern comments q → pattern literal signature boolean matches(string $val, string $pattern, string $flags) arguments val value to match pattern regex to match against args regex flags example matches(col 1, " mytest ", 'i') millis to timestamp converts an integer number of milliseconds since the epoch to an ocient timestamp column signature number millis to timestamp(number $t) arguments t value to convert to an ocient timestamp example millis to timestamp(my millis) multiply returns the floating point multiplication of the two arguments signature number multiply(number $a, number $b) arguments a first argument b second argument example multiply(int col 0, int col 1) nanos to timestamp converts an integer number of nanoseconds since the epoch to an ocient timestamp column signature number nanos to timestamp(number $t) arguments t value to convert to an ocient timestamp example nanos to timestamp(my nanosecs) normalize space removes leading and trailing whitespace and replaces connected whitespace with a single space signature string normalize space(string $val) arguments val string to normalize example normalize space(col 1) now, current timestamp returns the current timestamp in nanos this will correctly load into an ocient timestamp column signature number now() | number current timestamp() arguments none example now(), current timestamp() null if, nullif returns null if the first argument matches any of the following arguments can be applied to multiple columns or to compare a column with a constant can also be used with more than two arguments in which case returns null if a matches any of the subsequent arguments returns the value the first argument if it does not match any other arguments signature any null if(any $val, any $c0, \[, any $…]) arguments val value to compare c0 required value to compare against … optional multiple values to compare against example null if(col a, "null", "none") parse array given a string, parses it into an array of strings handles nested arrays as well signature array parse array(string $array string, string $bracket char, string $delim char, int $dimensions) arguments array string the string of the array to be parsed bracket char the opening bracket character to use delim char the delimiter character to use dimensions the dimension of this array example parse array("\[ 1, 2, 3]", "\[", ",", `1`) ⇒ \[1, 2, 3] parse array("\[\[1,2],\[3], \[]]", '\[', ',', `2`) ⇒ \[\[1,2], \[3], \[]] parse array("{ \\"cat\\", \\"hat\\" }", "{", ",", `1`) ⇒ \["cat", "hat"] record uuid returns string that represents a unique identifier for the record in a given pipeline for a given file group or topic this return value will be consistent over multiple calls in the pipeline's transformation configuration for the same source record can be used as a surrogate key when loading multiple tables from the same source record in a pipeline to facilitate joins when no unique identifier exists on the source record signature string record uuid() arguments none example record uuid() > "6064fc40 9961 4f5e b74d 090458e4a609" replace returns string with text matching the regex pattern replaced by the replacement text flags can be s → pattern dotall m → pattern multiline i → pattern case insensitive x → pattern comments q → pattern literal signature string replace(string $val, string $pattern, string $replacement text, string $flags) arguments val value to search for replacement pattern section to replace replacement text replacement string args regex flags example replace(col 1, '\[oo]ld', 'new', 'q') right returns the num chars rightmost characters of the string column if null, null is returned if the index is greater than the length of the string, the entire string is returned signature string right(string $a, number $num chars) arguments a string column num chars number of chars to use from the right of a string column example right("ocient", `3`) rpad pads an input string to a specified length with a padding string added to the right side signature string rpad(string $val, number $length, string $padding) arguments val the input string this cannot be null length the length you want after padding if the input string is longer than this length, it is truncated to length characters padding the padding string if no padding string is provided, the input string is padded with spaces example rpad(string 1, `3`, ' ') rtrim removes trailing contiguous instances of a set of characters in a given string signature string rtrim(string $val, string $strip chars) arguments val the input string to trim strip chars set of characters that should be removed (default is ' ' if the value is null or not specified) example rtrim(null, ' ') → null rtrim(' chicago ') → ' chicago' rtrim(' chicago ', ' ') → ' chicago' rtrim(' chicago ', null) → ' chicago' rtrim('123test123', '321') → '123test' sample returns true or false based on the given sample rate the sample rate must be a number commonly used inside of if statements an exception will be thrown if the sample rate is outside of \[0 0, 1 0] signature boolean sample(number $rate) arguments rate rate to return true must be between \[0 0, 1 0] example sample(`0 5`) st forcepolygonccw forces a polygon to have a counterclockwise rotation of the outer polygon ring and a clockwise rotation of the inner polygon ring you can use this function to convert polygons extracted from systems that interpret polygon rotation differently signature geometry st forcepolygonccw(string $wkt geometry) arguments wkt geometry wkt string representing the polygon to rotate also accepts wkt of point or linestring and returns those types unmodified example st forcepolygonccw('polygon(…)') st geomfromewkb returns a geometry from an extended well known binary (ewkb) and well known binary (wkb) representations of a geometry you can load the result of this function into point, linestring, and polygon column types signature geometry st geomfromewkb(string ewkb) arguments wkb hex encoded string extended wkb or wkb representation of a geometry ewkb or wkb can be standalone hexadecimal characters or can be prefixed with either 0x or \x example st geomfromewkb('0103…') st geomfromewkb('0x0103…') st geomfromewkb('\x0103…') st point constructs an point from numeric coordinates that represent the longitude and latitude of the point signature geometry st point(float longitude, float latitude) arguments longitude floating point longitude of the point to construct latitude floating point latitude of the point to construct example st point(float col 1, float col 2) substring returns the substring beginning at start index (with indexes beginning at 1) to the end of the string, or to num chars characters signature string substring(string $val, number $start index, \[number $num chars]) arguments val string to substring start index index to start at first character is index 1 num chars optional the maximum number of chars to return example substring(str col, `10`) substring(str col, `10`, `5`) substring('chicago bears', `3`) → 'icago bears' substring('chicago bears', `3`, `4`) → 'icag' substring after returns the string after the first occurrence of the needle signature string substring after(string $val, string $needle) arguments val string to substring needle delimiter to use to designate where to return subsequent substring example substring after('my text', ' ') → 'text' substring before returns the string before the first occurrence of the needle both parameters are literal strings signature string substring before(string $val, string $needle) arguments val string to substring needle delimiter to use to designate where to return prior substring example substring before('my text', ' ') → 'my' subtract returns the difference of the two arguments signature number subtract(number $a, number $b) arguments a first argument b second argument example subtract(int col 0, int col 1) to array length turns any json object into an array with n copies of that object signature array\[any] to array length(any $val, number $n) arguments val value to copy into an array n number of copies to make example to array length(value col, `3`) to binary converts a string to binary data for loading into an ocient binary or hash column if mode is 'hex' , data is parsed as a sequence of hexadecimal digits note that the sequence of digits must not begin with 0x otherwise, mode must be the name of a character encodings listed https //docs oracle com/en/java/javase/17/intl/supported encodings html , and data is encoded in that encoding signature binary to binary(string $data, string $mode) arguments data data to convert to binary mode either 'hex' or the name of a standard java ® character encoding example to binary(str col, 'utf 16') to date converts a string to a date for loading into an ocient date column format string follows these https //docs oracle com/en/java/javase/17/docs/api/java base/java/time/format/datetimeformatter html signature number to date(string $date, string $format string) arguments date date string to convert format string format string to use for conversion example to date(str col, 'yyyy mm dd') to time converts a string into a time of day for loading into an ocient time column format string follows these https //docs oracle com/en/java/javase/17/docs/api/java base/java/time/format/datetimeformatter html signature number to time(string $time, string $format string) arguments time time string to convert format string format string to use for conversion example to time(str col, 'hh\ mm\ ss nnnnnnnnn') to timestamp converts a string to timestamp for loading into an ocient timestamp column format string follows these https //docs oracle com/en/java/javase/17/docs/api/java base/java/time/format/datetimeformatter html signature number to timestamp(string $timestamp, string $format string) arguments timestamp timestamp string to convert format string format string to use for conversion example to timestamp(str col, 'yyyy mm dd hh\ mm\ ss sss') to tuple works on tuples that get tokenized at the first nesting level by a given delimiter nested tuples, arrays or objects are considered entirely for each tokenized tuple element, a transformation function can be specified that is applied to the tuple element tokenized elements are always treated as strings, requiring additional transformation when being treated as numbers signature array to tuple(string $tuple string, string $bracket char, string $delim char \[, expression $transformation function …]) arguments tuple string the string of the tuple to be parsed must be encapsulated in the given bracket character and its closing peer empty tuple values are treated as null values if the whole argument is null, the function returns null ignoring all given expressions in transformation function… bracket char the bracket character to use for parsing and result generation delim char the delimiter character to use for parsing and result generation the supported delimiters are '(', '\[' and '{' avg(…) keys(…) map(…) sum(…) value(…) type(…) partially for boolean and object types example to tuple(( 1,2), '(', ',', \&abs(to number(@)), &@) → '(1 0,2)' to tuple((,'8',2), '(', ',', \&if(@ != null, @, 'missing'), \&if(length(@) > `0`, @, 'missing'), \&to number(@) ) → '("missing","'8'",2)' tokenize returns array of strings after splitting on the regex pattern flags can be s → pattern dotall m → pattern multiline i → pattern case insensitive x → pattern comments q → pattern literal signature array\[string] tokenize(string $val, string $pattern, string $flags) arguments val string to tokenize pattern token regex pattern flags regex flags example tokenize(col 1, '$', 'i ') translate replaces characters according to a map of characters if the to chars string is shorter than the from chars string, the function removes the characters that are not mapped signature string translate(string $val, string $from chars, string $to chars) arguments val string to translate from char from character map to chars to character map example translate('lorem ipsum', 'mnop', '1234') → 'l3re1 i4su1' trim removes leading and trailing contiguous instances of a set of characters in a given string signature string trim(string $val, string $strip chars) arguments val the input string to trim strip chars set of characters that should be removed (default is ' ' if the value is null or not specified) example trim(null, ' ') → null trim(' chicago ') → 'chicago' trim(' chicago ', ' ') → 'chicago' trim(' chicago ', null) → 'chicago' trim('123test123', '321') → 'test' truncate truncates a decimal number to have a specified number of digits after its decimal point if decimal is null, this function returns null during the transformation stage, the lat stores numbers as integers or floating point numbers by default however, floating point numbers are inexact for this reason, this function might return unexpected results for example, truncate(`0 29`, `2`) evaluates to 0 28 , not 0 29 (because 0 29 is really 0 28999999 ) signature number truncate(number $decimal, number $decimal places) arguments decimal the number to truncate decimal places the number of digits to keep after the decimal point example truncate(decimal col, `2`) tuple element transformation works on an element of a given array and applies the specified expression to that element the function returns an array or null if specified signature array tuple element transformation(array $array values, number $array index, expression $transformation function) arguments array values an array of values if the whole argument is null, the function returns null array index a number pointing to an element in the array the first index of the array is 0 transformation function a transformation functions to apply to array element specified by array index example tuple element transformation(\[ 1 0,2,3, '0', abs(@)) → \[1 0,2,3] ] unique return all unique values in an array signature array\[any] unique(array\[any] $vals) arguments vals array to return unique values from, removing all duplicates example unique(vals col) upper case returns upper case version of string signature string upper case(string $a) arguments a string to upper case example upper case("ocient") width bucket returns the bucket number of the queried value in a histogram starting at min, ending at max, and consisting of num buckets count of buckets the values are inclusive on the lower bound and exclusive of the upper bound so that all buckets are the same width also note that requesting 100 buckets will actually get you 102 possible buckets bucket 0 through 101 where 0 captures all values below the minimum and 101 captures all values at the maximum and beyond signature number width bucket(number $val, number $min, number max, number $num buckets) arguments val value to determine the bucket number for min min value of histogram max max value of histogram num buckets number of buckets in histogram example width bucket(my col, `0`, `100`, `20`) zip zips n one dimensional arrays into a single two dimensional array with each inner array containing n elements the resulting outer array will have a length equal to the max length of any input array for input arrays smaller than the max length, null elements will be placed in the inner arrays signature array\[any] zip(array\[any] $a0, array\[any] $a1, \[, array\[any] $…]) arguments a0 array to zip a1 array to zip … optional additional arrays to zip example zip(array col0, array col1, array col2) special functions explode performs an explosion on all indicated columns, producing one row per value in the exploded array this can be applied to multiple columns in the same table non exploded columns are held constant and exploded columns are zipped together in array order if multiple exploded columns have different array lengths, the record is exploded into n records where n is the max length columns with array lengths less than n will have null values after each of their values are exploded you can explode a maximum of 8,192 array elements for each source record usage explode(array\[any] arr) can be used like any other jmespath function on a json array type with a few limitations explode can only be used as the outermost function in a jmespath expression explode only works on a single dimension multidimensional explode is not supported examples single column explode transform configuration { "transform" { "topics" { "topic0" { "tables" { "table0" { "columns" { "col0" "explode(array0)", "col1" "integer0" } } } } } } } input { "array0" \[0, 1, 2, 3], "integer0" 4 } output \[ { "col0" 0, "col1" 4 }, { "col0" 1, "col1" 4 }, { "col0" 2, "col1" 4 }, { "col0" 3, "col1" 4 } ] multi column explode transform configuration { "transform" { "topics" { "topic0" { "tables" { "table0" { "columns" { "col0" "explode(array0)", "col1" "explode(array1)", "col2" "integer0" } } } } } } } input { "array0" \[0, 1, 2, 3], "array1" \[0, 1, 2], "integer0" 4 } output \[ { "col0" 0, "col1" 0, "col2" 4 }, { "col0" 1, "col1" 1, "col2" 4 }, { "col0" 2, "col1" 2, "col2" 4 }, { "col0" 3, "col1" null, "col2" 4 } ] note that in the 4th record col1 has a null value because array1 only has 3 values while array0 has 4 related links docid\ tt6tfoulap0mt aycm2ka