Loading Multiple Tables from a Topic
The LAT Load JSON Data from Kafka guide shows an example of loading each topic into its own table. However, each topic can route data to multiple tables and many topics can stream load simultaneously to one or more tables in a single pipeline. Here is an examplepipeline.json file that will load data into two separate tables:
JSON
- You are loading data from a Kafka topic called
products. - That topic is loading into a table called
database.schema.table. - Most columns are simply mapped from the source JSON document into, however some are transformed from the original data.
created_atis parsed from a string using theto_timestampfunctioncreated_atforproduct_ratingsis set dynamically to the value of thenow()function
Loading Nested JSON Data
As you can see from the transformation, the syntax is similar to SQL, but the inner language used to traverse the JSON document and manipulate arrays can be unfamiliar. The approach used to select a property from the source JSON is to follow the tree with dot notation. For example: Given a series of two documents like:JSON
JSON
JSON
JSON
||:
JSON
Loading Array Data
Where arrays are encountered, special array notation is used. The benefit of this syntax is that properties in objects in arrays can also be accessed to return an array. Another useful capability is the flattening of arrays. In the event that the values returned from a transformation includes nested arrays, you can flatten it to a single dimension array by appending[] to the end of the transformation expression. For example:
JSON
Exploding Array Data
In some cases, a single record from a loading source should represent more than one row in the target table. This is referred to as exploding the data. This allows nested array data to be expanded and to be associated with the corresponding values from the record. For example, if you would like to “explode” the data in the languages array into a single row per value to populate a standalone “country_languages” table, you could do the following for thetransform section of your pipeline.json file:
JSON
country from the record when the explode occurs:
Example Rows:
JSON
EXPLODE, they results will be “zipped” together. If the arrays are different lengths, the number of rows will match the longest array. The shorter array will populate the end of its rows with NULL. This is sometimes referred to as an “Outer Explode.”
Transforming and Joining String Data
One common requirement is to split a string on a token to make an array. This is accomplished using transformation functions in the LAT. Assuming a data input with a space separated string:JSON
tokenize function. Tokenize has a number of flags to control how the pattern should be interpreted. For example 'q' indicates the pattern is a literal and 'i' indicates case insensitive regular expression.
JSON
(?=(([^\\'\"]*[\\'\"]){2})*[^\\'\"]*$) that leverages a zero-width positive lookahead assertion. Note that because this regex will be presented in a JSON document, certain characters must be escaped within the JSON string. For example, in JSON:
JSON
Text
'\\s' would actually pass the regular expression \s to the tokenize function.
Similar to splitting on a token, the LAT can easily join an array into a string with the join function. With source data that includes an array like the following:
JSON
JSON

