# LAT Pipeline Configuration
Data pipelines are now the preferred method for loading data into the Ocient system. For details, see docid\ xq0tg7yph vn62uwufibu.

A pipeline configuration is a JSON file that fully describes the necessary elements to run a pipeline, including:

- **source**: the source location from which a pipeline should read records to process
- **extract**: how a pipeline should extract data from the source
- **transform**: how a pipeline should transform incoming records
- **sink**: the destination where a pipeline should write transformed rows

The JSON file is a set of keys with (possibly nested) values. The available key-value pairs are documented in the following sections.

## Pipeline

Top-level configuration for a pipeline.

Required keys: `version`, `source`, `extract`, `transform`

The following is an example of the structure of a pipeline configuration:

```json
{
  "version": 2,
  "source": {
    "type": "kafka"
    // kafka source configuration
  },
  "sink": {
    "type": "ocient"
    // ocient sink configuration
  },
  "extract": {
    // extract configuration
  },
  "transform": {
    "topics": {
      "my_topic": {
        "tables": {
          "my_table": {
            "columns": {
              "my_col1": "record_field_1",
              "my_col2": "record_field_2"
            }
          }
        }
      }
    }
  }
}
```

## Configuration

### version

The pipeline's version. The required value is `2`.

- Type: int
- Required: Yes
- Default: none

### pipeline_id

A unique identifier for this pipeline. Allowed characters are `a-z`, `A-Z`, `0-9`, `_`, and `-`. The `pipeline_id` is used to uniquely identify a pipeline. It serves a few purposes:

- Deduplication scope. See docid\ elwhwxe8oruff36xf4fom.
- For Kafka loads, the consumer group ID is set to `ocient_lat_[pipeline_id]`.

For most loads from file sources, it is advisable to leave the `pipeline_id` unset. When creating a pipeline using the LAT client, the client assigns the pipeline a random UUID.

- Type: string
- Required: No
- Default: a UUID randomly generated by the LAT client

### workers

The number of workers this pipeline should use for processing records.

- Type: int
- Required: No
- Default: `default_num_workers` in the service configuration

### log_original_records

Add original records to the error log when errors occur. When this setting is `true`, the LAT writes data extracted from the source to the error log and includes it in some error messages. By default, this setting is `false`, and source data is neither written to the error log nor included in error messages.

To enable this setting, the docid\ jisvyocn9ndfs2uylrtsi service configuration must also be enabled. This configuration only affects pipelines that do not use an `error_topic`.

- Type: boolean
- Required: No
- Default: false

### seek_on_rebalance

Whether to seek a newly assigned partition to the latest known durable record before resuming processing. Disabling this behavior should typically be reserved for test scenarios and is only supported for Kafka loading.

- Type: boolean
- Required: No
- Default: true

### continue_on_unrecoverable_error

Whether to allow workers to continue processing when they encounter an ordinarily unrecoverable error.

- Type: boolean
- Required: No
- Default: false

### single_file_mode

Enable single file mode. This mode is designed for a specific use case where there are few files but each file is large. When you use this mode, only a single file is processed at a time, so docid\ z8gjws65x2ybq02bns gm must be equal to 1 and only one docid\ z8gjws65x2ybq02bns gm can be defined. The single file is processed in parallel by the number of workers defined by the pipeline `workers` setting. There is no need to enable this mode in common use cases.

- Type: boolean
- Required: No
- Default: false

A known limitation exists with the LAT metrics when you use single file mode. Metrics returned from the LAT client pipeline status command might not display the expected count of files processed, processing, and so on. However, the load still displays the correct processing and completed statuses.
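As an illustration of these top-level settings, the following is a minimal sketch of a pipeline that loads one large file with single file mode enabled, splitting the file across eight workers. The `"s3"` source type is a placeholder assumption for this sketch; the available source types and their fields are covered in the source configuration reference.

```json
{
  "version": 2,
  "workers": 8,
  "single_file_mode": true,
  "source": {
    "type": "s3"
    // placeholder file source; actual fields depend on the configured source type
  },
  "sink": {
    "type": "ocient"
    // ocient sink configuration
  },
  "extract": {
    // extract configuration; defaults to JSON records if omitted
  },
  "transform": {
    // transform configuration
  }
}
```

Because single file mode processes one file at a time, throughput comes entirely from the `workers` count, which controls how many workers parse the single file in parallel.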
### error_topic

A Kafka topic to which records that cannot be processed are written. If absent, error records are logged to the error log file without additional processing. This configuration is only available if a Kafka source is configured for the pipeline; the configuration for that source applies to the Kafka producer for this topic. A worked sketch that uses this setting appears in the examples at the end of this page.

- Type: string
- Required: No
- Default: null

### polling_duration

The maximum duration to block while polling for new records from a source, in milliseconds.

- Type: int
- Required: No
- Default: 1000

### source

The source configuration section. See docid\ z8gjws65x2ybq02bns gm for the nested configuration.

- Type: object
- Required: Yes
- Default: none

### sink

The sink configuration section. See docid\ n1rdxyeldzyhmqwsd0wyz for inline configuration details. `sink` cannot be set if `sink_name` is set. `sink` can be omitted if a default sink is defined as an docid\ n1rdxyeldzyhmqwsd0wyz.

- Type: object
- Required: No
- Default: null

### sink_name

The name of an externally configured sink. `sink_name` cannot be set if `sink` is set. See docid\ n1rdxyeldzyhmqwsd0wyz. `sink_name` can be omitted if a default sink is defined as an docid\ n1rdxyeldzyhmqwsd0wyz. The final example at the end of this page sketches this setting.

- Type: string
- Required: No
- Default: null

### extract

The extract configuration section. See docid\ gx5ksuprjkt6tagcfb1v9 for the nested configuration.

- Type: object
- Required: No
- Default: the JSON record type with default settings

### transform

The transform configuration section. See docid\ b7shicmwe2h7o1xfxjny for the nested configuration.

- Type: object
- Required: Yes
- Default: none

## Related Links

- docid\ tt6tfoulap0mt aycm2ka
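## Examples

The following sketch shows a Kafka pipeline that combines `pipeline_id`, `error_topic`, and `polling_duration`. The pipeline, topic, table, and column names are placeholders, and the source and sink sections are abbreviated as in the structural example above.

```json
{
  "version": 2,
  "pipeline_id": "orders_load",        // consumer group ID becomes ocient_lat_orders_load
  "error_topic": "orders_load_errors", // unprocessable records are produced to this topic
  "polling_duration": 500,             // block at most 500 ms while polling for records
  "source": {
    "type": "kafka"
    // kafka source configuration
  },
  "sink": {
    "type": "ocient"
    // ocient sink configuration
  },
  "extract": {
    // extract configuration
  },
  "transform": {
    "topics": {
      "orders": {
        "tables": {
          "orders_table": {
            "columns": {
              "order_id": "id",
              "order_total": "total"
            }
          }
        }
      }
    }
  }
}
```

Because the `error_topic` producer reuses the Kafka source configuration, no separate broker settings are needed for error records.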
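When a sink is configured externally, the inline `sink` section can be replaced with `sink_name`; the two settings are mutually exclusive. The sink name `warehouse_sink` below is a placeholder.

```json
{
  "version": 2,
  "sink_name": "warehouse_sink",
  // no inline "sink" section: sink and sink_name cannot both be set
  "source": {
    "type": "kafka"
    // kafka source configuration
  },
  "extract": {
    // extract configuration
  },
  "transform": {
    // transform configuration
  }
}
```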