LAT Sink Configuration
Data Pipelines are now the preferred method for loading data into the Ocient System. For details, see Load Data.
The Sink Configuration controls the destination of data in the LAT Pipeline.
A sink configuration object.
Required keys:
Type of sink to use in the pipeline.
Type: | string |
---|---|
Required: | Yes |
Default: | |
Allowed values:
The Sink allows LAT to connect to an Ocient cluster to write rows to one or more tables.
Required keys:
Array of one or more Ocient Loader Node addresses, each in host:port format.
Type: | string[] |
---|---|
Required: | Yes |
Default: | |
Number of records to buffer per partition before flushing them to Ocient.
Type: | int |
---|---|
Required: | No |
Default: | 1000 |
Time-based flushing parameter, in milliseconds. Records flush to Ocient after this duration has elapsed with no new activity, even if fewer than batch_records records have been processed.
Type: | int |
---|---|
Required: | No |
Default: | 30000 |
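The interaction between the record-count threshold and the idle-time threshold can be sketched as follows. This is a minimal illustration, not the LAT implementation: `PartitionBuffer`, `should_flush`, and the parameter name `idle_flush_ms` are hypothetical, and "no new activity" is assumed to mean the time since the last record arrived on the partition.

```python
from dataclasses import dataclass


@dataclass
class PartitionBuffer:
    """Hypothetical per-partition buffer state (not part of the LAT API)."""
    record_count: int       # records buffered since the last flush
    last_activity_ms: int   # timestamp of the most recent record, in ms


def should_flush(buf, now_ms, batch_records=1000, idle_flush_ms=30000):
    """Return True when either flush condition described above is met."""
    if buf.record_count == 0:
        return False  # nothing buffered, so nothing to flush
    if buf.record_count >= batch_records:
        return True   # size-based flush
    # Time-based flush: the partition has been idle for idle_flush_ms.
    return now_ms - buf.last_activity_ms >= idle_flush_ms
```

With the defaults, a partition holding 10 records flushes only once 30000 ms pass without a new record, while a partition that reaches 1000 records flushes immediately.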
Time-based polling parameter, in milliseconds. The sink periodically polls the remotes for progress on write durability for idle partitions.
Type: | int |
---|---|
Required: | No |
Default: | 60000 |
Request timeout when communicating with Ocient remotes, in milliseconds.
Type: | int |
---|---|
Required: | No |
Default: | 300000 |
Duration to delay after a failed request to an Ocient remote before retrying, in milliseconds.
Type: | int |
---|---|
Required: | No |
Default: | 1000 |
Upper bound on the additional random delay applied after a failed request to an Ocient remote before retrying, in milliseconds. The total delay incurred before a given retry is request_backoff + rand(0, request_jitter).
Type: | int |
---|---|
Required: | No |
Default: | 5000 |
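The retry delay formula can be illustrated with a short sketch. The function name `retry_delay_ms` is hypothetical, and the sketch assumes `rand(0, request_jitter)` draws uniformly from that range; the source does not say whether the draw is an integer or a continuous value.

```python
import random


def retry_delay_ms(request_backoff=1000, request_jitter=5000, rng=random):
    """Total delay before a retry: request_backoff + rand(0, request_jitter).

    Assumption: the jitter is a uniform draw over [0, request_jitter].
    """
    return request_backoff + rng.uniform(0, request_jitter)


if __name__ == "__main__":
    # With the defaults, every delay falls between 1000 and 6000 ms.
    print(retry_delay_ms())
```

The random component spreads out retries from multiple partitions so that they do not all hit a recovering remote at the same instant.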
High watermark memory point, in bytes. When memory use reaches this level, the LAT stops pushing new rows to memory buffers and does not resume until memory use falls to low_watermark.
Type: | int |
---|---|
Required: | No |
Default: | 1000000000 |
Low watermark memory point, in bytes. After reaching high_watermark, the LAT resumes pushing rows to memory buffers once memory use falls to this level.
Type: | int |
---|---|
Required: | No |
Default: | 500000000 |
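The high and low watermarks together form a hysteresis band. The following sketch shows that behavior under stated assumptions: `WatermarkGate` is a hypothetical class, not a LAT component, and the check that low_watermark is below high_watermark is an assumption added for the example rather than documented validation.

```python
class WatermarkGate:
    """Hypothetical model of the high/low watermark behavior described above."""

    def __init__(self, high_watermark=1_000_000_000, low_watermark=500_000_000):
        # Assumption: a low watermark at or above the high one is a mistake.
        if low_watermark >= high_watermark:
            raise ValueError("low_watermark must be below high_watermark")
        self.high = high_watermark
        self.low = low_watermark
        self.paused = False

    def accepts_rows(self, bytes_in_use):
        """Update the paused state and report whether new rows may be pushed."""
        if self.paused:
            # Stay paused until memory use drains down to the low watermark.
            if bytes_in_use <= self.low:
                self.paused = False
        elif bytes_in_use >= self.high:
            # Crossing the high watermark stops new rows.
            self.paused = True
        return not self.paused
```

Because the gate stays closed between the two levels after a pause, the loader does not flap between accepting and rejecting rows when memory use hovers near a single threshold.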
UUID of the storage scope that rows will be associated with. The scope with the given UUID must already exist in the target cluster.
Type: | string |
---|---|
Required: | No |
Default: | null |
Whether to omit page replicas for the specified storage scope. This setting is ignored if sink.storage_scope_id is not specified or if the remotes have already seen that storage scope.
Type: | boolean |
---|---|
Required: | No |
Default: | false |
The number of threads in the Netty event loop group used to communicate with remotes.
Type: | int |
---|---|
Required: | No |
Default: | 1 |
A Sink type for testing LAT pipelines that writes the transformed data to local JSONL files.
Required keys:
An absolute or relative path to the location that the sink should write files to.
Type: | string |
---|---|
Required: | Yes |
Default: | |
Rather than including a sink directly within the pipeline, it is also possible to configure a pipeline to use a sink that is specified externally. Sinks can be managed (created, deleted, and more) using the LAT Client Command Line Interface. A sink must exist before a pipeline can use it.
There are three ways to configure a pipeline to use a sink.
- If a sink is included directly within a pipeline (using the LAT Sink Configuration), it will be used.
- If a sink is not specified within the pipeline, you can specify a sink_name that corresponds to a sink previously created using the LAT Client.
- If neither sink nor sink_name is specified in a pipeline, the default sink will be used. If a default sink has not been created using the LAT Client, a pipeline must specify either a sink or a sink_name.
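The precedence among the three options above can be sketched as a small lookup function. This is an illustration only: `resolve_sink`, `named_sinks`, and `default_sink` are hypothetical names, the pipeline is modeled as a plain dictionary, and the error types are choices made for the example.

```python
def resolve_sink(pipeline, named_sinks, default_sink=None):
    """Pick the sink for a pipeline using the precedence described above.

    pipeline:     dict that may contain "sink" or "sink_name"
    named_sinks:  dict of sinks previously created with the LAT Client
    default_sink: the default sink, or None if none has been created
    """
    # 1. A sink defined directly in the pipeline is used.
    if "sink" in pipeline:
        return pipeline["sink"]
    # 2. Otherwise, sink_name must refer to a sink that already exists.
    name = pipeline.get("sink_name")
    if name is not None:
        if name not in named_sinks:
            raise KeyError(f"sink {name!r} does not exist")
        return named_sinks[name]
    # 3. Otherwise, fall back to the default sink, if one was created.
    if default_sink is not None:
        return default_sink
    raise ValueError("pipeline must specify either sink or sink_name")
```

For example, `resolve_sink({"sink_name": "prod"}, {"prod": {...}})` returns the externally managed sink, while an empty pipeline with no default sink raises an error.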