LAT Sink Configuration

Data Pipelines are now the preferred method for loading data into the Ocient System. For details, see Load Data.

The Sink Configuration controls the destination of data in the LAT Pipeline.

sink

A sink configuration object.

Required keys:

sink.type

Type of sink to use in the pipeline.

  • Type: string
  • Required: Yes
  • Default: none

Allowed values:

  • ocient: see Ocient Sink for additional configuration.
  • file: see File Sink for additional configuration.

Ocient Sink

The Ocient Sink allows the LAT to connect to an Ocient cluster and write rows to one or more tables.

Required keys:

sink.remotes

Array of one or more Ocient Loader Nodes, each in host:port format.

  • Type: string[]
  • Required: Yes
  • Default: none

sink.batch_records

Number of records to buffer per partition before flushing them to Ocient.

  • Type: int
  • Required: No
  • Default: 1000

sink.batch_duration

Time-based flushing parameter, in milliseconds. Records flush to Ocient after this duration elapses with no new activity, even if fewer than batch_records records have been processed.

  • Type: int
  • Required: Yes
  • Default: 30000

sink.idle_partition_polling_period

Time-based polling parameter, in milliseconds. The sink periodically polls the remote for write durability progress on idle partitions.

  • Type: int
  • Required: No
  • Default: 60000

sink.request_timeout

Request timeout when communicating with Ocient remotes, in milliseconds.

  • Type: int
  • Required: No
  • Default: 300000

sink.request_backoff

Duration to delay after a failed request to an Ocient remote prior to retrying, in milliseconds.

  • Type: int
  • Required: No
  • Default: 1000

sink.request_jitter

Additional duration to delay after a failed request to an Ocient remote prior to retrying, in milliseconds. The total delay incurred prior to a given retry is request_backoff + rand(0, request_jitter).

  • Type: int
  • Required: No
  • Default: 5000
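With the default values, the formula above gives a delay of between 1000 and 6000 milliseconds before each retry. As a hedged sketch, the fragment below shortens that window to between 500 and 2500 milliseconds. It assumes the sink.* keys nest under a top-level sink object; the remote address is a placeholder, not a value from this reference. JSON does not allow comments, so these assumptions are stated here rather than inline.

```json
{
  "sink": {
    "type": "ocient",
    "remotes": ["loader-1.example.com:5050"],
    "request_backoff": 500,
    "request_jitter": 2000
  }
}
```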

sink.high_watermark

High watermark memory point, in bytes. When buffered memory reaches this level, the LAT stops pushing new rows to memory buffers. It does not resume pushing rows into the memory buffers until memory usage falls to low_watermark.

  • Type: int
  • Required: No
  • Default: 1000000000

sink.low_watermark

Low watermark memory point, in bytes. After reaching high_watermark, the LAT resumes pushing rows to memory buffers once memory usage falls to this level.

  • Type: int
  • Required: No
  • Default: 500000000
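The two watermarks work as a pair: buffering pauses at high_watermark and resumes only at low_watermark, so the gap between them determines how much memory must drain before loading continues. A minimal sketch for a memory-constrained host, assuming the keys nest under a top-level sink object, might lower both values while keeping low_watermark below high_watermark. The byte values and the remote address are illustrative placeholders, not recommendations from this reference.

```json
{
  "sink": {
    "type": "ocient",
    "remotes": ["loader-1.example.com:5050"],
    "high_watermark": 800000000,
    "low_watermark": 400000000
  }
}
```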

sink.storage_scope_id

UUID of the storage scope that rows will be associated with. The scope with the given UUID must already exist in the target cluster.

  • Type: string
  • Required: No
  • Default: null

sink.skip_page_replication

A Boolean value that determines whether to omit page replicas for the specified storage scope. This setting is ignored if sink.storage_scope_id is not specified or if the remotes have already seen that storage scope.

  • Type: boolean
  • Required: No
  • Default: false
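Because skip_page_replication only takes effect together with storage_scope_id, the two keys would normally be set as a pair. The fragment below is a sketch under the assumption that both keys nest under a top-level sink object. The UUID is an all-zero placeholder that must be replaced with the UUID of a scope that already exists in the target cluster, and the remote address is likewise a placeholder.

```json
{
  "sink": {
    "type": "ocient",
    "remotes": ["loader-1.example.com:5050"],
    "storage_scope_id": "00000000-0000-0000-0000-000000000000",
    "skip_page_replication": true
  }
}
```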

sink.netty_event_loop_group_threads

The number of threads in the Netty event loop group used to communicate with remotes.

  • Type: int
  • Required: No
  • Default: 1

Example Ocient Sink Configuration

JSON
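A minimal sketch of an Ocient sink, assuming that the sink.* keys described above nest under a top-level sink object and that remotes is an array of host:port strings. The host names and port are placeholders rather than values from this reference, and the remaining keys are shown at their documented defaults. JSON does not allow comments, so these assumptions are stated here rather than inline.

```json
{
  "sink": {
    "type": "ocient",
    "remotes": [
      "loader-1.example.com:5050",
      "loader-2.example.com:5050"
    ],
    "batch_records": 1000,
    "batch_duration": 30000,
    "idle_partition_polling_period": 60000,
    "request_timeout": 300000,
    "netty_event_loop_group_threads": 1
  }
}
```

Any key that is not required can be omitted, in which case the documented default applies.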


File Sink

A Sink type for testing LAT pipelines that writes the transformed data to local JSONL files.

Required keys:

sink.location

An absolute or relative path to the location that the sink should write files to.

  • Type: string
  • Required: Yes
  • Default: none

Example File Sink Configuration

JSON
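A minimal sketch of a file sink, assuming the same nesting under a top-level sink object as the Ocient sink. The directory path is a hypothetical placeholder and can be either absolute or relative.

```json
{
  "sink": {
    "type": "file",
    "location": "/tmp/lat-output"
  }
}
```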


External Sink Configuration

Rather than including a sink directly within the pipeline, it is also possible to configure a pipeline to use a sink that is specified externally. Sinks can be managed (created, deleted, and more) using the LAT Client Command Line Interface. A sink must exist before a pipeline can use it.

There are three ways to configure a pipeline to use a sink.

  1. If a sink is included directly within a pipeline (using the LAT Sink Configuration), it will be used.
  2. If a sink is not specified within the pipeline, you can specify a sink_name that corresponds to a sink previously created using the LAT Client.
  3. If neither sink nor sink_name is specified in a pipeline, the default sink will be used. If a default sink has not been created using the LAT Client, a pipeline must specify either a sink or a sink_name.
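The second option can be sketched as the following pipeline fragment. It assumes that sink_name is a top-level pipeline key that takes the place of the sink object, and the name my-ocient-sink is hypothetical: it stands for a sink already created with the LAT Client. The other pipeline sections are omitted here.

```json
{
  "sink_name": "my-ocient-sink"
}
```

Under the third option, the pipeline would contain neither sink nor sink_name and would rely on the default sink.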
