Sink Configuration

The Sink Configuration controls the destination of data in the LAT Pipeline.

sink

A sink configuration object.

Required keys:

sink.type

Type of sink to use in the pipeline.

Type:

string

Required:

Yes

Default:



Allowed values:

  • ocient: see Ocient Sink for additional configuration.
  • file: see File Sink for additional configuration.

Ocient Sink

The Ocient Sink allows LAT to connect to an Ocient cluster and write rows to one or more tables.

Keys:

sink.remotes

Array of one or more Ocient Loader Nodes, each given in host:port format.

Type:

string[]

Required:

Yes

Default:



sink.batch_records

Number of records to buffer per partition before flushing them to Ocient.

Type:

int

Required:

No

Default:

1000

sink.batch_duration

Time-based flushing parameter, in milliseconds. Records flush to Ocient after this duration elapses with no new activity, even if fewer than batch_records records have been processed.

Type:

int

Required:

No

Default:

30000

sink.idle_partition_polling_period

Time-based polling parameter, in milliseconds. The Ocient Sink periodically polls the remote for progress on write durability for idle partitions.

Type:

int

Required:

No

Default:

60000

sink.request_timeout

Request timeout when communicating with Ocient remotes, in milliseconds.

Type:

int

Required:

No

Default:

300000

sink.request_backoff

Duration to delay after a failed request to an Ocient remote prior to retrying, in milliseconds.

Type:

int

Required:

No

Default:

1000

sink.request_jitter

Additional duration to delay after a failed request to an Ocient remote prior to retrying, in milliseconds. The total delay incurred prior to a given retry is request_backoff + rand(0, request_jitter). With the default values, each retry is delayed by between 1000 and 6000 milliseconds.

Type:

int

Required:

No

Default:

5000

sink.high_watermark

High watermark memory point, in bytes. When memory usage reaches this level, the LAT stops pushing new rows to memory buffers. It does not resume pushing rows into the memory buffers until low_watermark is reached.

Type:

int

Required:

No

Default:

1000000000

sink.low_watermark

Low watermark memory point, in bytes. After reaching high_watermark, the LAT begins pushing rows to memory buffers again once memory usage falls to this level.

Type:

int

Required:

No

Default:

500000000

sink.storage_scope_id

UUID of the storage scope that rows will be associated with. The scope with the given UUID must already exist in the target cluster.

Type:

string

Required:

No

Default:

null

sink.skip_page_replication

Whether to omit page replicas for the specified storage scope. This setting is ignored if sink.storage_scope_id is not specified or if the remotes have already seen the storage scope.

Type:

boolean

Required:

No

Default:

false

sink.netty_event_loop_group_threads

The number of threads in the Netty event loop group used to communicate with remotes.

Type:

int

Required:

No

Default:

1

Example Ocient Sink Configuration
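
The keys above might be combined as in the following minimal sketch. The host name and port are hypothetical placeholders, and nesting the keys under a top-level sink object is an assumption based on the dotted key names (sink.type, sink.remotes). The batching and retry keys are set to their documented defaults purely for illustration and could be omitted.

```json
{
  "sink": {
    "type": "ocient",
    "remotes": ["loader-1.example.com:5050"],
    "batch_records": 1000,
    "batch_duration": 30000,
    "request_backoff": 1000,
    "request_jitter": 5000
  }
}
```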


File Sink

A Sink type for testing LAT pipelines that writes the transformed data to local JSONL files.

Required keys:

sink.location

An absolute or relative path to the location that the sink should write files to.

Type:

string

Required:

Yes

Default:



Example File Sink Configuration
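
A File Sink needs only its type and a location, so a configuration might look like the sketch below. The path is a hypothetical placeholder, and nesting the keys under a top-level sink object is an assumption based on the dotted key names.

```json
{
  "sink": {
    "type": "file",
    "location": "/tmp/lat_output"
  }
}
```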


External Sink Configuration

Rather than including a sink directly within the pipeline, it is also possible to configure a pipeline to use a sink that is specified externally. Sinks can be managed (created, deleted, and more) using the LAT Client Command Line Interface. A sink must exist before a pipeline can use it.

There are three ways to configure a pipeline to use a sink.

  1. If a sink is included directly within a pipeline (using the Sink Configuration), it will be used.
  2. If a sink is not specified within the pipeline, you can specify a sink_name that corresponds to a sink previously created using the LAT Client.
  3. If neither sink nor sink_name is specified in a pipeline, the default sink will be used. If a default sink has not been created using the LAT Client, a pipeline must specify either a sink or a sink_name.
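
As a sketch of the second option, a pipeline could reference an externally managed sink by name instead of embedding a sink object. The name example_sink is hypothetical and would have to match a sink already created with the LAT Client. Placing sink_name at the top level of the pipeline is an assumption, and all other pipeline keys are omitted here.

```json
{
  "sink_name": "example_sink"
}
```

Omitting both sink and sink_name would correspond to the third option, which relies on a default sink having been created beforehand.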

Related Links

Load Data