Configuration Settings for Data Pipelines

Data pipelines let you load data using SQL statements. Several configuration settings control how the load behaves. The table below lists each setting with its default value, data type, and description.

Configuration Setting Parameter Name

Default

Data Type

Description

streamloader.extractorEngineParameters.enabled

true

BOOLEAN

Whether the Loader Node attempts to bootstrap the extractor process for data pipelines on start-up. If you set this parameter to true, the Loader Node does not become active until the Java loading process has successfully started.

streamloader.extractorEngineParameters.jvmMemoryConfiguration

  • 32m for initialHeap
  • 64g for maxHeap
  • 64g for maxDirect

VARCHAR

The memory configuration to use for the extractor Java process. Specify each value as a string representing a data size with a binary unit, such as 1 KiB or 5 MiB; the defaults and the example below use the short forms 32m and 64g.

You must set the three memory options in the same statement.

Example

ALTER SYSTEM ALTER CONFIG SET 'streamloader.extractorEngineParameters.jvmMemoryConfiguration.initialHeap' = '64g', 'streamloader.extractorEngineParameters.jvmMemoryConfiguration.maxHeap' = '64g', 'streamloader.extractorEngineParameters.jvmMemoryConfiguration.maxDirect' = '64g';

streamloader.extractorEngineParameters.restPort

8090

INT

Use this parameter with the streamloader.extractorEngineParameters.portOffset parameter to form the REST port for the extractor. (External monitoring services like Telegraf can query the REST server for loading process metrics.)

streamloader.extractorEngineParameters.portOffset

0

INT

Use this parameter with the streamloader.extractorEngineParameters.restPort parameter to form the REST port of the extractor. (External monitoring services like Telegraf can query the REST server for loading process metrics.)

streamloader.extractorEngineParameters.configurationOption.expect.empty.file.list

false

BOOLEAN

Set this value to false to make a file source data pipeline fail when it finds no files to load. Otherwise, the load completes successfully with no data loaded.
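
As a sketch, a deployment where an empty source location is a normal condition might change this setting so that such pipelines complete without failing. The statement assumes that the ALTER SYSTEM ALTER CONFIG SET form shown for the JVM memory settings also applies to this parameter and that the Boolean value is passed as a quoted string; neither is confirmed on this page.

```sql
-- Assumption: same statement form as the jvmMemoryConfiguration example.
-- Assumption: the Boolean value is written as a quoted string.
ALTER SYSTEM ALTER CONFIG SET 'streamloader.extractorEngineParameters.configurationOption.expect.empty.file.list' = 'true';
```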

streamloader.extractorEngineParameters.configurationOption.pipeline.preview.rows.limit

1000

INT

The maximum number of rows that a PREVIEW PIPELINE SQL statement can return. If the LIMIT value specified in the statement exceeds this number, the loading process throws an error.

streamloader.extractorEngineParameters.configurationOption.engine.transform.udt.jarRootDirectory

/opt/ocient/current/lib/extractorengine_udt

VARCHAR

The absolute path to the directory containing JARs for data pipeline functions. To install and enable a third-party library for using data pipeline functions, install the JAR package at this location on all Loader Nodes. Then, add the chosen fully-qualified class name to the IMPORT clause of the data pipeline function. For details, see CREATE OR REPLACE PIPELINE FUNCTION.

streamloader.extractorEngineParameters.configurationOption.source.record.max.size

Dynamically calculated value at start-up based on the number of processors and amount of memory available to the JVM

BIGINT

The maximum source record size, in bytes, that the loading process tolerates before throwing an error.

streamloader.extractorEngineParameters.configurationOption.s3.region

us-east-1

VARCHAR

The default region for S3 file sources.

streamloader.extractorEngineParameters.configurationOption.s3.force.path.style.access

false

BOOLEAN

Set this value to true to use path-style access for all S3 file sources. The default is virtual-hosted-style access.

streamloader.extractorEngineParameters.configurationOption.s3.force.path.style.access.if.endpoint.overriden

true

BOOLEAN

Set this value to true to use path-style access for S3 file sources that specify endpoints. The system still uses virtual-hosted-style access for sources without endpoint overrides unless you set streamloader.extractorEngineParameters.configurationOption.s3.force.path.style.access to true.
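
The two path-style settings interact, so a short illustration may help. The sketch below forces path-style access for every S3 file source, whether or not the source overrides the endpoint. It assumes the ALTER SYSTEM ALTER CONFIG SET form from the JVM memory example carries over to this parameter and that the Boolean value is quoted as a string.

```sql
-- Assumption: statement form and string quoting follow the jvmMemoryConfiguration example.
-- With this setting at true, the endpoint-specific setting no longer changes the outcome,
-- because all S3 file sources use path-style access.
ALTER SYSTEM ALTER CONFIG SET 'streamloader.extractorEngineParameters.configurationOption.s3.force.path.style.access' = 'true';
```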

streamloader.extractorEngineParameters.configurationOption.s3.retries.count

10

INT

The number of retries for the S3 file source client.

streamloader.extractorEngineParameters.configurationOption.s3.netty.read.timeout.seconds

0

INT

The timeout value, in seconds, for read operations of the S3 file source client. A value of 0 disables the timeout.

streamloader.extractorEngineParameters.configurationOption.s3.netty.max.concurrency

50

INT

The maximum number of concurrent requests that the S3 file source client allows.

streamloader.extractorEngineParameters.configurationOption.awssdk.backoff.strategy.base.delay.seconds

1

INT

The base amount of time, in seconds, for calculating the time the S3 file source client waits before retrying a failed request. The range of values that the calculated time can achieve increases exponentially with each failure. You can set the maximum value using the streamloader.extractorEngineParameters.configurationOption.awssdk.backoff.strategy.max.backoff.seconds parameter.

streamloader.extractorEngineParameters.configurationOption.awssdk.backoff.strategy.max.backoff.seconds

20

INT

The maximum amount of time, in seconds, that the S3 file source client waits before retrying a failed request.
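
The base delay and the maximum backoff work as a pair, so they are usually tuned together. The following sketch raises both for an S3 endpoint that throttles aggressively. The values 2 and 60 are hypothetical, and the statement form and the quoting of integers as strings are assumptions carried over from the JVM memory example.

```sql
-- Hypothetical values chosen for illustration only.
-- Assumption: one parameter per statement is accepted, using the same form as the jvmMemoryConfiguration example.
ALTER SYSTEM ALTER CONFIG SET 'streamloader.extractorEngineParameters.configurationOption.awssdk.backoff.strategy.base.delay.seconds' = '2';
ALTER SYSTEM ALTER CONFIG SET 'streamloader.extractorEngineParameters.configurationOption.awssdk.backoff.strategy.max.backoff.seconds' = '60';
```

Keeping the maximum backoff larger than the base delay leaves room for the calculated wait time to grow between retries before it reaches the cap.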

streamloader.extractorEngineParameters.configurationOption.awssdk.max.pending.connection.acquires

20000

INT

The maximum number of pending connection acquires that the S3 file source client allows.

streamloader.extractorEngineParameters.configurationOption.awssdk.connection.timeout.seconds

10

INT

The amount of time, in seconds, that the S3 file source client waits when initially establishing a connection.

streamloader.extractorEngineParameters.configurationOption.awssdk.connection.max.idle.timeout.seconds

300

INT

The maximum amount of time, in seconds, that the system allows a connection established by the S3 file source client to remain open while idle.

streamloader.extractorEngineParameters.configurationOption.awssdk.connection.acquisition.timeout.seconds

300

INT

The amount of time, in seconds, that the S3 file source client waits when acquiring a connection from the pool.

streamloader.extractorEngineParameters.configurationOption.kafka.poll.timeout.msec

1000

BIGINT

The amount of time, in milliseconds, that a consumer for a pipeline with a Kafka source waits for records to become available during a poll operation.

streamloader.extractorEngineParameters.configurationOption.kafka.assign.timeout.msec

20000

BIGINT

The amount of time, in milliseconds, that a PREVIEW PIPELINE SQL statement waits for partitions to be assigned to the consumer on a pipeline with a Kafka source. If the system does not assign any partitions when the timeout elapses, then the PREVIEW PIPELINE statement terminates without data processing.

streamloader.extractorEngineParameters.configurationOption.kafka.activate.timeout.msec

5000

BIGINT

The amount of time, in milliseconds, that a PREVIEW PIPELINE SQL statement waits for a partition to receive records on a pipeline with a Kafka source.
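
If PREVIEW PIPELINE statements against a slow Kafka cluster end before any data arrives, lengthening the assign and activate timeouts is one thing to try. The sketch below uses hypothetical values of 60000 and 15000 milliseconds and assumes the statement form and string quoting from the JVM memory example apply to these parameters.

```sql
-- Hypothetical values chosen for illustration only.
-- Assumption: statement form and string quoting follow the jvmMemoryConfiguration example.
-- Wait up to 60 seconds for partition assignment during a preview.
ALTER SYSTEM ALTER CONFIG SET 'streamloader.extractorEngineParameters.configurationOption.kafka.assign.timeout.msec' = '60000';
-- Wait up to 15 seconds for an assigned partition to receive records.
ALTER SYSTEM ALTER CONFIG SET 'streamloader.extractorEngineParameters.configurationOption.kafka.activate.timeout.msec' = '15000';
```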

streamloader.extractorEngineParameters.configurationOption.arrow.max.native.memory.usage.bytes

23756537856

BIGINT

The default maximum native memory usage, in bytes, for the Parquet reader.

streamloader.extractorEngineParameters.configurationOption.arrow.max.java.buffer.memory.usage.bytes

15837691904

BIGINT

The default maximum buffer memory usage, in bytes, for the Parquet reader.

Related Links

Load Data