Configuration Settings for Data Pipelines
Data pipelines let you load data using SQL statements. You can manage the load with the configuration settings listed in this table.
Configuration Setting Parameter Name | Default | Data Type | Description |
---|---|---|---|
streamloader.extractorEngineParameters.enabled | true | BOOLEAN | Whether the Loader Node attempts to bootstrap the extractor process for data pipelines on start-up. If you set this parameter to true, the Loader Node does not become active until the Java loading process has successfully started. |
streamloader.extractorEngineParameters.jvmMemoryConfiguration | | VARCHAR | The memory configuration to use for the extractor Java process. Use strings representing data sizes in SI format: 1 KiB, 5 MiB, etc. You must set the three memory options in the same statement. Example: ALTER SYSTEM ALTER CONFIG SET 'streamloader.extractorEngineParameters.jvmMemoryConfiguration.initialHeap' = '64g', 'streamloader.extractorEngineParameters.jvmMemoryConfiguration.maxHeap' = '64g', 'streamloader.extractorEngineParameters.jvmMemoryConfiguration.maxDirect' = '64g'; |
streamloader.extractorEngineParameters.restPort | 8090 | INT | Use this parameter with the streamloader.extractorEngineParameters.portOffset parameter to form the REST port of the extractor. (External monitoring services like Telegraf can query the REST server for loading process metrics.) |
streamloader.extractorEngineParameters.portOffset | 0 | INT | Use this parameter with the streamloader.extractorEngineParameters.restPort parameter to form the REST port of the extractor. (External monitoring services like Telegraf can query the REST server for loading process metrics.) |
streamloader.extractorEngineParameters.configurationOption.expect.empty.file.list | false | BOOLEAN | Set this value to false to fail a file source data pipeline that finds no files to load instead of the load completing successfully with no data loaded. |
streamloader.extractorEngineParameters.configurationOption.pipeline.preview.rows.limit | 1000 | INT | The maximum number of rows a PREVIEW PIPELINE SQL statement can return. If the LIMIT value specified in the statement exceeds this number, the loading process throws an error. |
streamloader.extractorEngineParameters.configurationOption.engine.transform.udt.jarRootDirectory | /opt/ocient/current/lib/extractorengine_udt | VARCHAR | The absolute path to the directory containing JARs for data pipeline functions. To install and enable a third-party library for using data pipeline functions, install the JAR package at this location on all Loader Nodes. Then, add the chosen fully-qualified class name to the IMPORT clause of the data pipeline function. For details, see CREATE OR REPLACE PIPELINE FUNCTION. |
streamloader.extractorEngineParameters.configurationOption.source.record.max.size | Dynamically calculated value at start-up based on the number of processors and amount of memory available to the JVM | BIGINT | The maximum source record size, in bytes, that the loading process tolerates before throwing an error. |
streamloader.extractorEngineParameters.configurationOption.s3.region | us-east-1 | VARCHAR | The default region for S3 file sources. |
streamloader.extractorEngineParameters.configurationOption.s3.force.path.style.access | false | BOOLEAN | Set this value to true to use path-style access for all S3 file sources. The default is a virtual-hosted style. |
streamloader.extractorEngineParameters.configurationOption.s3.force.path.style.access.if.endpoint.overriden | true | BOOLEAN | Set this value to true to use path-style access for S3 file sources that specify endpoints. (The system still uses virtual-hosted style access for sources without endpoint overrides unless you set streamloader.extractorEngineParameters.configurationOption.s3.force.path.style.access to true.) |
streamloader.extractorEngineParameters.configurationOption.s3.retries.count | 10 | INT | The number of retries for the S3 file source client. |
streamloader.extractorEngineParameters.configurationOption.s3.netty.read.timeout.seconds | 0 | INT | The timeout value for read operations, in seconds, for the S3 file source client. A value of 0 disables the timeout. |
streamloader.extractorEngineParameters.configurationOption.s3.netty.max.concurrency | 50 | INT | The maximum number of concurrent requests that the S3 file source client allows. |
streamloader.extractorEngineParameters.configurationOption.awssdk.backoff.strategy.base.delay.seconds | 1 | INT | The base amount of time, in seconds, for calculating the time the S3 file source client waits before retrying a failed request. The range of values that the calculated time can achieve increases exponentially with each failure. You can set the maximum value using the streamloader.extractorEngineParameters.configurationOption.awssdk.backoff.strategy.max.backoff.seconds parameter. |
streamloader.extractorEngineParameters.configurationOption.awssdk.backoff.strategy.max.backoff.seconds | 20 | INT | The maximum amount of time, in seconds, that the S3 file source client waits before retrying a failed request. |
streamloader.extractorEngineParameters.configurationOption.awssdk.max.pending.connection.acquires | 20000 | INT | The maximum number of pending connection acquires that the S3 file source client allows. |
streamloader.extractorEngineParameters.configurationOption.awssdk.connection.timeout.seconds | 10 | INT | The amount of time, in seconds, that the S3 file source client waits when initially establishing a connection. |
streamloader.extractorEngineParameters.configurationOption.awssdk.connection.max.idle.timeout.seconds | 300 | INT | The maximum amount of time, in seconds, that the system allows a connection established by the S3 file source client to remain open while idle. |
streamloader.extractorEngineParameters.configurationOption.awssdk.connection.acquisition.timeout.seconds | 300 | INT | The amount of time, in seconds, that the S3 file source client waits when acquiring a connection from the pool. |
streamloader.extractorEngineParameters.configurationOption.kafka.poll.timeout.msec | 1000 | BIGINT | The amount of time, in milliseconds, that a consumer for a pipeline with a Kafka source waits for records to become available during a poll operation. |
streamloader.extractorEngineParameters.configurationOption.kafka.assign.timeout.msec | 20000 | BIGINT | The amount of time, in milliseconds, that a PREVIEW PIPELINE SQL statement waits for partitions to be assigned to the consumer on a pipeline with a Kafka source. If the system does not assign any partitions when the timeout elapses, then the PREVIEW PIPELINE statement terminates without data processing. |
streamloader.extractorEngineParameters.configurationOption.kafka.activate.timeout.msec | 5000 | BIGINT | The amount of time, in milliseconds, that a PREVIEW PIPELINE SQL statement waits for a partition to receive records on a pipeline with a Kafka source. |
streamloader.extractorEngineParameters.configurationOption.arrow.max.native.memory.usage.bytes | 23756537856 | BIGINT | The default maximum native memory usage, in bytes, for the Parquet reader. |
streamloader.extractorEngineParameters.configurationOption.arrow.max.java.buffer.memory.usage.bytes | 15837691904 | BIGINT | The default maximum buffer memory usage, in bytes, for the Parquet reader. |
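
The two awssdk.backoff.strategy settings work together: the possible wait grows exponentially with each failure until it reaches the maximum. The table does not give the exact formula, so the following Python sketch assumes that the upper bound of the wait doubles with each failed attempt and is capped at the maximum. The function name backoff_upper_bounds is hypothetical and for illustration only; the actual wait chosen by the client can be any value up to the bound.

```python
def backoff_upper_bounds(base_delay_s=1, max_backoff_s=20, retries=10):
    """Return the assumed upper bound of the wait, in seconds, before each retry.

    Defaults mirror the table: base.delay.seconds = 1,
    max.backoff.seconds = 20, and s3.retries.count = 10.
    Assumption: the bound doubles per failure and is capped at the maximum.
    """
    return [
        min(max_backoff_s, base_delay_s * 2 ** attempt)
        for attempt in range(retries)
    ]


if __name__ == "__main__":
    for retry, bound in enumerate(backoff_upper_bounds(), start=1):
        print(f"retry {retry}: wait up to {bound} s")
```

Under these assumptions and the default values, the bound reaches the 20-second cap by the sixth retry, so raising the base delay without also raising the maximum has little effect on later retries.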