
# Configuration Settings for Data Pipelines

export const Telegraf = "Telegraf™";

export const Parquet = "Apache® Parquet™";

export const Kafka = "Apache® Kafka®";

export const JVM = "JVM®";

export const Java = "Java®";

export const AWS = "Amazon® Web Services℠ (AWS℠)";

Data pipeline functionality lets you load data with SQL statements. To manage the load, use the configuration settings in this table.

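For example, this minimal sketch lets pipelines read server files from an additional directory. It assumes that the `ALTER SYSTEM ALTER CONFIG SET` syntax from the JVM memory example in the following table also accepts a single setting. The directory `/data/ocient/loads` is hypothetical and must already exist. The sketch also assumes that the value replaces the whole comma-separated list, so it repeats `/tmp` to keep the default directory available.

```sql
-- Sketch only: /data/ocient/loads is a hypothetical directory that must exist.
-- Assumes the value replaces the full comma-separated list, so /tmp is
-- listed again to keep the default directory allowed.
ALTER SYSTEM ALTER CONFIG SET
  'streamloader.extractorEngineParameters.configurationOption.filesystem.access.directories' = '/tmp,/data/ocient/loads';
```
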
| **Configuration Setting Parameter Name**                                                                      | **Default**                                                                                                          | **Data Type** | **Description**                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
| ------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------- | ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `streamloader.extractorEngineParameters.enabled`                                                              | `true`                                                                                                               | BOOLEAN       | Whether the Loader Node attempts to bootstrap the extractor {Java} process for data pipelines on start-up. If you set this parameter to `true`, the Loader Node does not become active until the Java loading process has successfully started.                                                                                                                                                                                                                                                                                                   |
| `streamloader.extractorEngineParameters.jvmMemoryConfiguration`                                               | - `32g` for `initialHeap`<br />- `64g` for `maxHeap`<br />- `64g` for `maxDirect`                                    | VARCHAR       | The {JVM} memory configuration to use for the extractor Java process. Specify each value as a string that represents a data size, such as `1 KiB`, `5 MiB`, or `32g`.<br />You must set all three memory options in the same statement.<br /><br />**Example**<br />`ALTER SYSTEM ALTER CONFIG SET 'streamloader.extractorEngineParameters.jvmMemoryConfiguration.initialHeap' = '32g',`<br />`'streamloader.extractorEngineParameters.jvmMemoryConfiguration.maxHeap' = '64g',`<br />`'streamloader.extractorEngineParameters.jvmMemoryConfiguration.maxDirect' = '64g';` |
| `streamloader.extractorEngineParameters.restPort`                                                             | 8090                                                                                                                 | INT           | Use this parameter with the `streamloader.extractorEngineParameters.portOffset` parameter to form the REST port for the extractor. (External monitoring services like {Telegraf} can query the REST server for loading process metrics.)                                                                                                                                                                                                                                                                                                          |
| `streamloader.extractorEngineParameters.portOffset`                                                           | 0                                                                                                                    | INT           | Use this parameter with the `streamloader.extractorEngineParameters.restPort` parameter to form the REST port of the extractor. (External monitoring services like Telegraf can query the REST server for loading process metrics.)                                                                                                                                                                                                                                                                                                               |
| `streamloader.extractorEngineParameters.configurationOption.expect.empty.file.list`                           | `false`                                                                                                              | BOOLEAN       | When this value is `false`, a file source data pipeline that finds no files to load fails. Set this value to `true` to let the load complete successfully with no data loaded instead. |
| `streamloader.extractorEngineParameters.configurationOption.pipeline.preview.rows.limit`                      | 1000                                                                                                                 | INT           | The maximum number of rows that a `PREVIEW PIPELINE` SQL statement can return. If the `LIMIT` value in the statement exceeds this number, the loading process throws an error. |
| `streamloader.extractorEngineParameters.configurationOption.engine.transform.udt.jarRootDirectory`            | `/opt/ocient/current/lib/extractorengine_udt`                                                                        | VARCHAR       | The absolute path to the directory that contains JARs for data pipeline functions. To make a third-party library available to data pipeline functions, install its JAR package in this directory on all Loader Nodes. Then, add the fully qualified name of the class you want to use to the `IMPORT` clause of the data pipeline function. For details, see [CREATE OR REPLACE PIPELINE FUNCTION](/data-pipelines#create-pipeline-function). |
| `streamloader.extractorEngineParameters.configurationOption.engine.external.jdbc.jarRootDirectory`            | `/opt/ocient/current/lib/extractorengine_jdbc`                                                                       | VARCHAR       | The absolute path to the directory containing JARs for external source lookup functionality. For details, see [Load Data from External Sources in Data Pipelines](/load-data-from-external-sources-in-data-pipelines).                                                                                                                                                                                                                                                                                                                            |
| `streamloader.extractorEngineParameters.configurationOption.source.record.max.size`                           | Dynamically calculated value at start-up based on the number of processors and amount of memory available to the JVM | BIGINT        | The maximum source record size, in bytes, that the loading process tolerates before throwing an error.                                                                                                                                                                                                                                                                                                                                                                                                                                            |
| `streamloader.extractorEngineParameters.configurationOption.s3.region`                                        | `us-east-1`                                                                                                          | VARCHAR       | The default region for {AWS} S3 file sources.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
| `streamloader.extractorEngineParameters.configurationOption.s3.force.path.style.access`                       | `false`                                                                                                              | BOOLEAN       | Set this value to `true` to use path-style access for all S3 file sources. By default, S3 file sources use virtual-hosted-style access. |
| `streamloader.extractorEngineParameters.configurationOption.s3.force.path.style.access.if.endpoint.overriden` | `true`                                                                                                               | BOOLEAN       | Set this value to `true` to use path-style access for S3 file sources that specify an endpoint. Sources without an endpoint override still use virtual-hosted-style access unless you set `streamloader.extractorEngineParameters.configurationOption.s3.force.path.style.access` to `true`. |
| `streamloader.extractorEngineParameters.configurationOption.s3.retries.count`                                 | 10                                                                                                                   | INT           | The number of times the S3 file source client retries a failed request. |
| `streamloader.extractorEngineParameters.configurationOption.s3.netty.read.timeout.seconds`                    | 0                                                                                                                    | INT           | The timeout, in seconds, for read operations of the S3 file source client. A value of `0` disables the timeout. |
| `streamloader.extractorEngineParameters.configurationOption.s3.netty.max.concurrency`                         | 50                                                                                                                   | INT           | The maximum number of concurrent requests that the S3 file source client allows. |
| `streamloader.extractorEngineParameters.configurationOption.awssdk.backoff.strategy.base.delay.seconds`       | 1                                                                                                                    | INT           | The base amount of time, in seconds, for calculating the time the S3 file source client waits before retrying a failed request. The range of values that the calculated time can achieve increases exponentially with each failure. You can set the maximum value using the `streamloader.extractorEngineParameters.configurationOption.awssdk.backoff.strategy.max.backoff.seconds` parameter.                                                                                                                                                   |
| `streamloader.extractorEngineParameters.configurationOption.awssdk.backoff.strategy.max.backoff.seconds`      | 20                                                                                                                   | INT           | The maximum amount of time, in seconds, that the S3 file source client waits before retrying a failed request. |
| `streamloader.extractorEngineParameters.configurationOption.awssdk.max.pending.connection.acquires`           | 20000                                                                                                                | INT           | The maximum number of pending connection acquisition requests that the S3 file source client allows. |
| `streamloader.extractorEngineParameters.configurationOption.awssdk.connection.timeout.seconds`                | 10                                                                                                                   | INT           | The amount of time, in seconds, that the S3 file source client waits when initially establishing a connection.                                                                                                                                                                                                                                                                                                                                                                                                                                    |
| `streamloader.extractorEngineParameters.configurationOption.awssdk.connection.max.idle.timeout.seconds`       | 300                                                                                                                  | INT           | The maximum amount of time, in seconds, that the system allows a connection established by the S3 file source client to remain open while idle.                                                                                                                                                                                                                                                                                                                                                                                                   |
| `streamloader.extractorEngineParameters.configurationOption.awssdk.connection.acquisition.timeout.seconds`    | 300                                                                                                                  | INT           | The amount of time, in seconds, that the S3 file source client waits when acquiring a connection from the pool.                                                                                                                                                                                                                                                                                                                                                                                                                                   |
| `streamloader.extractorEngineParameters.configurationOption.kafka.poll.timeout.msec`                          | 1000                                                                                                                 | BIGINT        | The amount of time, in milliseconds, that a consumer for a pipeline with an {Kafka} source waits for records to become available during a poll operation.                                                                                                                                                                                                                                                                                                                                                                                         |
| `streamloader.extractorEngineParameters.configurationOption.kafka.assign.timeout.msec`                        | 20000                                                                                                                | BIGINT        | The amount of time, in milliseconds, that a `PREVIEW PIPELINE` SQL statement waits for partitions to be assigned to the consumer on a pipeline with a Kafka source. If no partitions are assigned before the timeout elapses, the `PREVIEW PIPELINE` statement terminates without processing any data. |
| `streamloader.extractorEngineParameters.configurationOption.kafka.activate.timeout.msec`                      | 5000                                                                                                                 | BIGINT        | The amount of time, in milliseconds, that a `PREVIEW PIPELINE` SQL statement waits for a partition to receive records on a pipeline with a Kafka source.                                                                                                                                                                                                                                                                                                                                                                                          |
| `streamloader.extractorEngineParameters.configurationOption.arrow.max.native.memory.usage.bytes`              | 23756537856                                                                                                          | BIGINT        | The default maximum native memory usage, in bytes, for the {Parquet} reader.                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
| `streamloader.extractorEngineParameters.configurationOption.arrow.max.java.buffer.memory.usage.bytes`         | 15837691904                                                                                                          | BIGINT        | The default maximum buffer memory usage, in bytes, for the Parquet reader.                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
| `streamloader.extractorEngineParameters.configurationOption.filesystem.access.directories`                    | `/tmp`                                                                                                               | VARCHAR       | A comma-separated list of directories on the server file system from which you can load data.<br /><br />Each directory must exist.<br /><br />Every file to load must be located within one of these directories.<br /><br />By default, you can load only from the `/tmp` directory. |

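The retry and backoff settings of the S3 file source client work together: the client retries a failed request up to the configured retry count, and the wait before each retry starts from the base delay, grows exponentially with each failure, and never exceeds the maximum backoff. This sketch adjusts all three settings in one statement. It assumes that the multi-setting `ALTER SYSTEM ALTER CONFIG SET` form from the JVM memory example accepts these settings and that INT values can be written as quoted strings. The values are illustrative, not tuning recommendations.

```sql
-- Sketch only: assumes the multi-setting ALTER SYSTEM ALTER CONFIG SET form
-- from the JVM memory example, with INT values passed as quoted strings.
-- The values are illustrative, not tuning recommendations.
ALTER SYSTEM ALTER CONFIG SET
  -- Retry a failed S3 request up to 5 times (default: 10).
  'streamloader.extractorEngineParameters.configurationOption.s3.retries.count' = '5',
  -- Base delay, in seconds, for the exponential backoff calculation (default: 1).
  'streamloader.extractorEngineParameters.configurationOption.awssdk.backoff.strategy.base.delay.seconds' = '2',
  -- Upper bound, in seconds, on the wait before any retry (default: 20).
  'streamloader.extractorEngineParameters.configurationOption.awssdk.backoff.strategy.max.backoff.seconds' = '30';
```
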
## Related Links

[Load Data](/load-data)

[Alter Default Data Pipeline Behavior](/alter-default-data-pipeline-behavior)
