Configuration Settings for Data Pipelines

Data pipeline functionality enables the use of SQL statements to load data. You can use a variety of configuration settings to manage the load. The settings that you can use are listed here with the parameter name, default, data type, and description of each.

streamloader extractorengineparameters enabled
  Default: true
  Data type: BOOLEAN
  Description: Whether the loader node attempts to bootstrap the extractor {{java}} process for data pipelines on startup. If you set this parameter to true, the loader node does not become active until the Java loading process has successfully started.

streamloader extractorengineparameters jvmmemoryconfiguration
  Default: 32m for initialheap, 64g for maxheap, 64g for maxdirect
  Data type: VARCHAR
  Description: The {{jvm}} memory configuration to use for the extractor Java process. Use strings that represent data sizes in SI format (1 KiB, 5 MiB, etc.). You must set the three memory options in the same statement.
  Example:
    ALTER SYSTEM ALTER CONFIG SET
      'streamloader extractorengineparameters jvmmemoryconfiguration initialheap' = '64g',
      'streamloader extractorengineparameters jvmmemoryconfiguration maxheap' = '64g',
      'streamloader extractorengineparameters jvmmemoryconfiguration maxdirect' = '64g';

streamloader extractorengineparameters restport
  Default: 8090
  Data type: INT
  Description: Use this parameter with the streamloader extractorengineparameters portoffset parameter to form the REST port for the extractor. External monitoring services like {{telegraf}} can query the REST server for loading process metrics.

streamloader extractorengineparameters portoffset
  Default: 0
  Data type: INT
  Description: Use this parameter with the streamloader extractorengineparameters restport parameter to form the REST port for the extractor. External monitoring services like Telegraf can query the REST server for loading process metrics.

streamloader extractorengineparameters configurationoption expect empty file list
  Default: false
  Data type: BOOLEAN
  Description: Set this value to false to fail a file source data pipeline that finds no files to load, instead of the load completing successfully with no data loaded.

streamloader extractorengineparameters configurationoption pipeline preview rows limit
  Default: 1000
  Data type: INT
  Description: The maximum number of rows that a PREVIEW PIPELINE SQL statement can return. If the limit value specified in the statement exceeds this number, the loading process throws an error.

streamloader extractorengineparameters configurationoption engine transform udt jarrootdirectory
  Default: /opt/ocient/current/lib/extractorengine udt
  Data type: VARCHAR
  Description: The absolute path to the directory that contains JARs for data pipeline functions. To install and enable a third-party library for use in data pipeline functions, install the JAR package at this location on all loader nodes. Then, add the chosen fully qualified class name to the import clause of the data pipeline function. For details, see docid\ l8tdfpfzzvzeyabc2h7bq.

streamloader extractorengineparameters configurationoption source record max size
  Default: Dynamically calculated at startup based on the number of processors and the amount of memory available to the JVM
  Data type: BIGINT
  Description: The maximum source record size, in bytes, that the loading process tolerates before throwing an error.

streamloader extractorengineparameters configurationoption s3 region
  Default: us-east-1
  Data type: VARCHAR
  Description: The default region for {{aws}} S3 file sources.

streamloader extractorengineparameters configurationoption s3 force path style access
  Default: false
  Data type: BOOLEAN
  Description: Set this value to true to use path-style access for all S3 file sources. The default is virtual-hosted-style access.

streamloader extractorengineparameters configurationoption s3 force path style access if endpoint overriden
  Default: true
  Data type: BOOLEAN
  Description: Set this value to true to use path-style access for S3 file sources that specify endpoints. The system still uses virtual-hosted-style access for sources without endpoint overrides unless you set streamloader extractorengineparameters configurationoption s3 force path style access to true.

streamloader extractorengineparameters configurationoption s3 retries count
  Default: 10
  Data type: INT
  Description: The number of retries for the S3 file source client.

streamloader extractorengineparameters configurationoption s3 netty read timeout seconds
  Default: 0
  Data type: INT
  Description: The timeout value for read operations, in seconds, for the S3 file source client. A value of 0 disables the timeout.

streamloader extractorengineparameters configurationoption s3 netty max concurrency
  Default: 50
  Data type: INT
  Description: The maximum number of concurrent requests allowed for the S3 file source client.

streamloader extractorengineparameters configurationoption awssdk backoff strategy base delay seconds
  Default: 1
  Data type: INT
  Description: The base amount of time, in seconds, for calculating how long the S3 file source client waits before retrying a failed request. The range of values that the calculated time can take increases exponentially with each failure. You can set the maximum value using the streamloader extractorengineparameters configurationoption awssdk backoff strategy max backoff seconds parameter.

streamloader extractorengineparameters configurationoption awssdk backoff strategy max backoff seconds
  Default: 20
  Data type: INT
  Description: The maximum amount of time, in seconds, that the S3 file source client waits before retrying a failed request.

streamloader extractorengineparameters configurationoption awssdk max pending connection acquires
  Default: 20000
  Data type: INT
  Description: The maximum number of pending connection acquires that the S3 file source client allows.

streamloader extractorengineparameters configurationoption awssdk connection timeout seconds
  Default: 10
  Data type: INT
  Description: The amount of time, in seconds, that the S3 file source client waits when initially establishing a connection.

streamloader extractorengineparameters configurationoption awssdk connection max idle timeout seconds
  Default: 300
  Data type: INT
  Description: The maximum amount of time, in seconds, that the system allows a connection established by the S3 file source client to remain open while idle.

streamloader extractorengineparameters configurationoption awssdk connection acquisition timeout seconds
  Default: 300
  Data type: INT
  Description: The amount of time, in seconds, that the S3 file source client waits when acquiring a connection from the pool.

streamloader extractorengineparameters configurationoption kafka poll timeout msec
  Default: 1000
  Data type: BIGINT
  Description: The amount of time, in milliseconds, that a consumer for a pipeline with a {{kafka}} source waits for records to become available during a poll operation.

streamloader extractorengineparameters configurationoption kafka assign timeout msec
  Default: 20000
  Data type: BIGINT
  Description: The amount of time, in milliseconds, that a PREVIEW PIPELINE SQL statement waits for partitions to be assigned to the consumer on a pipeline with a Kafka source. If the system has not assigned any partitions when the timeout elapses, the PREVIEW PIPELINE statement terminates without processing data.

streamloader extractorengineparameters configurationoption kafka activate timeout msec
  Default: 5000
  Data type: BIGINT
  Description: The amount of time, in milliseconds, that a PREVIEW PIPELINE SQL statement waits for a partition to receive records on a pipeline with a Kafka source.

streamloader extractorengineparameters configurationoption arrow max native memory usage bytes
  Default: 23756537856
  Data type: BIGINT
  Description: The default maximum native memory usage, in bytes, for the {{parquet}} reader.

streamloader extractorengineparameters configurationoption arrow max java buffer memory usage bytes
  Default: 15837691904
  Data type: BIGINT
  Description: The default maximum buffer memory usage, in bytes, for the Parquet reader.

streamloader extractorengineparameters configurationoption filesystem access directories
  Default: /tmp
  Data type: VARCHAR
  Description: A comma-separated list of directories from which you can load server file system data. Each directory must exist, and all files must be located within one of the directories. By default, you can load from the /tmp directory.

Related Links

docid\ xq0tg7yph vn62uwufibu
docid\ sczh bwc7qovmhf d8sg2
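
Example: how the backoff settings interact

The descriptions of the awssdk backoff strategy base delay seconds and max backoff seconds parameters above say that the range of possible wait times grows exponentially with each failure, up to a maximum. The following Python sketch illustrates that arithmetic with the default values of 1 and 20 seconds. It assumes a "full jitter" scheme, where the wait is drawn uniformly between 0 and a ceiling that doubles on each failure. That scheme, and the function names here, are assumptions for illustration only; the actual S3 file source client might calculate the wait differently.

```python
import random


def backoff_ceiling_seconds(failures, base_delay=1, max_backoff=20):
    """Largest wait, in seconds, possible after `failures` consecutive failures.

    Assumed formula (not confirmed by the source): the ceiling doubles with
    each failure, starting at base_delay, and never exceeds max_backoff.
    """
    if failures < 1:
        raise ValueError("failures must be at least 1")
    return min(max_backoff, base_delay * 2 ** (failures - 1))


def sample_backoff_seconds(failures, base_delay=1, max_backoff=20, rng=random):
    """Draw one wait time uniformly from 0 up to the current ceiling."""
    ceiling = backoff_ceiling_seconds(failures, base_delay, max_backoff)
    return rng.uniform(0, ceiling)


# Ceilings for the first seven consecutive failures with the default settings.
ceilings = [backoff_ceiling_seconds(n) for n in range(1, 8)]
print(ceilings)  # [1, 2, 4, 8, 16, 20, 20]
```

Under these assumptions, the max backoff seconds value starts to limit the wait at the sixth consecutive failure, so raising the base delay mainly affects the first few retries.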