Data Extract Tool - Ocient Documentation

The data extract tool is part of the JDBC driver and unloads data from the database. You can execute the tool directly from the JDBC CLI. The tool extracts a result set to delimited or Parquet files in the target location. To invoke the JDBC CLI, see the JDBC Manual.

To use the data extract tool, you must have JDBC version 2.63 or higher.

Supported Data Extract Formats

The data extract tool supports unloading result sets into files in specific formats. Supported extract formats are:

CSV — Outputs result sets as text files with fields separated by a chosen delimiter.
Parquet — Outputs result sets as Parquet files.

General Command Structure

Here is the general structure of an extract command.

SQL

EXTRACT TO <location_type> [OPTIONS([param=value [,...]])] AS <query>

The command is case-insensitive. Each extract command must start with EXTRACT TO, followed by the required location type location_type, which must be either LOCAL for the local machine or S3 for S3 storage. You can enclose additional options within a pair of parentheses following the keyword OPTIONS. Each location type has required options: for LOCAL, the options must define the file prefix file_prefix, and for S3, the options must define the file prefix file_prefix, the bucket bucket, and the endpoint endpoint. The query follows the keyword AS. This example shows a simple extract command.

SQL

EXTRACT TO LOCAL OPTIONS(
  file_prefix="/home/user/out/data_",
  file_extension=".csv"
)
AS SELECT c1 FROM sys.dummy10;

For supported options, see Data Extract Options.

Specify Options, Quoting, and Escaping Quotes

Here is the general format of options.

SQL

key1 = value1, key2 = value2, ... , keyN = valueN

Follow these guidelines when you specify options. Keys (option names) are unquoted and consist only of alphanumeric characters and underscores. Values can be either quoted with the reserved character " or unquoted. Unquoted values can contain only alphanumeric characters, so you must quote any value that contains a non-alphanumeric character with the reserved character ". The single quote character ' does not work as a quote character.

SQL

OPTIONS(file_prefix = "/path/to/dir/result", header_mode = none, file_extension = ".csv")

To use the reserved quote character " as an argument, you must escape it with the backslash character \. To use \ as an argument, you must escape it with another \. This code illustrates both of these scenarios.

SQL

OPTIONS(field_optionally_enclosed_by = "\"", escape = "\\")
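
As a minimal sketch of how these quoting rules fit into a complete command, the following extract combines a quoted path with the two escaped values shown above. The output path /home/user/out/quoted_ is a placeholder, and the `--` comment line is an annotation that you can drop if your client does not accept comments.

```sql
-- Sketch only: the file prefix is a placeholder path.
EXTRACT TO LOCAL OPTIONS(
  file_prefix = "/home/user/out/quoted_",
  field_optionally_enclosed_by = "\"",
  escape = "\\",
  quote_all_fields = true
)
AS SELECT c1 FROM sys.dummy10;
```

The path and both special characters are quoted because they contain non-alphanumeric characters, whereas true is alphanumeric and can stay unquoted.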

Data Extract Options

Use these options with the EXTRACT TO syntax and the OPTIONS keyword to configure the behavior of the extract.

General Extract Options

This table describes optional settings that apply to both the LOCAL and S3 location types.

Option	Description	Default
`FILE_TYPE`	The type of the output file for extraction. Supported values are `DELIMITED` for delimited text files (for example, `.csv` files) and `PARQUET` for Parquet `.parquet` files.	`DELIMITED`
`FILE_PREFIX`	Dictates the prefix used on the results. When extracting to `LOCAL`, this is the prefix used to determine the path of the results. This value can be a relative or full path. When extracting to S3, this value is the prefix for the key. In either case, the system adds additional file numbers and file extensions to generate the complete filename.	`results-`
`FILE_PREFIX_EXISTS`	Determines the behavior if the path specified by the `FILE_PREFIX` option already exists. Supported values are `FAIL` and `OVERWRITE`. `FAIL` throws an error, whereas `OVERWRITE` deletes the contents of the path.	`FAIL`
`FILE_EXTENSION`	The file extension specified for each output file.	`.csv`
`MAX_ROWS_PER_FILE`	If you set this option to a non-zero value, the system splits the results into files with the specified maximum number of rows per file.	`NULL`
`COMPRESSION`	Compression type to use for a delimited extract. Supported compression types are: `NONE` (no compression), `GZIP` (GZip compression), `BZIP2` (bzip2 compression), and `XZ` (xz compression).	`NONE`
`RECORD_DELIMITER`	Delimiter to use between records. This option supports Java strings, so you can enter special characters using escape sequences: `\u[utf-16 value]` for a UTF-16 value or `\[octal value]` for an octal value.	`\n`
`FIELD_DELIMITER`	Delimiter to use between fields within a record. This option supports Java strings, so you can enter special characters using escape sequences: `\u[utf-16 value]` for a UTF-16 value or `\[octal value]` for an octal value.	`,`
`HEADER_MODE`	Dictates how to manage headers in result files. Supported values are: `NONE` (the tool writes all output files without an additional header), `ALL_FILES` (the tool adds column names as a header in the first row of each output file, so each file has at most `MAX_ROWS_PER_FILE` + 1 total rows), and `FIRST_FILE` (the tool adds column names as a header in the first row of the first output file only, so each file has at most `MAX_ROWS_PER_FILE` total rows, inclusive of the header in the first file).	`NONE`
`NULL_FORMAT`	Format string to use for writing NULL values to the output files.	`""` (empty string)
`ENCODING`	Encoding used when writing out data to files.	The default character set of the system, as determined by the Oracle documentation.
`ESCAPE`	Character used for escaping quoted fields. Set this to the NULL character `\0` to indicate that the escape character is not specified.	`"`
`FIELD_OPTIONALLY_ENCLOSED_BY`	Character used to enclose fields when needed, for example, when a field contains a literal comma. This character is also known as the quote character. Set this option to the NULL character `\0` to indicate that the quote character is not specified.	`"`
`BINARY_FORMAT`	The format with which to encode the BINARY data type. Supports `UTF-8`, `Hexadecimal`, and `Base64`.	`Hexadecimal`
`COMPRESSION_BLOCK_SIZE`	The number of bytes that comprise each block to be compressed; larger blocks result in better compression at the expense of more RAM usage when compressing.	`4194304`
`COMPRESSION_LEVEL`	An integer in the range [-1, 9]. Use `-1` for the GZip default compression level, `0` for no compression, or a value from 1 to 9, where 1 indicates the fastest compression and 9 indicates the best compression.	`1`
`NUM_COMPRESSION_THREADS`	The number of threads to use for compression.	Twice the number of cores
`NUM_FETCH_QUERIES`	The number of parallel queries to execute in the database for data extraction.	`15`
`ESCAPE_UNQUOTED_VALUES`	Dictates whether to write escape sequences in unquoted values. Only applicable when `FIELD_DELIMITER` is set to `,`.	`false`
`INPUT_ESCAPED`	Dictates whether the input is already escaped. When this option is set to true, the tool does not add escape sequences, and data is written without changes to the output file. Only applicable when `FIELD_DELIMITER` is set to `,`. Ensure that data is properly escaped, otherwise the extract might produce invalid CSV data.	`false`
`PARTITION_MODE`	The strategy for partitioning the data. Supported values are: `NONE`, `KEY`, and `RANGE`. When you set this option to `NONE`, the tool uses standard extraction. When you set this option to `KEY`, the tool creates subdirectories for each unique value specified by the `PARTITION_COLUMNS` option. When you set this option to `RANGE`, the tool splits the data into the number of queries specified by the `NUM_FETCH_QUERIES` option based on the range of values specified in the `PARTITION_COLUMNS` option.	`NONE`
`PARTITION_COLUMNS`	The comma-separated list of columns to use for partitioning data when you set the `PARTITION_MODE` option to `KEY` or `RANGE`. The `RANGE` value only allows a single column. See File Naming Conventions for the file path structure for multiple partitioning columns when using the `KEY` value.	`NULL`
`QUOTE_ALL_FIELDS`	Dictates whether all written fields are enclosed with quotes. When this option is set to true, the tool encloses all fields with the `FIELD_OPTIONALLY_ENCLOSED_BY` character.	`false`
`SUCCESS_MARKER`	Identifies a successful completion of the extract. If you set this option to `true`, the tool creates a file with the `_SUCCESS` suffix in the root output directory when the extract of the entire job completes successfully.	`true`
`TARGET_FILE_SIZE_MB`	Specifies the size in megabytes for the target output file. The data extract tool splits the output into files of approximately this size. The tool ignores this option if you set the `MAX_ROWS_PER_FILE` option.	`NULL`
`TRANSLATE_CHARACTERS_MODE`	Character mode to use for translating characters. Supported values are `CHAR` and `HEX`. The tool performs character translation only if you specify both `TRANSLATE_CHARACTERS_FROM` and `TRANSLATE_CHARACTERS_TO`. The tool replaces the Nth character in `TRANSLATE_CHARACTERS_FROM` with the Nth character in `TRANSLATE_CHARACTERS_TO` in the extracted records. When `TRANSLATE_CHARACTERS_MODE` is set to `CHAR`, `TRANSLATE_CHARACTERS_FROM` and `TRANSLATE_CHARACTERS_TO` must be equal-length strings of UTF-8 characters. For example: `TRANSLATE_CHARACTERS_MODE="CHAR"`, `TRANSLATE_CHARACTERS_FROM="àëï"`, `TRANSLATE_CHARACTERS_TO="aei"`. When `TRANSLATE_CHARACTERS_MODE` is set to `HEX`, `TRANSLATE_CHARACTERS_FROM` and `TRANSLATE_CHARACTERS_TO` must be comma-separated lists of hexadecimal UTF-8 byte sequences (for example, `c3a0` for `à`) with the same number of list elements. For example: `TRANSLATE_CHARACTERS_MODE="HEX"`, `TRANSLATE_CHARACTERS_FROM="c3a0,c3ab,c3af"`, `TRANSLATE_CHARACTERS_TO="61,65,69"`.	`CHAR`
`TRANSLATE_CHARACTERS_FROM`	Sequence of UTF-8 characters in the source data to translate to a corresponding character in the `TRANSLATE_CHARACTERS_TO` option. See the `TRANSLATE_CHARACTERS_MODE` option for the expected format.	`""`
`TRANSLATE_CHARACTERS_TO`	Sequence of UTF-8 characters to use as a replacement for the characters included in `TRANSLATE_CHARACTERS_FROM`. See the `TRANSLATE_CHARACTERS_MODE` option for the expected format.	`""`
`TRIM_TRAILING_ZEROS`	Dictates whether to trim trailing zeros from numeric input fields.	`false`
`PARQUET_COMPRESSION`	Compression type to use for a Parquet extract. Supported compression types are: `NONE` — No compression `ZSTD` — ZSTD compression `SNAPPY` — Snappy compression `GZIP` — GZip compression	`SNAPPY`
`PARQUET_ROW_GROUP_SIZE_BYTES`	The size in bytes for row groups within a Parquet output file.	`536870912` (512 MB)
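
As an illustration of how several of these options combine for a delimited extract, the following sketch writes pipe-delimited, GZip-compressed output with a header in the first file. The output path and the NULL marker text are placeholders chosen for the example, and the `--` comment line is an annotation that you can drop.

```sql
-- Sketch only: the file prefix and the NULL marker are placeholder values.
EXTRACT TO LOCAL OPTIONS(
  file_prefix = "/home/user/out/pipe_",
  field_delimiter = "|",
  compression = GZIP,
  compression_level = 6,
  null_format = "NULL",
  header_mode = "FIRST_FILE"
)
AS SELECT c1 FROM sys.dummy10;
```

The value FIRST_FILE is quoted because it contains an underscore, which is not an alphanumeric character. GZIP and 6 can stay unquoted.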

If you do not set either the MAX_ROWS_PER_FILE or the TARGET_FILE_SIZE_MB options, the data extract tool generates one output file for each partition. For a query without partitions, the tool generates a single output file.
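
To make the file-splitting behavior concrete, here is a small sketch that caps each file at 4 rows for the 10-row sys.dummy10 query. The output path is a placeholder, and the `--` comment line is an annotation that you can drop.

```sql
-- Sketch only: the file prefix is a placeholder path.
EXTRACT TO LOCAL OPTIONS(
  file_prefix = "/home/user/out/split_",
  max_rows_per_file = 4,
  header_mode = "ALL_FILES"
)
AS SELECT c1 FROM sys.dummy10;
```

Because the query returns 10 rows and each file holds at most 4 data rows, the output should span at least three files. With ALL_FILES, no file should exceed 5 lines (4 data rows plus the header).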

S3 Extract Options

This table describes the required options that apply only to the S3 location type. The data extract tool ignores these options when you use the LOCAL location type.

Option	Description
`BUCKET`	S3 bucket to use.
`ENDPOINT`	Endpoint for S3 upload. For details, see the documentation for specifying endpoints.

This table describes optional settings that apply only to the S3 location type.

Option	Description	Default
`AWS_KEY_ID`	AWS Key ID. If empty, the CLI uses the default credentials provider chain of the Java AWS SDK, as described in the AWS SDK for Java documentation.	`""`
`AWS_SECRET_KEY`	AWS Secret Key. If empty, the CLI uses the default credentials provider chain of the Java AWS SDK, as described in the AWS SDK for Java documentation.	`""`
`REGION`	S3 region to upload to. Ignored when extracting to LOCAL.	`US_EAST_2`
`PATH_STYLE_ACCESS`	Whether path style access should be used to access a bucket.	`true` for JDBC version 3.4.1 and `false` for JDBC version 3.4.0 and earlier
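
The following sketch shows how the required and optional S3 options might appear together in a delimited extract. The bucket, key prefix, and region are placeholders. The spelling US_EAST_1 is an assumption modeled on the documented default US_EAST_2. The command sets no AWS_KEY_ID or AWS_SECRET_KEY, so the default credentials provider chain would apply. The `--` comment line is an annotation that you can drop.

```sql
-- Sketch only: bucket, prefix, and region are placeholders; the region spelling is assumed.
EXTRACT TO S3 OPTIONS(
  bucket = "my-analytics-bucket",
  endpoint = "s3.us-east-1.amazonaws.com",
  file_prefix = "exports/dummy_",
  region = "US_EAST_1",
  path_style_access = true
)
AS SELECT c1 FROM sys.dummy10;
```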

File Naming Conventions

When you use the data extract tool, the tool produces a number of files. The system determines the path of these files using multiple factors:

File prefix as specified by the FILE_PREFIX option
- If the PARTITION_MODE option is not set to NONE, the file prefix must be a directory and end with a forward slash. The system creates a subdirectory for each partition (for example, /tmp/extract/col1=val1/col2=val2 for the KEY partition mode or /tmp/extract/val1<=col1<val2/ for the RANGE partition mode). If the PARTITION_MODE option is set to NONE, the file prefix can be a directory or a proper prefix (for example, /tmp/extract/results_).
Partition mode as specified by the PARTITION_MODE option
Maximum number of rows or target size in MB for each file, as specified by the MAX_ROWS_PER_FILE or TARGET_FILE_SIZE_MB option
- If you set neither a maximum number of rows per file nor a target file size, the file number in the path is 0. If you set the MAX_ROWS_PER_FILE or TARGET_FILE_SIZE_MB option, the system writes rows to a single file until that limit or the lower of the two limits is reached. Then, the system generates another file with an incremented file number. The file number starts from 0.
Format or file extension
- The file extension specified by the FILE_EXTENSION option applies only if the FILE_TYPE option is set to DELIMITED. For the PARQUET type, the file extension is always .parquet, regardless of the compression type you specify using the PARQUET_COMPRESSION option.
Compression extension
- If you specify the COMPRESSION option for an extract with the DELIMITED type, the tool appends the corresponding compression suffix to the file name.
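
As a sketch of how the prefix and partition rules combine, the following command partitions by two columns. The city column is hypothetical and is assumed to exist in the customers table alongside country. The file prefix ends with a forward slash because PARTITION_MODE is not NONE. The `--` comment line is an annotation that you can drop.

```sql
-- Sketch only: assumes customers has country and city columns (city is hypothetical).
EXTRACT TO LOCAL OPTIONS(
  file_prefix = "/home/user/customer_data/",
  file_type = PARQUET,
  partition_mode = KEY,
  partition_columns = "country,city"
)
AS SELECT id, name, email, country, city FROM customers;
```

Following the convention for the KEY partition mode, the output files would be expected under subdirectories of the form /home/user/customer_data/country=val1/city=val2, with the .parquet extension.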

Examples

These examples show you how to use the data extract tool for extracting data to the local machine, extracting CSV data, and extracting Parquet data. Most of these examples assume the database already contains tables with data. For loading data, see Load Data.

Local Extract to the Default Path

This example extracts the results of the SELECT c1 FROM sys.dummy10 SQL query to the local machine at the relative path result-0.csv. The SQL query returns 10 rows with incremental numbers starting at 1. The OPTIONS(...) syntax is only required when you specify additional options.

SQL

EXTRACT TO LOCAL
AS SELECT c1 FROM sys.dummy10;

sys.dummy creates a virtual table with the specified number of rows. For details, see Generate Tables Using sys.dummy.

Local CSV Extract to the Specified Path

This example extracts the results of the SELECT c1 FROM sys.dummy10 SQL query to the local machine at the absolute path /home/user/out/data_0.csv. The SQL query returns 10 rows with incremental numbers starting at 1.

SQL

EXTRACT TO LOCAL OPTIONS(
  file_prefix="/home/user/out/data_",
  file_extension=".csv"
)
AS SELECT c1 FROM sys.dummy10;

Local Parquet Extract with Partition by Key

This example partitions customer data into Parquet files by the country column. This partitioning creates a Hive-style directory structure on the local machine. Use the file path /home/user/customer_data/ with the KEY value for the partition strategy. Specify a target file size of 256 MB. The SQL query selects the identifier, name, email, and country from the customers table.

SQL

EXTRACT TO LOCAL OPTIONS(
    FILE_PREFIX = "/home/user/customer_data/",
    FILE_TYPE = PARQUET,
    PARTITION_MODE = "KEY",
    PARTITION_COLUMNS = "country",
    TARGET_FILE_SIZE_MB = 256
) AS SELECT id, name, email, country FROM customers;

The resulting file structure has these file paths.

/home/user/customer_data/country=US/part-0000.parquet
/home/user/customer_data/country=CA/part-0000.parquet
...

S3 Parquet Extract with Compression

This example extracts all sales data from the today_sales table to a single Parquet file on S3 using ZSTD compression. Use the S3 bucket named my-analytics-bucket with the file path daily_reports/report.parquet using the endpoint s3.us-east-1.amazonaws.com.

SQL

EXTRACT TO S3 OPTIONS(
    BUCKET = "my-analytics-bucket",
    FILE_PREFIX = "daily_reports/report.parquet",
    ENDPOINT = "s3.us-east-1.amazonaws.com",
    FILE_TYPE = PARQUET,
    PARQUET_COMPRESSION = "ZSTD"
) AS SELECT * FROM today_sales;

S3 Parquet Extract with Partitioning by Range

This example extracts large event data to S3 by splitting the extract into 30 parallel queries over the event_id column using the RANGE partition strategy. Specify a target file size of 1024 MB. Override the default row group size using the PARQUET_ROW_GROUP_SIZE_BYTES option to specify 268435456 bytes (256 MB). Use the S3 bucket iot-data with the file path events/2025-09-25/ at the endpoint s3.us-east-1.amazonaws.com. The SQL query selects the event identifier, sensor identifier, sensor reading, and timestamp from the events table.

SQL

EXTRACT TO S3 OPTIONS(
    BUCKET = "iot-data",
    FILE_PREFIX = "events/2025-09-25/",
    ENDPOINT = "s3.us-east-1.amazonaws.com",
    FILE_TYPE = PARQUET,
    PARTITION_MODE = "RANGE",
    PARTITION_COLUMNS = "event_id",
    NUM_FETCH_QUERIES = 30,
    TARGET_FILE_SIZE_MB = 1024,
    PARQUET_ROW_GROUP_SIZE_BYTES = 268435456
) AS SELECT event_id, sensor_id, sensor_reading, ts FROM events;

Related Links

Connect Using JDBC
JDBC Manual