> ## Documentation Index
> Fetch the complete documentation index at: https://docs.ocient.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Data Extract Tool

export const Java = "Java®";

export const AWS = "Amazon® Web Services℠ (AWS℠)";

The data extract tool is a part of the JDBC driver to unload data. You can execute the tool directly from the JDBC CLI. The tool extracts a result set to delimited files in the target location. To invoke the JDBC CLI, see the [JDBC Manual](/jdbc-manual).

<Info>
  To use the data extract tool, you must have JDBC version 2.63 or higher.
</Info>

## Supported Data Extract Formats

The data extract tool supports unloading result sets into files in specific formats. Supported extract formats are:

* CSV — Outputs result sets as text files with fields separated by a chosen delimiter.

## General Command Structure

Here is the general structure of an extract command.

```sql SQL theme={null}
EXTRACT TO <location_type> [OPTIONS([param=value [,...]])] AS <query>
```

The command is case-insensitive. Each extract command must start with `EXTRACT TO`. The location type needs to follow and must be either LOCAL or {AWS} S3. You can enclose additional options within a pair of parentheses following the word `OPTIONS`. Note that location type is the only required option. Next, the query follows the word `AS`.

This example is a simple general command structure.

```sql SQL theme={null}
EXTRACT TO LOCAL OPTIONS(
  file_prefix="/home/user/out/data_",
  file_extension=".csv"
)
AS SELECT c1 FROM sys.dummy10;
```

For supported options, see [Extract Options](#extract-options).

## Specify Options, Quoting, and Escaping Quotes

Here is the general format of options.

```sql SQL theme={null}
key1 = value1, key2 = value2, ... , keyN = valueN
```

You need to follow certain guidelines when you specify options. Keys (option names) can only consist of alphanumeric characters and are unquoted. Values can be either quoted (with the reserved character `"`) or unquoted. If values are unquoted, they can only contain alphanumeric characters. If the value has a non-alphanumeric character, you must quote it with the reserved character `"`. Note that the single quote character does not work.

```sql SQL theme={null}
OPTIONS(file_prefix = "/path/to/dir/result", header_mode = none, file_extension = ".csv")
```

To use the reserved quote character `"` as an argument, you must escape it with the backslash character `\`. To use `\` as an argument, you must escape it with another `\`. This code illustrates both of these scenarios.

```sql SQL theme={null}
OPTIONS(field_optionally_enclosed_by = "\"", escape = "\\")
```

## File Naming Conventions

When you use the extract tool, the tool produces a number of files. If the tool extracts with a single thread, the tool names files in this convention: `{file_prefix}_{file_number}{file_extension}{gzip_extension}`

1. `file_prefix`: Option specified by the user.
2. `file_number`: This part of the convention is `0` if all results go into one file. However, if `MAX_ROWS_PER_FILE` is set, then the tool places rows into one file until that limit is reached. Then, the tool generates another file with an incremented `file_number`. File number starts from 0.
3. `file_extension`: Option specified by the user.
4. `gzip_extension`: If you specify gzip compression, then the tool adds the `.gzip` suffix.

If you use the multithreaded extract, then the naming convention is: `{file_prefix}{thread_number}_{file_number}{file_extension}`

`file_prefix`, `file_extension`, `gzip_extension` is still determined with the set options. `thread_number` is the number of the thread going from 0 to N - 1 where N is the number of threads specified. The file number is now calculated per thread with the number of rows given to that thread. Recall that rows are distributed to threads in a round-robin order, starting with thread 0. So, thread 0 receives the rows 0, N, 2N, 3N, etc. Thread 1 receives the rows 1, N + 1, 2N + 1, 3N + 1, etc.

## Extract Options

This table describes optional options that apply to both the `LOCAL` and `S3` location types.

| **Option**                     | **Description**                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             | **Default**                                                                                                                                                    |
| ------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `FILE_TYPE`                    | The type of the output file for extraction. Supports extraction to a delimited `.csv` file.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | `DELIMITED`                                                                                                                                                    |
| `FILE_PREFIX`                  | Dictates the prefix used on the results. When extracting to `LOCAL`, this is the prefix used to determine the path of the results. This value can be a relative or full path. When extracting to S3, this value is the prefix for the key. In either case, the system adds additional file numbers and file extensions to generate the complete filename.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   | `results-`                                                                                                                                                     |
| `FILE_PREFIX_EXISTS`           | Determines the behavior if the path specified by the `FILE_PREFIX` option already exists. Supported values are: `FAIL` and `OVERWRITE`. The `FAIL` value throws an error, whereas `OVERWRITE` deletes the contents of the path.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             | `'FAIL'`                                                                                                                                                       |
| `FILE_EXTENSION`               | The file extension specified for each output file.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | `.csv`                                                                                                                                                         |
| `MAX_ROWS_PER_FILE`            | If you set this option to a non-zero value, the system splits the results into files with the specified maximum number of rows per file.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | `NULL`                                                                                                                                                         |
| `COMPRESSION`                  | Compression type to use for a delimited extract. Supported compression types are: <br />`NONE ` —  No compression<br />`GZIP` — GZip compression<br />`BZIP2 ` — bzip2 compression<br />`XZ` — xz compression                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | `NONE`                                                                                                                                                         |
| `RECORD_DELIMITER`             | Delimiter to use between records. This supports {Java} strings, so special characters can be input using escape characters. UTF-16: `\u[utf-16 value]` or Octal `\[octal value]`.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | `\n`                                                                                                                                                           |
| `FIELD_DELIMITER`              | Delimiter to use between fields within a record. This supports Java strings, so special characters can be input using escape characters. UTF-16: `\u[utf-16 value]` or Octal `\[octal value]`.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              | `,`                                                                                                                                                            |
| `HEADER_MODE`                  | Dictates how to manage headers in result files. Supported values are `NONE`, `ALL_FILES`, and `FIRST_FILE`.<br /><br />`NONE` — The tool writes all output files without an additional header.<br />`ALL_FILES` — The tool adds column names as a header in the first row of each output file. Each file has at most `MAX_ROWS_PER_FILE` + 1 total rows.<br />`FIRST_FILE` — The tool adds column names as a header in the first row of the first output file. The tool does not add the header to subsequent files. Each file has at most `MAX_ROWS_PER_FILE` total rows, inclusive of the header in the first file.                                                                                                                                                                                                                                                                                                                                                                                                                                       | `NONE`                                                                                                                                                         |
| `NULL_FORMAT`                  | Format string to use for writing NULL values to the output files.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | `""` (empty string)                                                                                                                                            |
| `ENCODING`                     | Encoding used when writing out data to files.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | The default character set of the system, as determined by the [Oracle documentation](https://docs.oracle.com/javase/7/docs/api/java/nio/charset/Charset.html). |
| `ESCAPE`                       | Character used for escaping quoted fields. Set this to the NULL character `\0` to indicate that the escape character is not specified.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | `"`                                                                                                                                                            |
| `FIELD_OPTIONALLY_ENCLOSED_BY` | Sometimes, you need to surround fields in a character. For example, the field might have a literal comma. Generally, this character is also known as the quote character. Set this option to the NULL character `\0` to indicate that the quote character is not specified.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | `"`                                                                                                                                                            |
| `BINARY_FORMAT`                | The format with which to encode the BINARY data type. Supports `UTF-8`, `Hexadecimal`, and `Base64`.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | `Hexadecimal`                                                                                                                                                  |
| `COMPRESSION_BLOCK_SIZE`       | The number of bytes that comprise each block to be compressed; larger blocks result in better compression at the expense of more RAM usage when compressing.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                | `4194304`                                                                                                                                                      |
| `COMPRESSION_LEVEL`            | An integer value \[-1, 9]. Use `-1` for the GZip default compression level, `0` for no compression, or a value \[1-9] where 1 indicates fastest compression and 9 indicates best compression.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | `1`                                                                                                                                                            |
| `NUM_COMPRESSION_THREADS`      | The number of threads to use for compression.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | \$(number of cores \* 2)                                                                                                                                       |
| `NUM_FETCH_QUERIES`            | The number of parallel queries to execute in the database for data extraction.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              | `15`                                                                                                                                                           |
| `ESCAPE_UNQUOTED_VALUES`       | Dictates whether to write escape sequences in unquoted values. Only applicable when `FIELD_DELIMITER` is set to `,`.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | `false`                                                                                                                                                        |
| `INPUT_ESCAPED`                | Dictates whether the input is already escaped. When this option is set to true, the tool does not add escape sequences, and data is written without changes to the output file. Only applicable when `FIELD_DELIMITER` is set to `,`.<br />Ensure that data is properly escaped, otherwise the extract might produce invalid CSV data.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | `false`                                                                                                                                                        |
| `PARTITION_MODE`               | The strategy for partitioning the data. Supported values are: `NONE`, `KEY`, and `RANGE`.<br /><br />When you set this option to<br />`NONE`, the tool uses standard extraction. When you set this option to<br />`KEY`, the tool creates subdirectories for each unique value specified by the <br />`PARTITION_COLUMNS`<br />option. When you set this option to<br />`RANGE`, the tool splits the data into the number of queries specified by the <br />`NUM_FETCH_QUERIES`<br /> option based on the range of values specified in the <br />`PARTITION_COLUMNS`<br />option.                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | `NONE`                                                                                                                                                         |
| `PARTITION_COLUMNS`            | The comma-separated list of columns to use for partitioning data when you set the `PARTITION_MODE` option to `KEY` or `RANGE`. The `RANGE` value only allows a single column. See  [File Naming Conventions](#file-naming-conventions) for the file path structure for multiple partitioning columns when using the `KEY` value.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            | `NULL`                                                                                                                                                         |
| `QUOTE_ALL_FIELDS`             | Dictates whether all written fields are enclosed with quotes. When this option is set to true, the tool encloses all fields with the `FIELD_OPTIONALLY_ENCLOSED_BY` character.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              | `false`                                                                                                                                                        |
| `SUCCESS_MARKER`               | Identifies a successful completion of the extract. If you set this option to `true`, the tool creates a file with the `_SUCCESS` suffix in the root output directory when the extract of the entire job completes successfully.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             | `true`                                                                                                                                                         |
| `TARGET_FILE_SIZE_MB`          | Specifies the size in megabytes for the target output file. The data extract tool splits the output into files of approximately this size. The tool ignores this option if you set the `MAX_ROWS_PER_FILE` option.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | `NULL`                                                                                                                                                         |
| `TRANSLATE_CHARACTERS_MODE`    | Character Mode to use for translating characters. Supported values are `CHAR `and `HEX`.<br />The tool performs character translation only if you specify `TRANSLATE_CHARACTERS_FROM` and `TRANSLATE_CHARACTERS_TO`. The tool replaces the Nth character in `TRANSLATE_CHARACTERS_FROM` with the Nth character in `TRANSLATE_CHARACTERS_TO` in the extracted records.<br />When `TRANSLATE_CHARACTERS_MODE` is set to CHAR, `TRANSLATE_CHARACTERS_FROM`, and `TRANSLATE_CHARACTERS_TO` must be equal-length strings of UTF-8 characters. For example:<br />`TRANSLATE_CHARACTERS_MODE="CHAR"`,<br />`TRANSLATE_CHARACTERS_FROM="àëï"`,<br />`TRANSLATE_CHARACTERS_TO="aei"`<br /><br />When `TRANSLATE_CHARACTERS_MODE` is set to `HEX`, `TRANSLATE_CHARACTERS_FROM`, and `TRANSLATE_CHARACTERS_TO` must be comma-separated lists of hexadecimal UTF-8 code points with the same number of list elements. For example:<br />`TRANSLATE_CHARACTERS_MODE="HEX"`,<br />`TRANSLATE_CHARACTERS_FROM="c3a0,c3ab,c3af"`,<br />`TRANSLATE_CHARACTERS_TO="61,65,69"` | `CHAR`                                                                                                                                                         |
| `TRANSLATE_CHARACTERS_FROM`    | Sequence of UTF-8 characters in the source data to translate to a corresponding character in the `TRANSLATE_CHARACTERS_TO `option. See the `TRANSLATE_CHARACTERS_MODE` option for the expected format.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | `""`                                                                                                                                                           |
| `TRANSLATE_CHARACTERS_TO`      | Sequence of UTF-8 characters to use as a replacement for the characters included in `TRANSLATE_CHARACTERS_FROM`. See the `TRANSLATE_CHARACTERS_MODE` option for the expected format.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | `""`                                                                                                                                                           |
| `TRIM_TRAILING_ZEROS`          | Dictates whether to trim trailing zeros from numeric input fields.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | `false`                                                                                                                                                        |

## Examples

**Basic Extract Example**

This example unloads the results of `SELECT c1 FROM sys.dummy10` to the local machine at the relative path. The extract statement does not specify any file name, so the file uses the default name, `result-_0.csv`.

```sql SQL theme={null}
EXTRACT TO LOCAL
AS SELECT c1 FROM sys.dummy10;
```

<Info>
  `sys.dummy` creates a virtual table with the specified number for rows. For details, see [Generate Tables Using sys.dummy](/generate-tables-using-sys-dummy).
</Info>

**Extract With Additional Options Example**

This example extracts the results of `SELECT c1 FROM sys.dummy10` to the S3 bucket `my-data-bucket` with the endpoint [`https://s3.us-east-2.amazonaws.com`](https://s3.us-east-2.amazonaws.com).

The extract statement uses additional options to write 100 rows to the S3 bucket.

The `MAX_ROWS_PER_FILE` option value limits each file to only 10 rows, so the extract tool splits the unloaded data between 10 separate files. Each created file follows the naming pattern `query_results/data_{file_number}.csv.gz`. The `file_number` values range from `0` to `9`, inclusive.

```sql SQL theme={null}
EXTRACT TO S3 OPTIONS(
    BUCKET = "my-data-bucket",
    ENDPOINT = "https://s3.us-east-2.amazonaws.com",
    FILE_PREFIX = "query_results/data",
    FILE_EXTENSION = ".csv",
    COMPRESSION = GZIP,
    MAX_ROWS_PER_FILE = 10
)
AS SELECT c1 FROM sys.dummy100;
```

## Related Links

[Connect Using JDBC](/connect-using-jdbc)

[JDBC Manual](/jdbc-manual)
