Data Extract Manual

The data extract tool is part of the JDBC driver and unloads data: it extracts a result set to delimited files in the target location. You can run the tool directly from the JDBC CLI. To invoke the JDBC CLI, see the JDBC Manual.

To use the data extract tool, you must have JDBC version 2.63 or higher.

General Command Structure

The general structure of an extract command is:

SQL
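-- Reconstructed general form, based on the description below. Braces show the
-- required location type; brackets mark the optional OPTIONS clause.
EXTRACT TO {LOCAL | S3} [OPTIONS(<key> = <value>, ...)] AS <query>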


The command is case-insensitive. Each extract command must start with EXTRACT TO. The location type follows and must be either LOCAL or S3. You can enclose additional options within a pair of parentheses following the word OPTIONS. Note that the location type is the only required option. Finally, the query follows the word AS.

A simple example of the general command structure is:

SQL
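-- A minimal command with no additional options. sys.dummy10 is the built-in
-- virtual table used in the Examples section.
EXTRACT TO LOCAL AS SELECT c1 FROM sys.dummy10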


For supported options, see Extract Options.

Specifying Options, Quoting, and Escaping Quotes

The general format of options is:

SQL
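-- Keys are unquoted. Values are quoted with the reserved character " unless
-- they are purely alphanumeric. The option names below are illustrative.
OPTIONS(KEY1 = value1, KEY2 = "value-2")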


You need to follow certain guidelines when you specify options. Keys (option names) can consist only of alphanumeric characters and are unquoted. Values can be either quoted (with the reserved character ") or unquoted, but unquoted values can contain only alphanumeric characters. If a value contains a non-alphanumeric character, you must quote it with the reserved character ". Note that the single quote character does not work.

SQL
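-- COMPRESSION takes an unquoted, purely alphanumeric value. The FILE_PREFIX
-- value contains hyphens, so it must be quoted with the reserved character ";
-- single quotes would not work here.
EXTRACT TO LOCAL OPTIONS(COMPRESSION = GZIP, FILE_PREFIX = "my-results-") AS SELECT c1 FROM sys.dummy10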


To use the reserved quote character " as an argument, you must escape it with the backslash character \. To use \ as an argument, you must escape it with another \. This code illustrates both of these scenarios.

SQL
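-- To pass the reserved quote character " as a value, escape it as \".
-- To pass the backslash itself, escape it as \\. Both options are set here
-- purely to illustrate the escaping rules.
EXTRACT TO LOCAL OPTIONS(FIELD_OPTIONALLY_ENCLOSED_BY = "\"", ESCAPE = "\\") AS SELECT c1 FROM sys.dummy10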


File Naming Conventions

When you use the extract tool, the tool produces a number of files. If the tool extracts with a single thread, it names files with this convention: {file_prefix}{file_number}{file_extension}{gzip_extension}

  1. file_prefix: Option specified by the user.
  2. file_number: This part of the convention is 0 if all results go into one file. However, if MAX_ROWS_PER_FILE is set, rows are placed into one file until that limit is hit, and then the tool generates another file with an incremented file_number. File numbers start from 0.
  3. file_extension: Option specified by the user.
  4. gzip_extension: If you specify GZIP compression, the tool adds the .gzip suffix.

If you use the multithreaded extract, then the naming convention is: {file_prefix}{thread_number}_{file_number}{file_extension}{gzip_extension}

file_prefix, file_extension, and gzip_extension are still determined by the set options. thread_number is the number of the thread, ranging from 0 to N - 1, where N is the number of threads specified. The file number is now calculated on a per-thread basis from the number of rows given to that thread. Recall that rows are distributed to threads in round-robin order, starting with thread 0. So, thread 0 receives rows 0, N, 2N, 3N, and so on, while thread 1 receives rows 1, N + 1, 2N + 1, 3N + 1, and so on. For example, with FILE_PREFIX set to data_ and two threads, the first file written by thread 0 is data_0_0.csv and the first file written by thread 1 is data_1_0.csv.
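As an illustration of the single-threaded convention, consider this sketch of a command (the query is illustrative):

SQL

-- Splits a 100-row result into files of 10 rows each, with GZIP compression.
-- Expected output names, per the single-threaded convention above:
--   out_0.csv.gzip, out_1.csv.gzip, ..., out_9.csv.gzip
-- sys.dummy100 is assumed to generate a 100-row virtual table.
EXTRACT TO LOCAL OPTIONS(FILE_PREFIX = "out_", MAX_ROWS_PER_FILE = 10, COMPRESSION = GZIP) AS SELECT c1 FROM sys.dummy100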

Extract Options

Each option entry below lists the usage (the location types the option applies to), comments, and the default value.

LOCATION_TYPE
Usage: LOCAL, S3
Comments: Dictates where the results are extracted to. LOCAL extracts the results to the local machine. S3 extracts the results to S3. When you use S3, other options are required. See the S3 options for more information.
Default: None, must be specified

FILE_PREFIX
Usage: LOCAL, S3
Comments: Dictates the prefix used on the results. When extracting to LOCAL, this is the prefix used to determine the path of the results, and it can be a relative or full path. When extracting to S3, this is the prefix for the key. In either case, file numbers and file extensions are appended to generate the complete file name.
Default: results-

FILE_EXTENSION
Usage: LOCAL, S3
Comments: The file extension given to each result file.
Default: .csv

MAX_ROWS_PER_FILE
Usage: LOCAL, S3
Comments: If non-zero, splits the results into files with at most MAX_ROWS_PER_FILE rows in each file.
Default: NULL

COMPRESSION
Usage: LOCAL, S3
Comments: Compression type to use. Currently supports NONE (no compression) and GZIP.
Default: NONE

RECORD_DELIMITER
Usage: LOCAL, S3
Comments: Delimiter to use between records. This supports Java strings, so you can input special characters with escape sequences: UTF-16 (\u[utf-16 value]) or octal (\[octal value]).
Default: \n

FIELD_DELIMITER
Usage: LOCAL, S3
Comments: Delimiter to use between fields within a record. This supports Java strings, so you can input special characters with escape sequences: UTF-16 (\u[utf-16 value]) or octal (\[octal value]).
Default: ,

HEADER_MODE
Usage: LOCAL, S3
Comments: Dictates how to manage headers in result files. Supported values are NONE, ALL_FILES, and FIRST_FILE.
NONE: The tool writes all output files without an additional header.
ALL_FILES: The tool adds column names as a header in the first row of each output file. Each file has at most MAX_ROWS_PER_FILE + 1 total rows.
FIRST_FILE: The tool adds column names as a header in the first row of the first output file only. Each file has at most MAX_ROWS_PER_FILE total rows, inclusive of the header in the first file.
Default: NONE

NULL_FORMAT
Usage: LOCAL, S3
Comments: Format string to use for writing NULL values to the output files.
Default: "" (empty string)

ENCODING
Usage: LOCAL, S3
Comments: Encoding used when writing out data to files.
Default: The default charset of the JVM (see the Oracle documentation for Charset.defaultCharset).

ESCAPE
Usage: LOCAL, S3
Comments: Character used for escaping quoted fields. Set this to the NULL character (\0) to indicate that the escape character is not specified.
Default: \

FIELD_OPTIONALLY_ENCLOSED_BY
Usage: LOCAL, S3
Comments: Character used to enclose fields that require it, for example, a field that contains a literal comma. This character is generally known as the quote character. Set this option to the NULL character (\0) to indicate that the quote character is not specified.
Default: "

BINARY_FORMAT
Usage: LOCAL, S3
Comments: The format with which to encode the BINARY data type. Supports UTF-8, Hexadecimal, and Base64.
Default: Hexadecimal

COMPRESSION_BLOCK_SIZE
Usage: LOCAL, S3
Comments: The number of bytes that comprise each block to be compressed. Larger blocks result in better compression at the expense of more RAM usage when compressing.
Default: 4194304

COMPRESSION_LEVEL
Usage: LOCAL, S3
Comments: An integer value in [-1, 9]. Use -1 for GZIP's default compression level, 0 for no compression, or a value in [1, 9], where 1 indicates the fastest compression and 9 indicates the best compression.
Default: 1

NUM_COMPRESSION_THREADS
Usage: LOCAL, S3
Comments: The number of threads to use for compression. Leave unspecified for the default value.
Default: number_of_cores * 2

ESCAPE_UNQUOTED_VALUES
Usage: LOCAL, S3
Comments: Dictates whether to write escape sequences in unquoted values. Only applicable when FIELD_DELIMITER is set to ,.
Default: false

INPUT_ESCAPED
Usage: LOCAL, S3
Comments: Dictates whether the input is already escaped. When this option is set to true, the tool does not add escape sequences, and data is written unchanged to the output file. Only applicable when FIELD_DELIMITER is set to ,. Ensure that the data is properly escaped; otherwise, the extract might produce invalid CSV data.
Default: false

QUOTE_ALL_FIELDS
Usage: LOCAL, S3
Comments: Dictates whether all written fields are enclosed with quotes. When this option is set to true, the tool encloses all fields with the FIELD_OPTIONALLY_ENCLOSED_BY character.
Default: false

BUCKET
Usage: S3
Comments: S3 bucket to use. Ignored if extracting locally. Required if extracting to S3.
Default: None, required for S3

AWS_KEY_ID
Usage: S3
Comments: AWS key ID. If empty, the CLI uses the Java AWS SDK default credentials provider chain (see the AWS SDK documentation).
Default: ""

AWS_SECRET_KEY
Usage: S3
Comments: AWS secret key. If empty, the CLI uses the Java AWS SDK default credentials provider chain (see the AWS SDK documentation).
Default: ""

REGION
Usage: S3
Comments: S3 region to upload to. Ignored when extracting to LOCAL.
Default: US_EAST_2

ENDPOINT
Usage: S3
Comments: Endpoint for the S3 upload. Required when extracting to S3. Ignored when extracting to LOCAL. See the AWS documentation for endpoint formatting.
Default: None, required for S3

PATH_STYLE_ACCESS
Usage: S3
Comments: Whether to use path-style access for the bucket.
Default: false

TRANSLATE_CHARACTERS_MODE
Usage: LOCAL, S3
Comments: Character mode to use for translating characters. Supported values are CHAR and HEX.
The tool performs character translation only if you specify TRANSLATE_CHARACTERS_FROM and TRANSLATE_CHARACTERS_TO. The tool replaces the Nth character in TRANSLATE_CHARACTERS_FROM with the Nth character in TRANSLATE_CHARACTERS_TO in the extracted records.
When TRANSLATE_CHARACTERS_MODE is set to CHAR, TRANSLATE_CHARACTERS_FROM and TRANSLATE_CHARACTERS_TO must be equal-length strings of UTF-8 characters. For example:
TRANSLATE_CHARACTERS_MODE="CHAR", TRANSLATE_CHARACTERS_FROM="àëï", TRANSLATE_CHARACTERS_TO="aei"
When TRANSLATE_CHARACTERS_MODE is set to HEX, TRANSLATE_CHARACTERS_FROM and TRANSLATE_CHARACTERS_TO must be comma-separated lists of hexadecimal UTF-8 code points with the same number of list elements. For example:
TRANSLATE_CHARACTERS_MODE="HEX", TRANSLATE_CHARACTERS_FROM="c3a0,c3ab,c3af", TRANSLATE_CHARACTERS_TO="61,65,69"
Default: CHAR

TRANSLATE_CHARACTERS_FROM
Usage: LOCAL, S3
Comments: Sequence of UTF-8 characters in the source data to translate to the corresponding characters in the TRANSLATE_CHARACTERS_TO option. See the TRANSLATE_CHARACTERS_MODE option for the expected format.
Default: ""

TRANSLATE_CHARACTERS_TO
Usage: LOCAL, S3
Comments: Sequence of UTF-8 characters to use as replacements for the characters included in TRANSLATE_CHARACTERS_FROM. See the TRANSLATE_CHARACTERS_MODE option for the expected format.
Default: ""

Examples

OPTIONS(...) is required only when you specify additional options. This example unloads the results of SELECT c1 FROM sys.dummy10 to the local machine at the relative path results-0.csv, which follows from the default file prefix and extension.

SQL
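-- With no OPTIONS clause, the defaults apply: FILE_PREFIX results- and
-- FILE_EXTENSION .csv yield the file name results-0.csv.
EXTRACT TO LOCAL AS SELECT c1 FROM sys.dummy10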


sys.dummy creates a virtual table with the specified number of rows. For more information, see Generate Tables Using sys.dummy.

This example extracts the results of SELECT c1 FROM sys.dummy10 to the local machine at the absolute path /home/user/out/data_0.csv.

SQL
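-- FILE_PREFIX can be a full path; the file number and the default .csv
-- extension are appended to produce /home/user/out/data_0.csv.
EXTRACT TO LOCAL OPTIONS(FILE_PREFIX = "/home/user/out/data_") AS SELECT c1 FROM sys.dummy10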


This example extracts a query result to S3 with GZIP compression. The result set has 100 rows, and MAX_ROWS_PER_FILE is set to 10. Thus, the tool writes 10 files to S3 with the keys query_results/query_0/_results_{file_number}.csv.gzip, where file_number ranges from 0 to 9, inclusive.
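A sketch of such a command follows; the bucket and endpoint values are illustrative, and sys.dummy100 is assumed to generate a 100-row virtual table:

SQL

EXTRACT TO S3 OPTIONS(
    FILE_PREFIX = "query_results/query_0/_results_",
    MAX_ROWS_PER_FILE = 10,
    COMPRESSION = GZIP,
    BUCKET = "my-bucket",
    ENDPOINT = "s3.us-east-2.amazonaws.com"
) AS SELECT c1 FROM sys.dummy100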

Related Links

JDBC Manual