# Data Extract Tool
The Data Extract Tool is part of the JDBC driver and unloads data. You can execute the tool directly from the JDBC CLI. The tool extracts a result set to delimited files in the target location. To invoke the JDBC CLI, see the JDBC Manual.

To use the Data Extract Tool, you must have JDBC version 2.63 or higher.

## Supported Data Extract Formats

The Data Extract Tool supports unloading result sets into files in specific formats. Supported extract formats are:

- CSV — outputs result sets as text files with fields separated by a chosen delimiter.

## General Command Structure

Here is the general structure of an extract command:

```
extract to <location_type> [options([param=value[, ...]])] as <query>
```

The command is case insensitive. Each extract command must start with `extract to`. The location type follows and must be either `local` or `s3`. You can enclose additional options within a pair of parentheses following the word `options`. Note that the location type is the only required option. Next, the query follows the word `as`.

This example shows a simple extract command:

```sql
extract to local options(
    file_prefix = "/home/user/out/data_",
    file_extension = ".csv"
) as select c1 from sys.dummy10;
```

For supported options, see Extract Options.

## Specify Options, Quoting, and Escaping Quotes

Here is the general format of options:

```
key1 = value1, key2 = value2, ..., keyN = valueN
```

You need to follow certain guidelines when you specify options:

- Keys (option names) can consist only of alphanumeric characters and are unquoted.
- Values can be either quoted (with the reserved character `"`) or unquoted.
- If values are unquoted, they can contain only alphanumeric characters. If a value has a non-alphanumeric character, you must quote it with the reserved character `"`. Note that the single quote character does not work.

```
options(file_prefix = "/path/to/dir/result", header_mode = none, file_extension = ".csv")
```

To use the reserved quote character `"` as an argument, you must escape it with the backslash character `\`. To use `\` as an argument, you must escape it with another `\`. This code illustrates both of these scenarios:

```
options(field_optionally_enclosed_by = "\"", escape = "\\")
```

## File Naming Conventions

When you use the extract tool, the tool produces a number of files. If the tool extracts with a single thread, the tool names files in this convention:

```
{file_prefix}_{file_number}{file_extension}{gzip_extension}
```

- `file_prefix` — option specified by the user.
- `file_number` — this part of the convention is `0` if all results go into one file. However, if `max_rows_per_file` is set, the tool places rows into one file until that limit is reached; then, the tool generates another file with an incremented file number. `file_number` starts from `0`.
- `file_extension` — option specified by the user.
- `gzip_extension` — if you specify gzip compression, the tool adds the `.gz` suffix.

If you use the multithreaded extract, the naming convention is:

```
{file_prefix}{thread_number}_{file_number}{file_extension}
```

`file_prefix`, `file_extension`, and `gzip_extension` are still determined by the set options. `thread_number` is the number of the thread, going from `0` to `n - 1`, where `n` is the number of threads specified. The `file_number` is now calculated per thread with the number of rows given to that thread. Recall that rows are distributed to threads in round-robin order, starting with thread `0`. So, thread `0` receives rows `0`, `n`, `2n`, `3n`, and so on; thread `1` receives rows `1`, `n + 1`, `2n + 1`, `3n + 1`, and so on.

## Extract Options

The following options are optional and apply to both the `local` and `s3` location types. The default value follows each description.

- `file_type` — The type of the output file for extraction. Supports extraction to a delimited CSV file. Default: `delimited`.
- `file_prefix` — Dictates the prefix used on the results. When extracting to `local`, this is the prefix used to determine the path of the results; this value can be a relative or full path. When extracting to `s3`, this value is the prefix for the key. In either case, the
system adds additional file numbers and file extensions to generate the complete filename. Default: `results`.
- `file_prefix_exists` — Determines the behavior if the path specified by the `file_prefix` option already exists. Supported values are `fail` and `overwrite`. The `fail` value throws an error, whereas `overwrite` deletes the contents of the path. Default: `fail`.
- `file_extension` — The file extension specified for each output file. Default: `.csv`.
- `max_rows_per_file` — If you set this option to a non-zero value, the system splits the results into files with the specified maximum number of rows per file. Default: null.
- `compression` — Compression type to use for a delimited extract. Supported compression types are: `none` — no compression; `gzip` — gzip compression; `bzip2` — bzip2 compression; `xz` — XZ compression. Default: `none`.
- `record_delimiter` — Delimiter to use between records. This supports strings, so special characters can be input using escape characters: UTF-16 `\u[utf-16 value]` or octal `\[octal value]`. Default: `\n`.
- `field_delimiter` — Delimiter to use between fields within a record. This supports Java strings, so special characters can be input using escape characters: UTF-16 `\u[utf-16 value]` or octal `\[octal value]`. Default: `,`.
- `header_mode` — Dictates how to manage headers in result files. Supported values are `none`, `all_files`, and `first_file`. `none` — the tool writes all output files without an additional header. `all_files` — the tool adds column names as a header in the first row of each output file; each file has at most `max_rows_per_file + 1` total rows. `first_file` — the tool adds column names as a header in the first row of the first output file and does not add the header to subsequent files; each file has at most `max_rows_per_file` total rows, inclusive of the header in the first file. Default: `none`.
- `null_format` — Format string to use for writing null values to the output files. Default: `""` (empty string).
- `encoding` — Encoding used when writing out data to files. Default: the default character set of the system, as determined by the Oracle documentation at https://docs.oracle.com/javase/7/docs/api/java/nio/charset/Charset.html.
- `escape` — Character used for escaping quoted fields. Set this to the null character `\0` to indicate that the escape character is not specified. Default: `"`.
- `field_optionally_enclosed_by` — Sometimes, you need to surround fields with a character; for example, the field might contain a literal comma. Generally, this character is also known as the quote character. Set this option to the null character `\0` to indicate that the quote character is not specified. Default: `"`.
- `binary_format` — The format with which to encode the binary data type. Supports `utf8`, `hexadecimal`, and `base64`. Default: `hexadecimal`.
- `compression_block_size` — The number of bytes that comprise each block to be compressed; larger blocks result in better compression at the expense of more RAM usage when compressing. Default: `4194304`.
- `compression_level` — An integer value in `[-1, 9]`. Use `-1` for the gzip default compression level, `0` for no compression, or a value in `[1, 9]`, where `1` indicates fastest compression and `9` indicates best compression. Default: `-1`.
- `num_compression_threads` — The number of threads to use for compression. Default: (number of cores / 2).
- `num_fetch_queries` — The number of parallel queries to execute in the database for data extraction. Default: `15`.
- `escape_unquoted_values` — Dictates whether to write escape sequences in unquoted values. Only applicable when `field_delimiter` is set to `,`. Default: `false`.
- `input_escaped` — Dictates whether the input is already escaped. When this option is set to `true`, the tool does not add escape sequences, and data is written without changes to the output file. Only applicable when `field_delimiter` is set to `,`. Ensure that data is properly escaped; otherwise, the extract might produce invalid CSV data. Default: `false`.
- `partition_mode` — The strategy for partitioning the data. Supported values are `none`, `key`, and `range`. When you set this option to `none`, the tool uses standard extraction. When you set this option to `key`, the tool creates subdirectories for each unique value specified by the `partition_columns` option. When you set this option to `range`, the
tool splits the data into the number of queries specified by the `num_fetch_queries` option, based on the range of values specified in the `partition_columns` option. Default: `none`.
- `partition_columns` — The comma-separated list of columns to use for partitioning data when you set the `partition_mode` option to `key` or `range`. The `range` value only allows a single column. See File Naming Conventions for the file path structure for multiple partitioning columns when using the `key` value. Default: null.
- `quote_all_fields` — Dictates whether all written fields are enclosed with quotes. When this option is set to `true`, the tool encloses all fields with the `field_optionally_enclosed_by` character. Default: `false`.
- `success_marker` — Identifies a successful completion of the extract. If you set this option to `true`, the tool creates a file with the success suffix in the root output directory when the extract of the entire job completes successfully. Default: `true`.
- `target_file_size_mb` — Specifies the size in megabytes for the target output file. The Data Extract Tool splits the output into files of approximately this size. The tool ignores this option if you set the `max_rows_per_file` option. Default: null.
- `translate_characters_mode` — Character mode to use for translating characters. Supported values are `char` and `hex`. The tool performs character translation only if you specify `translate_characters_from` and `translate_characters_to`. The tool replaces the *n*th character in `translate_characters_from` with the *n*th character in `translate_characters_to` in the extracted records. When `translate_characters_mode` is set to `char`, `translate_characters_from` and `translate_characters_to` must be equal-length strings of UTF-8 characters; for example, `translate_characters_mode="char"`, `translate_characters_from="àëï"`, `translate_characters_to="aei"`. When `translate_characters_mode` is set to `hex`, `translate_characters_from` and `translate_characters_to` must be comma-separated lists of hexadecimal UTF-8 code points with the same number of list elements. For
example: `translate_characters_mode="hex"`, `translate_characters_from="c3a0,c3ab,c3af"`, `translate_characters_to="61,65,69"`. Default: `char`.
- `translate_characters_from` — Sequence of UTF-8 characters in the source data to translate to a corresponding character in the `translate_characters_to` option. See the `translate_characters_mode` option for the expected format. Default: `""`.
- `translate_characters_to` — Sequence of UTF-8 characters to use as a replacement for the characters included in `translate_characters_from`. See the `translate_characters_mode` option for the expected format. Default: `""`.
- `trim_trailing_zeros` — Dictates whether to trim trailing zeros from numeric input fields. Default: `false`.

## Examples

### Basic Extract Example

This example unloads the results of `select c1 from sys.dummy10` to the local machine at the relative path. The extract statement does not specify any file name, so the file uses the default name, `results_0.csv`.

```sql
extract to local as select c1 from sys.dummy10;
```

`sys.dummy` creates a virtual table with the specified number of rows. For details, see Generate Tables Using sys.dummy.

### Extract with Additional Options Example

This example extracts the results of `select c1 from sys.dummy100` to the S3 bucket `my-data-bucket` with the endpoint `https://s3.us-east-2.amazonaws.com`. The extract statement uses additional options to write 100 rows to the S3 bucket. The `max_rows_per_file` option value limits each file to only 10 rows, so the extract tool splits the unloaded data between 10 separate files. Each created file follows the naming pattern `query_results/data_{file_number}.csv.gz`. The `file_number` values range from `0` to `9`, inclusive.

```sql
extract to s3 options(
    bucket = "my-data-bucket",
    endpoint = "https://s3.us-east-2.amazonaws.com",
    file_prefix = "query_results/data",
    file_extension = ".csv",
    compression = gzip,
    max_rows_per_file = 10
) as select c1 from sys.dummy100;
```

## Related Links

- Connect Using JDBC
- JDBC Manual
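As a supplement, the round-robin row distribution and multithreaded naming convention described in File Naming Conventions can be modeled with a short Python sketch. This is an illustrative model only, not driver code: the helper names are invented for the example, and the underscore separator between the thread number and the file number is an assumption about the exact convention.

```python
# Illustrative sketch only -- not part of the JDBC driver.
# Models the documented round-robin distribution and multithreaded naming.

def assign_rows_round_robin(num_rows: int, num_threads: int) -> dict:
    """Thread t receives rows t, t + n, t + 2n, ... where n = num_threads."""
    assignments = {t: [] for t in range(num_threads)}
    for row in range(num_rows):
        assignments[row % num_threads].append(row)
    return assignments

def output_file_name(file_prefix: str, thread_number: int, file_number: int,
                     file_extension: str, gzip: bool = False) -> str:
    """Multithreaded convention from the docs (separator assumed):
    {file_prefix}{thread_number}_{file_number}{file_extension}, plus a
    .gz suffix when gzip compression is enabled."""
    name = f"{file_prefix}{thread_number}_{file_number}{file_extension}"
    return name + ".gz" if gzip else name
```

For example, with 3 threads and 7 rows, `assign_rows_round_robin(7, 3)` assigns rows `0, 3, 6` to thread `0`, and `output_file_name("data", 0, 1, ".csv", gzip=True)` yields `data0_1.csv.gz`.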
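Similarly, the interaction between `max_rows_per_file` and `header_mode` can be sketched in Python. This is a hypothetical illustration of the documented semantics (`none`, `all_files`, `first_file`), not the tool's implementation; the function name and shapes are invented for the example.

```python
# Illustrative sketch only -- models how max_rows_per_file and header_mode
# interact, per the documented row counts for each mode.

def split_with_headers(data_rows, header, max_rows_per_file, header_mode="none"):
    """Return the per-file row lists for the given header_mode.
    all_files:  every file gets the header (max_rows_per_file + 1 total rows).
    first_file: only file 0 gets the header; each file has at most
                max_rows_per_file total rows, inclusive of that header."""
    files = []
    if header_mode == "first_file":
        # First file holds the header plus max_rows_per_file - 1 data rows.
        files.append([header] + data_rows[:max_rows_per_file - 1])
        rest = data_rows[max_rows_per_file - 1:]
    else:
        rest = data_rows
    for i in range(0, len(rest), max_rows_per_file):
        chunk = rest[i:i + max_rows_per_file]
        if header_mode == "all_files":
            chunk = [header] + chunk  # at most max_rows_per_file + 1 total rows
        files.append(chunk)
    return files
```

For example, 5 data rows with `max_rows_per_file = 10` and `header_mode="all_files"` produce a single file of 6 rows, while `header_mode="first_file"` with `max_rows_per_file = 2` places the header and one data row in the first file.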