Extract Options
Option | Usage | Comments | Default |
---|---|---|---|
LOCATION_TYPE | LOCAL, S3 | Dictates where the results are extracted to. LOCAL extracts the results to the local machine. S3 extracts the results to S3. When using S3, other options are required. See S3 options for more information. | None, must be specified |
FILE_PREFIX | LOCAL, S3 | Dictates the prefix used on the results. When extracting to LOCAL, this is the prefix used to determine the path of the results. This can be a relative or full path. When extracting to S3, this is the prefix for the key. In either case, additional file numbers and file extensions are added to generate the complete file name. | results- |
FILE_EXTENSION | LOCAL, S3 | The file extension given to each result file. | .csv |
MAX_ROWS_PER_FILE | LOCAL, S3 | If non-zero, splits the results into multiple files, each containing at most MAX_ROWS_PER_FILE rows. | NULL |
COMPRESSION | LOCAL, S3 | Compression type to use. Currently supports NONE (no compression) and GZIP compression. | NONE |
RECORD_DELIMITER | LOCAL, S3 | Delimiter to use between records. This option accepts Java string syntax, so you can enter special characters with escape sequences: \u[UTF-16 value] for a UTF-16 code unit or \[octal value] for an octal escape. | \n |
FIELD_DELIMITER | LOCAL, S3 | Delimiter to use between fields within a record. This option accepts Java string syntax, so you can enter special characters with escape sequences: \u[UTF-16 value] for a UTF-16 code unit or \[octal value] for an octal escape. | , |
HEADER_MODE | LOCAL, S3 | Dictates how to manage headers in result files. Supported values are NONE, ALL_FILES, and FIRST_FILE. NONE — The tool writes all output files without an additional header. ALL_FILES — The tool adds column names as a header in the first row of each output file. Each file has at most MAX_ROWS_PER_FILE + 1 total rows. FIRST_FILE — The tool adds column names as a header in the first row of the first output file. The tool does not add the header to subsequent files. Each file has at most MAX_ROWS_PER_FILE total rows, inclusive of the header in the first file. | NONE |
NULL_FORMAT | LOCAL, S3 | Format string to use for writing NULL values to the output files. | "" (empty string) |
ENCODING | LOCAL, S3 | Encoding used when writing data to files. | The default charset of the system, as determined by Charset.defaultCharset(). |
ESCAPE | LOCAL, S3 | Character used for escaping quoted fields. Set this to the NULL character (\0) to indicate that the escape character is not specified. | \ |
FIELD_OPTIONALLY_ENCLOSED_BY | LOCAL, S3 | Character used to enclose a field when needed, for example when the field contains a literal comma. This character is commonly known as the quote character. Set this option to the NULL character (\0) to indicate that the quote character is not specified. | " |
BINARY_FORMAT | LOCAL, S3 | The format with which to encode the BINARY data type. Supports UTF-8, Hexadecimal, and Base64. | Hexadecimal |
COMPRESSION_BLOCK_SIZE | LOCAL, S3 | The number of bytes that comprise each block to be compressed; larger blocks result in better compression at the expense of more RAM usage when compressing. | 4194304 |
COMPRESSION_LEVEL | LOCAL, S3 | An integer in the range [-1, 9]. Use -1 for the GZIP default compression level, 0 for no compression, or a value from 1 to 9, where 1 indicates the fastest compression and 9 indicates the best compression. | 1 |
NUM_COMPRESSION_THREADS | LOCAL, S3 | The number of threads to use for compression. Leave unspecified for the default value. | $(number_of_cores * 2) |
ESCAPE_UNQUOTED_VALUES | LOCAL, S3 | Dictates whether to write escape sequences in unquoted values. Only applicable when FIELD_DELIMITER is set to a comma (,). | false |
INPUT_ESCAPED | LOCAL, S3 | Dictates whether the input is already escaped. When this option is set to true, the tool does not add escape sequences and writes the data to the output file without changes. Only applicable when FIELD_DELIMITER is set to a comma (,). Ensure that the data is properly escaped; otherwise, the extract might produce invalid CSV data. | false |
QUOTE_ALL_FIELDS | LOCAL, S3 | Dictates whether all written fields are enclosed with quotes. When this option is set to true, the tool encloses all fields with the FIELD_OPTIONALLY_ENCLOSED_BY character. | false |
BUCKET | S3 | S3 bucket to use. Ignored if extracting locally. If extracting to S3, this argument is required. | None, required for S3. |
AWS_KEY_ID | S3 | AWS Key ID. If empty, the CLI uses the default credentials provider chain of the AWS SDK for Java. | "" |
AWS_SECRET_KEY | S3 | AWS Secret Key. If empty, the CLI uses the default credentials provider chain of the AWS SDK for Java. | "" |
REGION | S3 | S3 region to upload to. Ignored when extracting to LOCAL. | US_EAST_2 |
ENDPOINT | S3 | Endpoint for S3 upload. Required when extracting to S3. Ignored when extracting to LOCAL. See the documentation on endpoint formatting. | None, required for S3 |
PATH_STYLE_ACCESS | S3 | Whether path style access should be used to access a bucket. | false |
TRANSLATE_CHARACTERS_MODE | LOCAL, S3 | Character mode to use for translating characters. Supported values are CHAR and HEX. The tool performs character translation only if you specify both TRANSLATE_CHARACTERS_FROM and TRANSLATE_CHARACTERS_TO. In the extracted records, the tool replaces the Nth character in TRANSLATE_CHARACTERS_FROM with the Nth character in TRANSLATE_CHARACTERS_TO. When TRANSLATE_CHARACTERS_MODE is set to CHAR, TRANSLATE_CHARACTERS_FROM and TRANSLATE_CHARACTERS_TO must be equal-length strings of UTF-8 characters. For example: TRANSLATE_CHARACTERS_MODE="CHAR", TRANSLATE_CHARACTERS_FROM="àëï", TRANSLATE_CHARACTERS_TO="aei". When TRANSLATE_CHARACTERS_MODE is set to HEX, TRANSLATE_CHARACTERS_FROM and TRANSLATE_CHARACTERS_TO must be comma-separated lists of hexadecimal UTF-8 encoded values (for example, c3a0 is the UTF-8 encoding of à) with the same number of list elements. For example: TRANSLATE_CHARACTERS_MODE="HEX", TRANSLATE_CHARACTERS_FROM="c3a0,c3ab,c3af", TRANSLATE_CHARACTERS_TO="61,65,69". | CHAR |
TRANSLATE_CHARACTERS_FROM | LOCAL, S3 | Sequence of UTF-8 characters in the source data to translate to a corresponding character in the TRANSLATE_CHARACTERS_TO option. See the TRANSLATE_CHARACTERS_MODE option for the expected format. | "" |
TRANSLATE_CHARACTERS_TO | LOCAL, S3 | Sequence of UTF-8 characters to use as a replacement for the characters included in TRANSLATE_CHARACTERS_FROM. See the TRANSLATE_CHARACTERS_MODE option for the expected format. | "" |
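
The interaction between MAX_ROWS_PER_FILE and HEADER_MODE described in the table can be checked with a small calculation. The following Python sketch is not part of the tool; `plan_files` is a hypothetical helper that only mirrors the row limits stated for HEADER_MODE. It assumes that files are filled in order and that an extract with zero rows still produces one file, neither of which the table confirms.

```python
def plan_files(total_rows, max_rows_per_file, header_mode="NONE"):
    """Return one (data_rows, total_lines) tuple per output file.

    Hypothetical model of the documented limits, not the tool's code.
    """
    if header_mode not in {"NONE", "ALL_FILES", "FIRST_FILE"}:
        raise ValueError(f"unsupported HEADER_MODE: {header_mode}")

    # NULL or 0 means no splitting: everything lands in a single file.
    if not max_rows_per_file:
        header = 0 if header_mode == "NONE" else 1
        return [(total_rows, total_rows + header)]

    files = []
    remaining = total_rows
    first = True
    while remaining > 0 or first:
        if header_mode == "ALL_FILES":
            # Header is extra: up to MAX_ROWS_PER_FILE + 1 lines per file.
            header, capacity = 1, max_rows_per_file
        elif header_mode == "FIRST_FILE" and first:
            # Header counts toward the limit in the first file only.
            header, capacity = 1, max_rows_per_file - 1
        else:
            header, capacity = 0, max_rows_per_file
        data = min(remaining, capacity)
        files.append((data, data + header))
        remaining -= data
        first = False
    return files


if __name__ == "__main__":
    for mode in ("NONE", "ALL_FILES", "FIRST_FILE"):
        print(mode, plan_files(10, 4, mode))
```

Under these assumptions, 10 result rows with MAX_ROWS_PER_FILE set to 4 produce three files in every mode. Only the number of lines per file differs, which matters for downstream consumers that count lines.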
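
To see how the CHAR and HEX forms of the TRANSLATE_CHARACTERS options relate, the following Python sketch builds the same character mapping from both examples given for TRANSLATE_CHARACTERS_MODE. It is an illustration only: `build_translation` is a hypothetical helper, and the sketch assumes that each HEX list element is the UTF-8 byte encoding of exactly one character, as the values in the documented example suggest.

```python
def build_translation(mode, chars_from, chars_to):
    """Build a str.translate table from TRANSLATE_CHARACTERS_* style values.

    Hypothetical helper; it models the documented format, not the tool itself.
    """
    if mode == "CHAR":
        # Each character maps positionally to its replacement.
        src, dst = list(chars_from), list(chars_to)
    elif mode == "HEX":
        # Each list element is decoded from hexadecimal UTF-8 bytes.
        src = [bytes.fromhex(h.strip()).decode("utf-8") for h in chars_from.split(",")]
        dst = [bytes.fromhex(h.strip()).decode("utf-8") for h in chars_to.split(",")]
    else:
        raise ValueError(f"unsupported TRANSLATE_CHARACTERS_MODE: {mode}")

    if len(src) != len(dst):
        raise ValueError("FROM and TO must contain the same number of characters")

    # str.maketrans requires every key to be a single character.
    return str.maketrans(dict(zip(src, dst)))


if __name__ == "__main__":
    char_table = build_translation("CHAR", "\u00e0\u00eb\u00ef", "aei")
    hex_table = build_translation("HEX", "c3a0,c3ab,c3af", "61,65,69")
    print(char_table == hex_table)
    print("na\u00efve".translate(hex_table))
```

The HEX form is useful when the characters are hard to type or when the shell or terminal encoding might alter them before they reach the tool.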