
Error Tolerance in Data Pipelines

Error Limits

The error limit in a SQL statement such as START PIPELINE my_pipeline ERROR LIMIT 100 defines the number of record-level errors that a batch of files can tolerate before the entire pipeline fails.

The error-handling process is:

  • Periodically, the Ocient System assigns a batch of pending files to a Loader Node.
  • The Ocient System enforces the error limit for each batch.
  • Files that have at least one record-level error reach the terminal status LOADED_WITH_ERRORS after all processing is complete.
  • If the file batch reaches the error limit, the current file is marked FAILED, the pipeline status becomes FAILED, and processing stops on the pipeline.
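
For example, the following statement tolerates up to 100 record-level errors in each batch; exceeding the limit marks the current file FAILED and fails the pipeline:

  -- Tolerate up to 100 record-level errors in each batch of files.
  START PIPELINE my_pipeline ERROR LIMIT 100;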

By default, BATCH pipelines run with ERROR LIMIT 0, which permits no errors: the pipeline stops on the first error and stores that error in sys.pipeline_errors. When you restart the pipeline, it retries the failed file and the failed record without duplicating any previously loaded data. To complete the load, fix the issue in the data or the pipeline definition, or increase the error limit when you restart the pipeline.
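
For example, after a failure with the default limit, you can inspect the stored error and then restart. This is a minimal sketch that assumes a restart reuses the START PIPELINE statement shown earlier:

  -- View the record-level error that stopped the pipeline.
  SELECT * FROM sys.pipeline_errors;

  -- Restart with a higher error limit if the data cannot be fixed in place.
  START PIPELINE my_pipeline ERROR LIMIT 100;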

Unrecoverable File Errors

You can manage unrecoverable file errors, such as Gzip decompression errors, tokenization errors, or missing files, by starting the pipeline with the FILE_ERROR option.

The strictest setting is FILE_ERROR FAIL. With this setting, any file-level error fails the pipeline: if a file is missing or cannot be processed, the Ocient System marks the file FAILED, sets the pipeline status to FAILED, and stops processing on the pipeline.

The most tolerant setting is FILE_ERROR TOLERATE. Use this setting in a statement such as START PIPELINE my_pipeline FILE_ERROR TOLERATE. With this setting, missing files reach the terminal status SKIPPED, and files that encounter other file-level errors reach the terminal status LOADED_WITH_ERRORS. When the Ocient System encounters a file-level error, it stops processing that file and continues with the next file in the partition. With the TOLERATE setting, the pipeline automatically runs with an unlimited error limit.
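
The following sketch shows both settings side by side, based on the syntax above:

  -- Strictest setting: any file-level error fails the pipeline.
  START PIPELINE my_pipeline FILE_ERROR FAIL;

  -- Most tolerant setting: missing files become SKIPPED, other file-level
  -- errors become LOADED_WITH_ERRORS, and the error limit is unlimited.
  START PIPELINE my_pipeline FILE_ERROR TOLERATE;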

Recoverability

If a pipeline that tolerates no errors fails, manually fix the row or file that caused the error and restart the pipeline; the load then proceeds with all records correctly deduplicated. Restarting the pipeline without fixing the error, but with tolerance for record-level or file-level errors, also allows the load to proceed with correct deduplication. However, restarts do not guarantee correct deduplication if you apply manual fixes after this point.

When a pipeline fails, the number of loaded rows is nondeterministic. As a result, the Ocient System does not guarantee that manual modifications to files with the statuses SKIPPED, LOADED_WITH_ERRORS, or FAILED are reflected in a restart operation. However, if the pipeline fails due to a low error limit, you can raise the error limit on restart to allow the pipeline to make further progress.

You can use the BAD_DATA_TARGET option to capture failing records for troubleshooting and reloading.
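
As a hypothetical sketch only (the clause placement and target type are assumptions, and bad_records is an illustrative name; see the pipeline documentation for the exact syntax), a bad data target might be combined with an error limit:

  -- Hypothetical: route failing records to a bad data target while
  -- tolerating up to 100 record-level errors in each batch.
  START PIPELINE my_pipeline ERROR LIMIT 100 BAD_DATA_TARGET bad_records;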

Restarts and Deduplication

Ocient pipelines support restarting a pipeline without creating duplicate data in the target tables. However, each type of data source has some limitations. For details, see Data Pipelines.
