
Error Tolerance in Data Pipelines

Error Limits

The error limit in a SQL statement such as START PIPELINE my_pipeline ERROR LIMIT 100 defines the number of record-level errors that a batch of files can tolerate before the entire pipeline fails.

The error-handling process is:

  • Periodically, the Ocient System assigns a batch of pending files to a Loader Node.
  • The Ocient System enforces the error limit for each batch.
  • Files that have at least one record-level error reach the terminal status LOADED_WITH_ERRORS after all processing is complete.
  • If the file batch reaches the error limit, the current file is marked FAILED, the pipeline status becomes FAILED, and processing stops on the pipeline.
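
For example, the following statement tolerates up to 100 record-level errors in each batch; exceeding the limit marks the current file FAILED and fails the pipeline:

  -- Tolerate up to 100 record-level errors in each batch of files.
  START PIPELINE my_pipeline ERROR LIMIT 100;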

By default, BATCH pipelines run with ERROR LIMIT 0, which permits no errors: the pipeline stops on the first error and stores that error in sys.pipeline_errors. When you restart the pipeline, it retries the failed file and the failed record without duplicating any previously loaded data. To complete the load, fix the issue in the data or the pipeline definition, or increase the error limit when you restart the pipeline.
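
For example, after a failure with the default limit, you can inspect the stored error and then restart. This is a minimal sketch that assumes a restart reuses the START PIPELINE statement shown earlier:

  -- View the record-level error that stopped the pipeline.
  SELECT * FROM sys.pipeline_errors;

  -- Restart with a higher error limit if the data cannot be fixed in place.
  START PIPELINE my_pipeline ERROR LIMIT 100;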

Unrecoverable File Errors

You can manage unrecoverable file errors, such as Gzip decompression errors, tokenization errors, or missing files, by starting the pipeline with the FILE_ERROR option.

The strictest setting is FILE_ERROR FAIL. With this setting, any file-level error fails the pipeline: if a file is missing or cannot be processed, the Ocient System marks the file FAILED, sets the pipeline status to FAILED, and stops processing on the pipeline.

The most tolerant setting is FILE_ERROR TOLERATE. Use this setting in a statement such as START PIPELINE my_pipeline FILE_ERROR TOLERATE. With this setting, missing files reach the terminal status SKIPPED, and files that encounter other file-level errors reach the terminal status LOADED_WITH_ERRORS. When the Ocient System encounters a file-level error, it stops processing that file and continues with the next file in the partition. With the TOLERATE setting, the pipeline automatically runs with an unlimited error limit.
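
The following sketch shows both settings side by side, based on the syntax above:

  -- Strictest setting: any file-level error fails the pipeline.
  START PIPELINE my_pipeline FILE_ERROR FAIL;

  -- Most tolerant setting: missing files become SKIPPED, other file-level
  -- errors become LOADED_WITH_ERRORS, and the error limit is unlimited.
  START PIPELINE my_pipeline FILE_ERROR TOLERATE;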

Recoverability

If a pipeline that tolerates no errors fails, manually fix the row or file that caused the error and restart the pipeline; the load then proceeds with all records correctly deduplicated. Restarting the pipeline without fixing the error, but with tolerance for record-level or file-level errors, also allows the load to proceed with correct deduplication. However, restarts do not guarantee correct deduplication if you apply manual fixes after this point.

When a pipeline fails, the number of loaded rows is nondeterministic. As a result, the Ocient System does not guarantee that manual modifications to files with the statuses SKIPPED, LOADED_WITH_ERRORS, or FAILED are reflected in a restart operation. However, if the pipeline fails due to a low error limit, you can raise the error limit on restart to allow the pipeline to make further progress.

You can use the BAD_DATA_TARGET option to capture failing records for troubleshooting and reloading.
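
As a hypothetical sketch only (the clause placement and target type are assumptions, and bad_records is an illustrative name; see the pipeline documentation for the exact syntax), a bad data target might be combined with an error limit:

  -- Hypothetical: route failing records to a bad data target while
  -- tolerating up to 100 record-level errors in each batch.
  START PIPELINE my_pipeline ERROR LIMIT 100 BAD_DATA_TARGET bad_records;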

Restarts and Deduplication

Ocient pipelines support restarting a pipeline without creating duplicate data in the target tables. However, each type of data source has some limitations. For details, see Data Pipelines.
