Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.ocient.com/llms.txt

Use this file to discover all available pages before exploring further.

When you load data into , you can manage the way that the system responds to errors. This guide outlines different mechanisms for managing errors through examples. The guide uses the same setup as Data Pipeline Load of JSON Data from Kafka but it modifies that pipeline to illustrate how to manage errors that you might encounter in data pipeline functionality. You learn about:
  • Bad data and error handling
    • Inspecting record-level processing errors
    • Configuring a bad data target
  • Controlling pipeline behavior when you encounter errors
    • File-level error behavior
    • Record-level error behavior
  • Understanding pipeline statuses in file loading

Step 1: Complete the Load JSON Data from Kafka Tutorial

See Data Pipeline Load of JSON Data from Kafka and complete the tutorial, then return to this guide for Step 2.

Step 2: Modify the Pipeline with a Bad Data Target

Bad data targets are the way to capture record-level errors for later troubleshooting and correction. Data pipelines support an bad data target that allows you to send any failing records to a designated Kafka topic. This topic supports easy replay of those records in a separate recovery pipeline or inspection of the failing records. For details about bad data targets, see the Bad Data Targets Reference.

Stop the Data Pipeline

Stop the orders_pipeline data pipeline using the STOP PIPELINE SQL statement.
SQL
STOP PIPELINE orders_pipeline;
Inspect the pipeline status using this SELECT SQL statement and the pipeline_status information schema view.
SQL
SELECT
  status
FROM information_schema.pipeline_status
WHERE
  pipeline_name = 'orders_pipeline';
While the pipeline is stopping, the status might appear as STOPPING while the pipeline wraps up work in process and commits offsets back to Kafka. After the status changes to STOPPED, you can proceed to the next step. A pipeline might not be in a running state when you modify it.

Modify the Data Pipeline

In this example, use the CREATE PIPELINE OR REPLACE SQL statement to update the data pipeline. This statement maintains the same offsets with the Kafka topic and ensures that data is deduplicated correctly when the pipeline resumes. Add the BAD_DATA_TARGET option and specify the Kafka target and details. Use the same Kafka broker as the source and bad data target in this example.
SQL
CREATE PIPELINE OR REPLACE orders_pipeline
BAD_DATA_TARGET
  KAFKA
    BOOTSTRAP_SERVERS '192.0.2.1:9092'
    TOPIC 'orders_errors'
SOURCE
  KAFKA
    BOOTSTRAP_SERVERS '192.0.2.1:9092'
    TOPIC 'orders'
EXTRACT
  FORMAT json
INTO public.orders
SELECT
  $id as id,
  $user_id as user_id,
  $product_id as product_id,
  $subtotal as subtotal,
  $tax as tax,
  $total as total,
  $discount as discount,
  $created_at as created_at,
  $quantity as quantity;

Step 3: Start the Data Pipeline with an Error Limit

Start the orders_pipeline data pipeline. In this case, set a maximum error limit so that the pipeline fails if 10 errors occur. An ERROR LIMIT might not be chosen for a continuously streaming pipeline if it is critical to keep up with newly arriving data, because the pipeline can stop suddenly and cause data freshness to fall behind. However, this example shows how error limits behave and how error states appear in pipelines.
SQL
START PIPELINE orders_pipeline
ERROR LIMIT 10;
The pipeline status changes to STARTED to indicate that it has begun processing.
For file-based loads, the START PIPELINE SQL statement includes options for FILE_ERRORS that are useful for controlling error handling in different file failure modes. Options include failing on error, skipping on file error, and tolerating file-level errors. See the Start Pipeline Command Reference for details.

Step 4: Produce Data with Errors into Kafka

With the data pipeline in place and running, data immediately begins to load from the Kafka topics that are configured in the pipeline from the last committed offsets after the pipeline stopped. For test purposes, kafkacat is a helpful utility that makes it easy to produce records in a topic. For example, if you have a file of sample data orders_with_errors.jsonl in a JSONL format (newline-delimited JSON records), you can run this command to send those records into your Kafka broker. <broker_ip_address> is the IP address of the Kafka broker and <topic_name> is the name of the Kafka topic.
Shell
kafkacat -b <broker_ip_address>:9092 -t <topic_name> -T -P -l orders_with_errors.jsonl
Save an example document to your file system to use for this test. For this example, you can download an example file from https://ocient-docs.s3.amazonaws.com/metabase_samples/jsonl/orders_with_errors.jsonl and save it to ~/orders_with_errors.jsonl. Assume the broker is running at 192.0.2.1, to send data into the orders topic defined in the pipeline definition, run this command.
Shell
kafkacat -b 192.0.2.1:9092 -t orders -T -P -l orders_with_errors.jsonl
This command pushes the entire JSONL file of messages into Kafka with one record per line. As these records are produced into Kafka, the running pipeline begins to load them into Ocient.

Step 5: Observe Loading Progress

With your pipeline running, data immediately begins to load from the Kafka topic. The pipeline creates parallel Kafka Consumers for each partition. If there are more partitions than processing cores available, the pipeline automatically handles spreading consumers across available processing cores and Loader Nodes. You can check the pipeline status and progress by querying the information_schema.pipeline_status view or running the command SHOW PIPELINE_STATUS. Use the view to query the name of the pipeline, status, number of failed records, and the status message.
SELECT
  pipeline_name, status, records_failed, status_message
FROM
  information_schema.pipeline_status;
Output
pipeline_name       status     records_failed    status_message
----------------------------------------------------------------------------------------------
orders_pipeline     FAILED     11                Started processing pipeline orders_pipeline

Inspect Errors

Examine the pipeline errors directly using the sys.pipeline_errors system catalog table.
SELECT count(*) FROM sys.pipeline_errors;
Output
count(*)
--------------------
11
Using this table, you can see detailed error messages, the source of the error, the column that encountered the error, and other helpful details to help troubleshoot why the record did not load.
SQL
SELECT * from sys.pipeline_errors;
Output
SQL
pipeline_id                          extractor_task_id                    error_index         error_type                                   error_code                                   source_name                                  error_message                                                                                                                                                                                 partition_id    record_number  record_offset   field_index column_name     created_at
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
f3ca5a1c-1bf4-40a4-8d01-b8c028fdfb58 1ecc5c4a-613f-48a5-8234-2257317a4a85 1                   TRANSFORMATION                               OLT01                                        orders-0                                     Failed to transform value 'not_int' using AsciiIntTransform. Details: Invalid ASCII digit: 'n'. Expected ASCII digit in [0, 9].                                                               0               22324401       0               1           user_id         2024-04-18 23:17:07.022000000
f3ca5a1c-1bf4-40a4-8d01-b8c028fdfb58 1ecc5c4a-613f-48a5-8234-2257317a4a85 2                   TRANSFORMATION                               OLT01                                        orders-0                                     Failed to transform value '2019-08-22T:30:42.392Z' using ByteBufferAsciiTimestampWithVariableLengthFractionTransform. Details: Invalid ASCII digit: ':'. Expected ASCII digit in [0, 9].      0               22324403       0               7           created_at      2024-04-18 23:17:07.391000000
f3ca5a1c-1bf4-40a4-8d01-b8c028fdfb58 1ecc5c4a-613f-48a5-8234-2257317a4a85 3                   TRANSFORMATION                               OLT01                                        orders-0                                     Failed to transform value 'not_decimal' using StringDoubleTransform. Details: Error parsing string 'not_decimal' as a double. Cause: For input string: "not_decimal"                          0               22324404       0               4           tax             2024-04-18 23:17:07.397000000
f3ca5a1c-1bf4-40a4-8d01-b8c028fdfb58 1ecc5c4a-613f-48a5-8234-2257317a4a85 4                   TRANSFORMATION                               OLT01                                        orders-0                                     Failed to transform value '55.11' using AsciiIntTransform. Details: Invalid ASCII digit: '.'. Expected ASCII digit in [0, 9].                                                                 0               22324406       0               2           product_id      2024-04-18 23:17:07.399000000
f3ca5a1c-1bf4-40a4-8d01-b8c028fdfb58 1ecc5c4a-613f-48a5-8234-2257317a4a85 5                   TRANSFORMATION                               OLT01                                        orders-0                                     Failed to transform value 'not_int' using AsciiIntTransform. Details: Invalid ASCII digit: 'n'. Expected ASCII digit in [0, 9].                                                               0               22324408       0               1           user_id         2024-04-18 23:17:07.400000000
f3ca5a1c-1bf4-40a4-8d01-b8c028fdfb58 1ecc5c4a-613f-48a5-8234-2257317a4a85 6                   TRANSFORMATION                               OLS00                                        orders-0                                     Failed to load transformed value 'null' to sink after ZCNullIfEmptyTransform. Details: Provided null for column with index = '1' and name = 'user_id', but this column disallows nulls.       0               22324410       0               1           user_id         2024-04-18 23:17:07.402000000
f3ca5a1c-1bf4-40a4-8d01-b8c028fdfb58 1ecc5c4a-613f-48a5-8234-2257317a4a85 7                   TRANSFORMATION                               OLS00                                        orders-0                                     Failed to load transformed value 'null' to sink after ZCNullIfEmptyTransform. Details: Provided null for column with index = '1' and name = 'user_id', but this column disallows nulls.       0               22324411       0               1           user_id         2024-04-18 23:17:07.404000000
f3ca5a1c-1bf4-40a4-8d01-b8c028fdfb58 1ecc5c4a-613f-48a5-8234-2257317a4a85 8                   TRANSFORMATION                               OLS00                                        orders-0                                     Failed to load transformed value 'null' to sink after ZCNullIfEmptyTransform. Details: Provided null for column with index = '2' and name = 'product_id', but this column disallows nulls.    0               22324412       0               2           product_id      2024-04-18 23:17:07.405000000
f3ca5a1c-1bf4-40a4-8d01-b8c028fdfb58 1ecc5c4a-613f-48a5-8234-2257317a4a85 9                   TRANSFORMATION                               OLS00                                        orders-0                                     Failed to load transformed value 'null' to sink after ZCNullIfEmptyTransform. Details: Provided null for column with index = '1' and name = 'user_id', but this column disallows nulls.       0               22324413       0               1           user_id         2024-04-18 23:17:07.409000000
f3ca5a1c-1bf4-40a4-8d01-b8c028fdfb58 1ecc5c4a-613f-48a5-8234-2257317a4a85 10                  TRANSFORMATION                               OLT03                                        orders-0                                     Failed to transform value '2018-06-26' using ByteBufferAsciiTimestampWithVariableLengthFractionTransform because the value is too short. Expected a timestamp composed of the ...             0               22324414       0               7           created_at      2024-04-18 23:17:07.410000000
f3ca5a1c-1bf4-40a4-8d01-b8c028fdfb58 1ecc5c4a-613f-48a5-8234-2257317a4a85 11                  TRANSFORMATION                               OLS00                                        orders-0                                     Failed to load transformed value 'null' to sink after ZCNullIfEmptyTransform. Details: Provided null for column with index = '2' and name = 'product_id', but this column disallows nulls.    0               22324415       0               2           product_id      2024-04-18 23:17:07.411000000
For a more abbreviated view, query a few columns in the table. This example queries the error type, error code, error message, and name of the column.
SQL
SELECT
  error_type, error_code, error_message, column_name
FROM sys.pipeline_errors;
Output
SQL
error_type          error_code      error_message                                                                                                                                                                                  column_name
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TRANSFORMATION      OLT01           Failed to transform value 'not_int' using AsciiIntTransform. Details: Invalid ASCII digit: 'n'. Expected ASCII digit in [0, 9].                                                                user_id
TRANSFORMATION      OLT01           Failed to transform value '2019-08-22T:30:42.392Z' using ByteBufferAsciiTimestampWithVariableLengthFractionTransform. Details: Invalid ASCII digit: ':'. Expected ASCII digit in [0, 9].       created_at
TRANSFORMATION      OLT01           Failed to transform value 'not_decimal' using StringDoubleTransform. Details: Error parsing string 'not_decimal' as a double. Cause: For input string: "not_decimal"                           tax
TRANSFORMATION      OLT01           Failed to transform value '55.11' using AsciiIntTransform. Details: Invalid ASCII digit: '.'. Expected ASCII digit in [0, 9].                                                                  product_id
TRANSFORMATION      OLT01           Failed to transform value 'not_int' using AsciiIntTransform. Details: Invalid ASCII digit: 'n'. Expected ASCII digit in [0, 9].                                                                user_id
TRANSFORMATION      OLS00           Failed to load transformed value 'null' to sink after ZCNullIfEmptyTransform. Details: Provided null for column with index = '1' and name = 'user_id', but this column disallows nulls.        user_id
TRANSFORMATION      OLS00           Failed to load transformed value 'null' to sink after ZCNullIfEmptyTransform. Details: Provided null for column with index = '1' and name = 'user_id', but this column disallows nulls.        user_id
TRANSFORMATION      OLS00           Failed to load transformed value 'null' to sink after ZCNullIfEmptyTransform. Details: Provided null for column with index = '2' and name = 'product_id', but this column disallows nulls.     product_id
TRANSFORMATION      OLS00           Failed to load transformed value 'null' to sink after ZCNullIfEmptyTransform. Details: Provided null for column with index = '1' and name = 'user_id', but this column disallows nulls.        user_id
TRANSFORMATION      OLT03           Failed to transform value '2018-06-26' using ByteBufferAsciiTimestampWithVariableLengthFractionTransform because the value is too short. Expected a timestamp composed of the ...              created_at
TRANSFORMATION      OLS00           Failed to load transformed value 'null' to sink after ZCNullIfEmptyTransform. Details: Provided null for column with index = '2' and name = 'product_id', but this column disallows nulls.     product_id
From this result set, you can quickly inspect error messages, such as the first error message.
Text
Failed to transform value 'not_int' using AsciiIntTransform. Details: Invalid ASCII digit: 'n'. Expected ASCII digit in [0, 9].
This message occurs for the user_id column. From this information, you can see that you have some bad data in the data set. In other errors, you can see the error code OLS00 and a message that indicates some issues with NULL handling. The table has a NOT NULL setting on a few columns, but the data is missing on the source records. Fix this by updating the pipeline to supply a default, or you can modify the table schema if you expect NULLs to occur and you do not have a reasonable default. For a continuous pipeline, you might expect to see an error like this if something changes in the data source.

Inspect Bad Data

To observe the original data, you can go to the original source file from this example, or you can load the specific message from the bad data topic that was configured to inspect it. Using the kafkacat utility, you can consume the messages on the topic. This command consumes one message on the orders_errors topic.
Shell
kafkacat -C -b 192.0.2.1:9092 -t orders_errors -o -1 -c 1
Output
Shell
{"id": 1011, "user_id": 3, "subtotal": 76.83, "tax": 5.28, "total": 82.11, "discount": null, "created_at": "2017-12-14T11:28:30.031Z", "quantity": 1}
A quick inspection of the source data reveals that this record is missing the product_id field, but this is a required column in our table. If you run this command again, but with the -J flag, you can inspect the headers that are associated with the pipeline. The headers in the bad data target capture important metadata from the data pipeline execution that you can use to troubleshoot the failing data, including the error_message.
Shell
kafkacat -C -b 192.0.2.1:9092 -t orders_errors -o -1 -c 1 -J
Output
SQL
{"topic":"orders_errors","partition":0,"offset":89,"tstype":"create","ts":1713566346480,"broker":1,"headers":["error_index","66","error_code","OLS00","error_message","Failed to load transformed value 'null' to sink after ZCNullIfEmptyTransform. Details: Provided null for column with index = '2' and name = 'product_id', but this column disallows nulls.","source_name","orders-0","record_number","22361935","record_offset","0","field_index","2","column_name","product_id"],"key":null,"payload":"{\"id\": 1011, \"user_id\": 3, \"subtotal\": 76.83, \"tax\": 5.28, \"total\": 82.11, \"discount\": null, \"created_at\": \"2017-12-14T11:28:30.031Z\", \"quantity\": 1}"}

Next Steps

Now that you have loaded data into tables, you can explore a few advanced topics, including handling pipeline errors, loading from other data sources, loading different data formats, and transforming data. Loading Delimited and CSV Data Loading JSON Data Data Types for Data Pipelines Transform Data in Data Pipelines Data Pipeline Reference Data Pipeline Loading Errors
Last modified on May 27, 2026