Monitor Data Pipelines
You can monitor data pipelines through a combination of:

- System catalog tables
- Information schema views
- Operational metrics endpoints for use with operations monitoring tools

This tutorial describes how to use these system catalog objects to monitor your pipelines:

- information_schema.pipelines
- information_schema.pipeline_status
- sys.pipelines
- sys.pipeline_events
- sys.pipeline_errors
- sys.pipeline_files
- sys.pipeline_partitions
- sys.pipeline_metrics
- sys.pipeline_metrics_info

A complete reference of system catalog tables and views is available in the System Catalog.

Pipelines Information Schema

After you create a pipeline, two views are most useful for observing the pipeline at a glance. You can view the configuration of your pipeline in information_schema.pipelines and the status information in information_schema.pipeline_status. Access these views conveniently with the SHOW command.

```sql
> SHOW PIPELINES;
+---------------+---------------+--------------+-------------+-------------+----------------------------+------------+-------------------+
| database_name | pipeline_name | loading_mode | source_type | data_format | created_at                 | created_by | table_names       |
+---------------+---------------+--------------+-------------+-------------+----------------------------+------------+-------------------+
| test          | my_pipeline   | batch        | filesystem  | csv         | 2024-02-07 18:01:22.910745 | mac        | ['test.my_table'] |
+---------------+---------------+--------------+-------------+-------------+----------------------------+------------+-------------------+
Fetched 1 rows

> SHOW PIPELINE_STATUS;
+---------------+---------------+----------------------------+-----------+---------------------------------------------+------------------+------------------+-----------------+--------------+-----------------+-------------+-------------------+----------------+----------------+
| database_name | pipeline_name | table_names                | status    | status_message                              | percent_complete | duration_seconds | files_processed | files_failed | files_remaining | files_total | records_processed | records_loaded | records_failed |
+---------------+---------------+----------------------------+-----------+---------------------------------------------+------------------+------------------+-----------------+--------------+-----------------+-------------+-------------------+----------------+----------------+
| test_database | test_pipeline | ['test_schema.test_table'] | completed | completed processing pipeline test_pipeline | 1                | 10.5066          | 1               | 0            | 0               | 1           | 4                 | 4              | 0              |
+---------------+---------------+----------------------------+-----------+---------------------------------------------+------------------+------------------+-----------------+--------------+-----------------+-------------+-------------------+----------------+----------------+
Fetched 1 rows
```

For a quick look at the progress of a pipeline, the SHOW PIPELINE_STATUS command provides:

- Name
- Target tables
- Current status
- Last status message that was received in processing
- Percentage completion (from 0.0 to 1.0)
- Duration of pipeline execution
- Number of files and records processed
- Error counts

To join these two views to other catalog tables, use the pipeline name to join to the sys.pipelines table, and then join the other tables using the pipeline ID.

Pipeline Catalog Tables

Beyond the information schema views, many catalog tables offer details about your pipelines.

Monitor Activity and Events

While your pipeline is running, the pipeline generates events in the sys.pipeline_events system catalog table to mark significant checkpoints. You can use this table to observe progress and look for detailed messages about changes in the pipeline. For example, the sys.pipeline_events table captures an event when a pipeline is created, started, stopped, failed, or completed. This table also captures relevant system events such as rebalancing Kafka partitions or retrying a process due to a transient failure.

Pipeline events include messages from many different tasks. These tasks are the background processes that execute across different loader nodes during pipeline operation.

Monitor Files or Partitions

For file-based loads, the sys.pipeline_files system catalog table contains one row for each file, which contains the status of the file and other file metadata such as filename, creation or
modified timestamps, and the file size. The pipeline updates this list of files when it is started and maintains status information as it processes files. You can use this list to understand which files are loading, which succeeded, and which failed.

For Kafka partition-based loads, the sys.pipeline_partitions system catalog table contains one row for each partition, which contains the current offsets and record counts. You can use this information to understand how partitions are distributed across loader nodes and also to observe lag on each topic and partition.

Monitor Performance

Pipeline performance metrics are captured in the sys.pipeline_metrics system catalog table. This table contains samples of the metrics over time. The Ocient system collects samples regularly. You can query the samples using standard SQL queries to inspect the behavior of a single pipeline, a single task on a loader node, or across all pipelines. The sys.pipeline_metrics_info system catalog table provides metadata that explains the individual metric types.

See Discover Insights from System Catalog Tables for a detailed description of how to use sys.pipeline_metrics and sys.pipeline_metrics_info to monitor the performance of your pipelines. For the system catalog table definitions, see System Catalog.

Metrics Endpoints

In the same way that the Ocient system monitors system performance using Statistics Monitoring endpoints with operational monitoring tools, data pipelines expose an API that allows operators to capture performance metrics over time.

Access Metrics Endpoints

You query the metrics using an HTTP endpoint on each loader node located at <loader ip address or hostname>:8090/metrics/v1. For example, you can retrieve metrics from a loader node by executing this command from a shell on the node or by running a similar request from a monitoring agent on the node.

```shell
curl http://localhost:8090/metrics/v1
```

Each metric is represented as a JSON
object that contains:

- Metric name
- Metric value
- Scope information
- Query time
- Value units
- Whether or not the metric is incremental

Metrics can be incremental or instantaneous. Incremental metrics accumulate value over time, so they never decrease in value. Instantaneous metrics capture the current value, so they can decrease.

Metrics Scope

Similar to the sys.pipeline_metrics system catalog table, the Ocient system scopes metrics to a loader node, a pipeline task, or a partition. You can determine the scope of a metric based on the presence of the pipeline_name, pipeline_name_external, and partition keys, or reference the details in Pipeline Metrics Details.

Examples

This JSON structure contains a metric scoped to a loader node.

```json
{
  "name": "usage os heap",
  "time": 1713993689613976,
  "timestamp": "2024-04-24T21:21:29.613976119Z",
  "value": 317855288,
  "units": "bytes",
  "incremental": false
}
```

This JSON structure contains a metric scoped to a pipeline.

```json
{
  "name": "count record sent",
  "time": 1713994532578624,
  "timestamp": "2024-04-24T21:35:32.578624253Z",
  "value": 100000,
  "units": "unitless",
  "incremental": true,
  "pipeline_name_external": "test rest external create",
  "pipeline_name": "test"
}
```

This JSON structure contains a metric scoped to a partition.

```json
{
  "name": "duration record processed",
  "time": 1714509826724477,
  "timestamp": "2024-04-30T20:43:46.724477142Z",
  "value": 322432,
  "units": "milliseconds",
  "incremental": true,
  "pipeline_name_external": "test rest external create",
  "pipeline_name": "58ff07cf732846f9830b728b3a8e4a7a",
  "partition": "226979",
  "sink_index": "0"
}
```

Pipeline Metrics Details
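The scope rule described under Metrics Scope (a metric's scope follows from which keys are present) can be sketched in a few lines of Python. This is a minimal illustration, not part of the product: the function names `fetch_metrics` and `metric_scope` are hypothetical, the underscore spelling of the `pipeline_name` and `pipeline_name_external` keys is an assumption, and `fetch_metrics` further assumes that the endpoint returns a JSON array of metric objects.

```python
import json
from urllib.request import urlopen

# Assumed key spellings; adjust them if the payload on your system differs.
PIPELINE_KEYS = ("pipeline_name", "pipeline_name_external")


def fetch_metrics(url="http://localhost:8090/metrics/v1"):
    """Fetch metrics from a loader node.

    Assumption: the response body is a JSON array of metric objects.
    """
    with urlopen(url, timeout=5) as response:
        return json.load(response)


def metric_scope(metric):
    """Classify one metric object by the scope keys it carries."""
    if "partition" in metric:
        return "partition"
    if any(key in metric for key in PIPELINE_KEYS):
        return "pipeline"
    return "loader node"
```

For example, `metric_scope({"name": "usage os heap", "value": 317855288})` returns `"loader node"` because the object carries neither pipeline keys nor a partition key. A monitoring agent could apply the same check to each object returned by `fetch_metrics()` before forwarding it.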
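Because incremental metrics only accumulate, a monitoring tool usually turns two samples of the same metric into a rate. The sketch below shows one way to do that. The helper name `rate_per_second` is hypothetical, and the code assumes that the `time` field is a Unix timestamp in microseconds, which is inferred from the sample values on this page rather than stated by the product.

```python
def rate_per_second(earlier, later):
    """Average per-second growth of an incremental metric between two samples.

    Assumption: the "time" field is a Unix timestamp in microseconds.
    """
    if not (earlier["incremental"] and later["incremental"]):
        raise ValueError("a rate is only meaningful for incremental metrics")
    elapsed_us = later["time"] - earlier["time"]
    if elapsed_us <= 0:
        raise ValueError("samples must be in increasing time order")
    return (later["value"] - earlier["value"]) / (elapsed_us / 1_000_000)


# Two made-up samples of the same counter, taken 10 seconds apart.
first = {"time": 1713994532578624, "value": 100000, "incremental": True}
second = {"time": 1713994542578624, "value": 150000, "incremental": True}
print(rate_per_second(first, second))  # 5000.0
```

Instantaneous metrics do not need this step because each sample is already the current value.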
Related Links

System Catalog