Skip to main content
You can monitor data pipelines through a combination of:
  1. System catalog tables
  2. Information schema views
  3. Operational metrics endpoints for use with operations monitoring tools
This tutorial describes how to use these system catalog objects to monitor your pipelines:
  • information_schema.pipelines
  • information_schema.pipeline_status
  • sys.pipelines
  • sys.pipeline_events
  • sys.pipeline_errors
  • sys.pipeline_files
  • sys.pipeline_partitions
  • sys.pipeline_metrics
  • sys.pipeline_metrics_info
A complete reference of system catalog tables and views is available in the System Catalog.

Pipelines Information Schema

After you create a pipeline, two convenient views are most useful for observing the pipeline. At a glance, you can view the configuration of your pipeline in information_schema.pipelines and the status information in information_schema.pipeline_status. Access these views conveniently with the SHOW command.
> SHOW PIPELINES;
+-----------------+-----------------+----------------+---------------+---------------+----------------------------+--------------+-------------------+
| database_name   | pipeline_name   | loading_mode   | source_type   | data_format   | created_at                 | created_by   | table_names       |
|-----------------+-----------------+----------------+---------------+---------------+----------------------------+--------------+-------------------|
| test            | my_pipeline     | BATCH          | FILESYSTEM    | CSV           | 2024-02-07 18:01:22.910745 | mac          | ['test.my_table'] |
+-----------------+-----------------+----------------+---------------+---------------+----------------------------+--------------+-------------------+
Fetched 1 rows
SQL
> SHOW PIPELINE_STATUS;
+------------------+---------------+----------------------------+-----------+---------------------------------------------+--------------------+--------------------+-------------------+----------------+-------------------+---------------+---------------------+-------------------+------------------+
| database_name    | pipeline_name | table_names                | status    | status_message                              |   percent_complete |   duration_seconds |   files_processed |   files_failed |   files_remaining |   files_total |   records_processed |   records_loaded  |   records_failed |
|------------------+---------------+----------------------------+-----------+---------------------------------------------+--------------------+--------------------+-------------------+----------------+-------------------+---------------+---------------------+-------------------+------------------|
| test_database    | test_pipeline | ['test_schema.test_table'] | COMPLETED | Completed processing pipeline test_pipeline |                  1 |            10.5066 |                 1 |              0 |                 0 |             1 |                   4 |                 4 |                0 |
+------------------+---------------+----------------------------+-----------+---------------------------------------------+--------------------+--------------------+-------------------+----------------+-------------------+---------------+---------------------+-------------------+------------------+
Fetched 1 rows
For a quick look at the progress of a pipeline, the SHOW PIPELINE_STATUS command provides:
  • Name
  • Target tables
  • Current status
  • Last status message that was received in processing
  • Percentage completion (from 0.0 - 1.0)
  • Duration of pipeline execution
  • Number of files and records processed
  • Error counts
To join these two views to other catalog tables, use the pipeline_name to join to the sys.pipelines table and then join tables using the pipeline_id.

Pipeline Catalog Tables

Beyond the information schema views, many catalog tables offer details about your pipelines.
TableDescriptionImportance
sys.pipelinesList of pipelines that exist and the configuration information.High
sys.pipeline_eventsEvent log generated through the lifecycle of a pipeline.High
sys.pipeline_errorsLog of errors created by pipelines during execution for troubleshooting record-level errors or system errors.High
sys.pipeline_filesList of files that each file-based pipeline is loading or has loaded.High
sys.pipeline_partitionsList of source partitions on each Kafka pipeline with relevant metadata for offsets.Medium
sys.pipeline_metricsMetrics that are regularly collected for each pipeline to capture performance and progress.Medium
sys.pipeline_metrics_infoDescriptive details about the available pipeline metrics. Provides information about how to interpret each metric type.Low

Monitor Activity and Events

While your pipeline is running, the pipeline generates events in the sys.pipeline_events system catalog table to mark significant checkpoints when something has occurred. You can use this table to observe progress and look for detailed messages about changes in the pipeline. For example, the sys.pipeline_events table captures events when a pipeline is created, started, stopped, fails, and completed. This table also captures relevant system events such as rebalancing Kafka partitions or retrying a process due to a transient failure. Pipeline events include messages from many different tasks. These tasks are the background processes that execute across different Loader Nodes during pipeline operation.

Monitor Files or Partitions

For file-based loads, the sys.pipeline_files system catalog table contains one row for each file, which contains the status of the file and other file metadata such as filename, creation or modified timestamps, and the file size. The pipeline updates this list of files when it is started and maintains status information as it processes files. You can use this list to understand which files are loading, which succeeded, and which failed. For Kafka partition-based loads, the sys.pipeline_partitions system catalog table contains one row for each partition, which contains the current offsets and record counts. You can use this information to understand how partitions are distributed across Loader Nodes and also to observe lag on each topic and partition.

Monitor Performance

Pipeline performance metrics are captured in the sys.pipeline_metrics system catalog table. This table contains samples of the metrics over time. The System collects samples regularly. You can query the samples using standard SQL queries to inspect the behavior of a single pipeline, a single task on a Loader Node, or across all pipelines. The sys.pipeline_metrics_info system catalog table provides metadata that explains the individual metric types. See Discover Insights from Data Pipeline System Catalog Tables for a detailed description of how to use sys.pipeline_metrics and sys.pipeline_metric_info to monitor the performance of your pipelines. For the system catalog table definitions, see System Catalog.

Metrics Endpoints

In the same way that the Ocient System monitors system performance using Statistics Monitoring endpoints with operational monitoring tools, data pipelines expose an API that allows operators to capture performance metrics over time.

Access Metrics Endpoints

You query the metrics using an HTTP endpoint on each Loader Node located at <loader_ip_address_or_hostname>:8090/metrics/v1. For example, you can retrieve metrics from a Loader Node by executing this command from a shell on the node or by running a similar request from a monitoring agent on the node.
Text
curl http://localhost:8090/metrics/v1
Each metric is represented as a JSON object that contains:
  • Metric name
  • Metric value
  • Scope information
  • Query time
  • Value units
  • Whether or not the metric is incremental
Metrics can be incremental or instantaneous. Incremental metrics accumulate value over time, so they never decrease in value. Instantaneous metrics capture the current value, so they can decrease.

Metrics Scope

Similar to the sys.pipeline_metrics system catalog table, the Ocient System scopes metrics to a Loader Node, a pipeline task, or a partition. You can determine the scope of a metric based on the presence of the pipeline_name, pipeline_name_external, and partition keys or reference the details in Pipeline Metrics Details.
ScopeDescription
LoaderMetrics scoped to a loader do not have information about pipelines or partitions. Because metrics are collected on each node, this scope contains all activity on the node where you access the endpoint.
PipelineMetrics scoped to a pipeline have a key named pipeline_name with a value that corresponds to the identifier of the extractor task with hyphens removed. The metrics also have a key named pipeline_name_external with a value that corresponds to the identifier of the pipeline.
PartitionMetrics scoped to a partition have the same keys as metrics scoped to a pipeline. The metrics also have a key named partition with a value that is the stream source identifier of the partition.
Examples This JSON structure contains a metric scoped to a Loader Node.
JSON
{
  "name": "usage.os.heap",
  "time": 1713993689613976,
  "timestamp": "2024-04-24T21:21:29.613976119Z",
  "value": 317855288,
  "units": "bytes",
  "incremental": false
}
This JSON structure contains a metric scoped to a pipeline.
JSON
{
  "name": "count.record.sent",
  "time": 1713994532578624,
  "timestamp": "2024-04-24T21:35:32.578624253Z",
  "value": 100000,
  "units": "unitless",
  "incremental": true,
  "pipeline_name_external": "test-rest-external-create",
  "pipeline_name": "test"
}
This JSON structure contains a metric scoped to a partition.
JSON
{
  "name": "duration.record.processed",
  "time": 1714509826724477,
  "timestamp": "2024-04-30T20:43:46.724477142Z",
  "value": 322432,
  "units": "milliseconds",
  "incremental": true,
  "pipeline_name_external": "test-rest-external-create",
  "pipeline_name": "58ff07cf732846f9830b728b3a8e4a7a",
  "partition": "226979",
  "sink_index": "0"
}

Pipeline Metrics Details

NameScopeDescriptionImportance
count.errorpipelineThe count of failures when processing records during pipeline execution for a pipeline.High
count.file.failpipelineThe number of files that have failures.High
count.file.processedpipelineThe number of files that have been extracted from the source for processing. The Ocient System tracks this number before tokenization, transformation, or sending to the sink. This number does not indicate the durability of the files or if files load with errors.Medium
count.file.totalpipelineThe total number of files in the static file list.High
count.partition.durable.offset.committedpartitionThe last committed offset of the Kafka partition for the consumer group.High
count.partition.offset.lagpartitioncount.partition.offset.max - count.partition.durable.offset.committed for a specified Kafka partition.Medium
count.partition.offset.maxpartitionThe maximum offset of the Kafka partition.High
count.partition.records.failedpartitionThe count of failures when processing records during pipeline execution for a specified Kafka partition.High
count.partition.records.processedpartitionThe number of records that have been extracted for processing on a specified Kafka partition.Medium
count.record.bytes.maximumpartitionThe largest record processed in bytes.Medium
count.record.bytes.minimumpartitionThe smallest record processed in bytes.Medium
count.record.bytes.transformedpipelineThe number of bytes that have been transformed, excluding delimiters.Medium
count.record.sentpipelineThe number of records that have been fully transformed and sent to the sink.High
count.sink.bytes.queuedepthpipelineThe number of transformed records queued to send to the sink.Medium
count.stream.pausedpartitionThe number of times the processing of the record stream has been paused due to reaching a high watermark.Medium
count.stream.unpausedpartitionThe number of times the processing of the record stream has been unpaused due to reaching a low watermark.Medium
duration.file.listedpipelineThe number of milliseconds spent listing files from the data source.High
duration.record.durablepartitionThe duration in milliseconds for making records durable.
ℹ️ The database takes time to make records durable, so this duration might be longer than the record processing duration.
Medium
duration.record.processedpartitionThe duration in milliseconds for reading the records from the source, transforming them, and buffering them for sending to the sink.Medium
uptimeLoader NoceThe number of seconds after the last restart of the extraction process.High
usage.os.heapLoader NoceThe number of bytes used in the heap.High
usage.os.percentage.cpuLoader NoceThe percentage of CPU used by the .High
System Catalog
Last modified on May 27, 2026