# Set Up System Monitoring with the TIG Stack and Kapacitor
One method to monitor the {{ocienthyperscaledatawarehouse}} is to use the TIG ({{telegraf}}, {{influxdb}}, and {{grafana}}) stack. While these instructions are written specifically for this toolset, you can also use this guide as the foundation for setting up an alternate stack if necessary. This guide uses InfluxDB, Telegraf, {{kapacitor}}, and Grafana for a complete monitoring solution.

*Monitoring example with Telegraf data collection into InfluxDB, with a Grafana dashboard and Kapacitor alerts*

This guide focuses specifically on host-level and {{ocient}} software metrics. Monitoring additional components (e.g., {{kafka}}) is outside the scope of this document.

These instructions were created using these versions of each component:

| Component | Version |
| --- | --- |
| InfluxDB | 1.8 (OSS) |
| Telegraf | 1.16.1 |
| Kapacitor | 1.5.6 (OSS) |
| Grafana | 7.4.3 |

## Prerequisites

This guide assumes that:

- SSH access and root-level privileges are available on each server running Ocient software.
- InfluxDB, Kapacitor, and Grafana are available to be installed in a virtual machine or container. It is important to note that each of these systems should run independently. For example, do not run InfluxDB, Kapacitor, and Grafana on the same virtual machine.
- All Ocient software is currently deployed and running as expected.
- Logging configuration is set up as specified in Log Monitoring (docid\:goypkivlud77sz0sx_ekq).

The following ports must be open for InfluxDB, Kapacitor, Telegraf, and Grafana. InfluxDB is the most critical, as each component communicates with it. Always refer to the latest documentation for each product as the definitive reference.

| Component | Default Ports | Components Requiring Access |
| --- | --- | --- |
| InfluxDB | 8086; 8088 (optional, used for backup utilities) | Telegraf, Kapacitor, Grafana |
| Telegraf | None | None |
| Kapacitor | 9092 | None |
| Grafana | 3000 | None |

## Step 1: Deploy InfluxDB

InfluxDB is a time-series database that provides persistence for the host-level and Ocient software metrics.

1. Refer to the current instructions from InfluxDB on how to install the service:
   - https://docs.influxdata.com/influxdb/v2.0/install/
   - https://docs.influxdata.com/influxdb/v2.0/get-started/
2. Document the IP address of the machine where InfluxDB is installed.
3. Ensure that InfluxDB is running properly:

```
/> sudo systemctl status influxdb
● influxdb.service - InfluxDB is an open-source, distributed, time series database
   Loaded: loaded (/lib/systemd/system/influxdb.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2021-09-14 17:17:20 UTC; 3 weeks 2 days ago
 Main PID: 32687 (influxd)
    Tasks: 18 (limit: 4915)
   CGroup: /system.slice/influxdb.service
           └─32687 /usr/bin/influxd -config /etc/influxdb/influxdb.conf
```

4. Set the InfluxDB service to start on boot:

```
/> sudo systemctl enable influxdb
```

5. It is highly recommended that critical components of the monitoring infrastructure, including InfluxDB, are themselves monitored. Monitoring the monitoring stack is outside the scope of this document; refer to available resources on potential solutions for monitoring InfluxDB.
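Before configuring collectors, it can help to confirm that the InfluxDB HTTP API is reachable from the nodes that will run Telegraf. The check below is a minimal sketch that assumes InfluxDB 1.8 listening on its default port; the example IP matches the placeholder value used later in this guide. The `telegraf` database itself is created automatically by the Telegraf InfluxDB output plugin on first write.

```
# InfluxDB 1.x exposes a lightweight health endpoint; an HTTP 204 response
# confirms the service is up and reachable on port 8086.
/> curl -i http://10.6.0.4:8086/ping
HTTP/1.1 204 No Content
```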
## Step 2: Deploy and Configure Telegraf

Telegraf is used for metrics collection. It runs on each of the Admin, Loader, Foundation, and SQL nodes. The following instructions highlight the common configuration elements, followed by the elements that are specific to different node types.

### All Nodes

1. Install Telegraf on each of the Admin, Loader, Foundation, and SQL nodes. Refer to the Telegraf installation instructions available from InfluxDB. At this point, do not start the Telegraf service or generate the default configuration.

2. For all nodes, create the Telegraf configuration file (typically located at `/etc/telegraf/telegraf.conf`). Open the file in an editor, then copy and paste the following contents into the file. Replace the designated placeholders (denoted with a `$`) with the specific values for the specified environment:

   - `$cluster_name` = customer-selected name of a provided cluster, for monitoring identification
   - `$role` = one of `admin`, `loader`, `lts`, `sql`, or `stream_proc`, depending on the node type
   - `$influx_url` = URL of the InfluxDB instance (e.g., `http://10.6.0.4:8086`)

```
; /etc/telegraf/telegraf.conf
[global_tags]
  cluster = "$cluster_name"
  cluster_role = "$role"  # one of admin, loader (aka indexer), lts (aka foundation), sql, stream_proc

[agent]
  interval = "15s"              # time
  metric_batch_size = 1000      # size
  metric_buffer_limit = 100000  # size
  collection_jitter = "3s"      # time
  flush_interval = "7s"         # time
  flush_jitter = "3s"           # time
  round_interval = false

[outputs]

  [[outputs.influxdb]]
    urls = [ "$influx_url" ]
    database = "telegraf"
    precision = "1s"
    timeout = "10s"
```

   Save the file.

3. For all nodes, create a new file under the `telegraf.d` directory (typically `/etc/telegraf/telegraf.d`) named `host.conf`. Copy and paste the following contents into the file. Replace the designated placeholders (denoted with a `$`) with the specific values for the given environment:

   - `$boot_disk` = device name of the boot disk (e.g., `sda`). You can determine the boot disk by running an `lsblk` command.
   - `$net_int_bond0_wildcard` = the wildcard for matching the underlying network interfaces within `bond0` (e.g., `eno*`). If it is unclear, refer to `/proc/net/bonding/bond0` and reference the interfaces noted in the Telegraf documentation.
   - `$net_int_bond1_wildcard` = the wildcard for matching the underlying network interfaces within `bond1` (e.g., `enp*`). If it is unclear, refer to `/proc/net/bonding/bond1` and reference the interfaces noted in the Telegraf documentation.

```
; /etc/telegraf/telegraf.d/host.conf
[[inputs.cpu]]
  percpu = true
  totalcpu = true
  fielddrop = [ "time_*" ]

[[inputs.disk]]
  ignore_fs = ["tmpfs", "devtmpfs", "vfat"]

[[inputs.diskio]]
  skip_serial_number = true
  devices = ["$boot_disk"]
  [inputs.diskio.tags]
    disk_type = "os"

[[inputs.mem]]

[[inputs.net]]
  interfaces = ["$net_int_bond0_wildcard", "$net_int_bond1_wildcard"]

[[inputs.net]]
  interfaces = ["bond0"]
  [inputs.net.tags]
    primary_interface_10gb = "true"
  [inputs.net.tagdrop]
    interface = [ "all" ]

[[inputs.net]]
  interfaces = ["bond1"]
  [inputs.net.tags]
    primary_interface_100gb = "true"
  [inputs.net.tagdrop]
    interface = [ "all" ]

[[inputs.system]]

[[inputs.swap]]

[[inputs.linux_sysctl_fs]]
```

   Save the file.

4. For all nodes, create a new file under the Telegraf scripts directory (typically under `/etc/telegraf/scripts`) named `uio_pci_generic.sh`. Copy and paste the following contents into the file:

```
#!/bin/bash
# /etc/telegraf/scripts/uio_pci_generic.sh
# A script used to enumerate the PCI devices on the system.
devices=$(find /sys/bus/pci/drivers/uio_pci_generic/0000* | awk -F'/' '{ print $NF }')
for d in $devices; do
    echo pci_devices,device=\"$d\" present=1
done
```

   Save the file.
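You can run the script by hand to confirm that it emits valid InfluxDB line protocol before Telegraf invokes it through `inputs.exec`. This is a quick sanity check; the PCI addresses shown below are hypothetical examples and will differ per system.

```
# Each device bound to the uio_pci_generic driver should produce one line of
# line protocol in the form: pci_devices,device="<address>" present=1
/> bash /etc/telegraf/scripts/uio_pci_generic.sh
pci_devices,device="0000:3b:00.0" present=1
pci_devices,device="0000:3c:00.0" present=1
```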
### Foundation and SQL Nodes

Create a new file under the `telegraf.d` directory (typically `/etc/telegraf/telegraf.d`) named `rolehostd.conf`. Copy and paste the following contents into the file. Replace the designated placeholder (denoted with a `$`) with the specific value for the specified environment:

- `$node_bond0_ip_address` = IP address associated with `bond0`

```
; /etc/telegraf/telegraf.d/rolehostd.conf
[[inputs.http]]
  urls = [ "http://$node_bond0_ip_address:9090/v1/status" ]
  method = "GET"
  timeout = "5s"
  name_override = "rolehostd_status"
  data_format = "value"
  data_type = "string"

[[inputs.http_response]]
  urls = [ "http://$node_bond0_ip_address:9090/v1/status" ]
  method = "GET"
  response_timeout = "5s"
  name_override = "rolehostd_status_response"
  response_string_match = "active"

[[inputs.http]]
  urls = [ "http://$node_bond0_ip_address:9090/v1/stats" ]
  method = "GET"
  timeout = "5s"
  name_override = "rolehostd"
  json_time_key = "time"
  json_time_format = "unix_us"
  json_string_fields = ["units"]
  data_format = "json"
  tagexclude = ["url", "node"]
  tag_keys = [ "name", "socket", "device" ]

[[inputs.http]]
  urls = [ "http://$node_bond0_ip_address:9090/v1/version" ]
  method = "GET"
  timeout = "5s"
  name_override = "rolehostd_version"
  json_query = "{build_type,version}"
  json_string_fields = ["build_type","version"]
  data_format = "json"

[[inputs.http]]
  urls = [ "http://$node_bond0_ip_address:9090/v1/operatorsummary" ]
  method = "GET"
  timeout = "5s"
  name_override = "rolehostd_operatorsummary"
  data_format = "json"

[[inputs.exec]]
  commands = ['sh /etc/telegraf/scripts/uio_pci_generic.sh']
  data_format = "influx"

[[inputs.tail]]
  name_override = "tail_query_json"
  files = ["/var/opt/ocient/log/query.json","/var/opt/ocient/query.json"]
  data_format = "json"
  json_query = "{src,msg.user,msg.database,msg.service_class,msg.client_version,msg.client_ip,msg.timestamp_start,msg.timestamp_execstart,msg.timestamp_optimizationcomplete,msg.timestamp_complete,msg.code,msg.priority,msg.runtime,msg.parallelism,msg.cost_estimate,msg.heuristic_cost,msg.rows_returned,msg.bytes_returned,msg.queue_time,msg.optimization_time,msg.default_schema,msg.major_driver_version,msg.minor_driver_version,msg.total_time,msg.resultset_cached,msg.first_byte_time,msg.bytes_per_second_sent}"
  tag_keys = ["src","user","database","service_class","code","major_driver_version","resultset_cached"]
  json_string_fields = ["client_version","client_ip","timestamp_start","timestamp_execstart","timestamp_complete"]

[[processors.converter]]
  namepass = ["tail_query_json"]
  [processors.converter.fields]
    integer = ["timestamp_start","timestamp_execstart","timestamp_complete"]
```

You can also use Filebeat to forward the `query.json` file to a monitoring platform of your choice.
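Before relying on these inputs, you can spot-check the rolehostd HTTP endpoints directly from the node. This sketch uses only the endpoints referenced in the configuration above; substitute the node's `bond0` IP address, and expect the status endpoint to return a value containing `active` (the string that `inputs.http_response` matches on) when the node is healthy.

```
# Status endpoint: a healthy node should report a status containing "active".
/> curl -s http://$node_bond0_ip_address:9090/v1/status

# Version and stats endpoints should return JSON documents.
/> curl -s http://$node_bond0_ip_address:9090/v1/version
/> curl -s http://$node_bond0_ip_address:9090/v1/stats
```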
### Admin and Loader Nodes

Create a new file under the `telegraf.d` directory (typically `/etc/telegraf/telegraf.d`) named `rolehostd.conf`. Copy and paste the following contents into the file. Replace the designated placeholder (denoted with a `$`) with the specific value for the given environment:

- `$node_bond0_ip_address` = IP address associated with `bond0`

```
; /etc/telegraf/telegraf.d/rolehostd.conf
[[inputs.http]]
  urls = [ "http://$node_bond0_ip_address:9090/v1/status" ]
  method = "GET"
  timeout = "5s"
  name_override = "rolehostd_status"
  data_format = "value"
  data_type = "string"

[[inputs.http_response]]
  urls = [ "http://$node_bond0_ip_address:9090/v1/status" ]
  method = "GET"
  response_timeout = "5s"
  name_override = "rolehostd_status_response"
  response_string_match = "active"

[[inputs.http]]
  urls = [ "http://$node_bond0_ip_address:9090/v1/stats" ]
  method = "GET"
  timeout = "5s"
  name_override = "rolehostd"
  json_time_key = "time"
  json_time_format = "unix_us"
  json_string_fields = ["units"]
  data_format = "json"
  tagexclude = ["url", "node"]
  tag_keys = [ "name", "socket", "device" ]

[[inputs.http]]
  urls = [ "http://$node_bond0_ip_address:9090/v1/version" ]
  method = "GET"
  timeout = "5s"
  name_override = "rolehostd_version"
  json_query = "{build_type,version}"
  json_string_fields = ["build_type","version"]
  data_format = "json"

[[inputs.exec]]
  commands = ['sh /etc/telegraf/scripts/uio_pci_generic.sh']
  data_format = "influx"

[[inputs.tail]]
  name_override = "tail_query_json"
  files = ["/var/opt/ocient/log/query.json","/var/opt/ocient/query.json"]
  data_format = "json"
  json_query = "{src,msg.user,msg.database,msg.service_class,msg.client_version,msg.client_ip,msg.timestamp_start,msg.timestamp_execstart,msg.timestamp_optimizationcomplete,msg.timestamp_complete,msg.code,msg.priority,msg.runtime,msg.parallelism,msg.cost_estimate,msg.heuristic_cost,msg.rows_returned,msg.bytes_returned,msg.queue_time,msg.optimization_time,msg.default_schema,msg.major_driver_version,msg.minor_driver_version,msg.total_time,msg.resultset_cached,msg.first_byte_time,msg.bytes_per_second_sent}"
  tag_keys = ["src","user","database","service_class","code","major_driver_version","resultset_cached"]
  json_string_fields = ["client_version","client_ip","timestamp_start","timestamp_execstart","timestamp_complete"]

[[processors.converter]]
  namepass = ["tail_query_json"]
  [processors.converter.fields]
    integer = ["timestamp_start","timestamp_execstart","timestamp_complete"]
```

### Loader Nodes

Loader nodes also run the Loading and Transformation Service and require additional Telegraf configuration to capture those metrics. Create a new file under the `telegraf.d` directory (typically `/etc/telegraf/telegraf.d`) named `lat.conf`. Copy and paste the following contents into the file:

```
; /etc/telegraf/telegraf.d/lat.conf
[[inputs.jolokia2_agent]]
  urls = ["http://localhost:8080/v2/metrics"]

  [[inputs.jolokia2_agent.metric]]
    name = "pipeline"
    mbean = "lat:type=pipeline"

  [[inputs.jolokia2_agent.metric]]
    name = "partitions"
    mbean = "lat:type=partitions"

[[inputs.procstat]]
  systemd_unit = "lat.service"
```

### Start Telegraf

When the configuration is properly entered on all nodes, Telegraf must be started.

1. On each node, start the Telegraf service:

```
/> sudo systemctl start telegraf
```

2. Ensure the service came up properly by checking the status:

```
/> sudo systemctl status telegraf
● telegraf.service - The plugin-driven server agent for reporting metrics into InfluxDB
   Loaded: loaded (/usr/lib/systemd/system/telegraf.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2021-08-05 22:04:06 UTC; 2 months 3 days ago
     Docs: https://github.com/influxdata/telegraf
 Main PID: 2651775 (telegraf)
    Tasks: 103 (limit: 1589476)
   Memory: 221.2M
   CGroup: /system.slice/telegraf.service
           └─2651775 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
```

3. Set the Telegraf service to start on boot:

```
/> sudo systemctl enable telegraf
```
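Once Telegraf is running, you can verify end to end that metrics are arriving in InfluxDB. The queries below are a minimal sketch that assumes the `influx` CLI is available on the InfluxDB host and that the default `autogen` retention policy is in use; measurement names correspond to the inputs configured above.

```
# List the measurements Telegraf has written to the telegraf database.
/> influx -execute 'SHOW MEASUREMENTS ON telegraf'

# Confirm recent host metrics (e.g., CPU samples from the last five minutes).
/> influx -execute 'SELECT count(*) FROM "telegraf"."autogen"."cpu" WHERE time > now() - 5m'
```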
## Step 3: Deploy and Configure Kapacitor

Kapacitor is used to process time-series data, detect anomalies, and trigger alerts. As the preferred alerting mechanism varies for every customer, refer to the Kapacitor documentation on how to send alerts through specific tools: https://docs.influxdata.com/kapacitor/v1.6/working/alerts/

1. Install Kapacitor using the instructions from InfluxDB: https://docs.influxdata.com/kapacitor/v1.6/introduction/installation/

2. Create the Kapacitor configuration file (typically located at `/etc/kapacitor/kapacitor.conf`). Open the file in an editor, then copy and paste the following contents into the file. Replace the designated placeholders (denoted with a `$`) with the specific values for the given environment:

   - `$kapacitor_host_or_ip` = hostname or IP address associated with the Kapacitor instance
   - `$influx_url` = URL of the InfluxDB instance (e.g., `http://10.6.0.4:8086`)

```
; /etc/kapacitor/kapacitor.conf
hostname = "$kapacitor_host_or_ip"
data_dir = "/var/lib/kapacitor"
skip-config-overrides = false

[logging]
  # Destination for logs. Can be a path to a file or 'STDOUT', 'STDERR'.
  file = "/var/log/kapacitor/kapacitor.log"
  level = "ERROR"

[load]
  enabled = true
  dir = "/etc/kapacitor/load"

[replay]
  # Where to store replay files, aka recordings.
  dir = "/var/lib/kapacitor/replay"

[storage]
  boltdb = "/var/lib/kapacitor/kapacitor.db"

[[influxdb]]
  enabled = true
  default = true
  name = "influx"
  urls = ["$influx_url"]
  username = "admin"
  password = ""
  timeout = "10s"
  startup-timeout = "60s"
  # Subscription mode is either "cluster" or "server".
  subscription-mode = "server"
  # Which protocol to use for subscriptions: one of 'udp', 'http', or 'https'.
  subscription-protocol = "http"
  # Subscriptions resync time interval. Useful if you want to subscribe to newly
  # created databases without restarting kapacitord.
  subscriptions-sync-interval = "1m0s"

; Insert alert destination configs here.

[stats]
  enabled = true
  stats-interval = "10s"
  database = "_kapacitor"
  retention-policy = "autogen"
```

### Add Kapacitor Alerts and Templates

Ocient provides sample Kapacitor alerts and templates upon request; contact Ocient Support for details. Within the bundle, the relevant contents are located within the `kapacitor` directory.

- Copy the files located under `kapacitor/load/tasks` to the Kapacitor system in `/etc/kapacitor/load/tasks`.
- Copy the files located under `kapacitor/load/templates` to the Kapacitor system in `/etc/kapacitor/load/templates`.

### Start Kapacitor

When the configuration is complete, Kapacitor must be started.

1. On each node, start the Kapacitor service:

```
/> sudo systemctl start kapacitor
```

2. Ensure the service came up properly by checking the status:

```
/> sudo systemctl status kapacitor
● kapacitor.service - Time series data processing engine.
   Loaded: loaded (/lib/systemd/system/kapacitor.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2021-09-03 18:30:37 UTC; 1 months 9 days ago
     Docs: https://github.com/influxdb/kapacitor
 Main PID: 16177 (kapacitord)
    Tasks: 16 (limit: 4915)
   CGroup: /system.slice/kapacitor.service
           └─16177 /usr/bin/kapacitord -config /etc/kapacitor/kapacitor.conf

Sep 03 18:30:37 kapacitor systemd[1]: Started Time series data processing engine.
Sep 03 18:30:37 kapacitor kapacitord[16177]: [Kapacitor ASCII art startup banner]
Sep 03 18:30:37 kapacitor kapacitord[16177]: 2021/09/03 18:30:37 Using configuration at: /etc/kapacitor/kapacitor.conf
```

3. Set the Kapacitor service to start on boot:

```
/> sudo systemctl enable kapacitor
```
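After Kapacitor is running, you can confirm that the tasks and templates placed under `/etc/kapacitor/load` were registered. This sketch assumes the `kapacitor` CLI is run on the Kapacitor host; the actual task names depend on the bundle provided by Ocient Support.

```
# List registered tasks with their type, status, and execution state.
/> kapacitor list tasks

# List templates loaded from /etc/kapacitor/load/templates.
/> kapacitor list templates

# Inspect a single task in detail (replace <task_id> with a name from the list).
/> kapacitor show <task_id>
```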
## Step 4: Deploy and Configure Grafana

Grafana provides the dashboard and visualization capability for metrics. This tool is used instead of {{chronograf}}, which is traditionally used to complete the TICK (Telegraf/InfluxDB/Chronograf/Kapacitor) stack. Grafana is favored over Chronograf due to its versatility beyond the InfluxDB ecosystem, as well as the higher frequency of contributions and updates from the broader community.

### Install Grafana

Follow the Grafana installation instructions, and ensure that configuration options for items such as users, permissions, and logging all correspond to the environment: https://grafana.com/docs/grafana/latest/installation/

### Add InfluxDB as a Data Source

1. Open Grafana in a web browser.
2. In the Grafana UI, click the settings tab on the left side of the screen. Select **Configuration > Data Sources**.

   *Configuration of data sources*

3. Click **Add data source**.
4. Under the data source list, select **InfluxDB**, then click the **Select** button.

   *Add data source*

5. Enter all of the pertinent configuration details associated with the InfluxDB instance, and ensure the data source is marked as the default. Under the **HTTP** section, the URL should be the same as the `$influx_url` used in previous sections. The query language should be set to the default (InfluxQL).

   *Configuration settings*

6. Under the **InfluxDB Details** section, specify `telegraf` as the database name.

   *InfluxDB details for database access*

7. When the configuration appears to be correct, click **Save & Test**.
8. A green banner should appear, indicating that the connection to the data source is working properly. If this message does not display, consult the Grafana documentation or contact Ocient Support.

   *Connection status*

### Import Dashboards

Ocient provides sample Grafana dashboards upon request; contact Ocient Support for details. Within the bundle, the relevant contents are located within the `grafana` directory. Grafana only allows you to import one file at a time.

1. Open Grafana in a web browser.
2. In the Grafana UI, click the **+** ("Create") tab on the left side of the screen. Select the **Import** option.
3. On the Import page, click the **Upload JSON file** button.
4. Select the file representing the dashboard you wish to import.
5. On the dashboard summary page, verify that the settings are correct.
6. Click **Import** to complete the process.

Repeat steps 3-6 for each dashboard that needs to be added.

### Related Links

- Log Monitoring (docid\:goypkivlud77sz0sx_ekq)
- Statistics Monitoring (docid\:ihe4f_5epw1cfbzgcudga)

Filebeat is a trademark of Elasticsearch BV.