Set Up System Monitoring with the TIG Stack and Kapacitor

One method to monitor the Ocient System is to use the TIG (Telegraf, InfluxDB, and Grafana) Stack. While these instructions are written specifically for this toolset, you can also use this guide as the foundation for setting up an alternate stack if necessary.

This guide uses InfluxDB, Telegraf, Kapacitor, and Grafana for a complete monitoring solution.

Monitoring example with Telegraf data collection into InfluxDB with Grafana dashboard and Kapacitor alerts


This guide focuses specifically on host-level and Ocient software metrics. Monitoring additional components is outside the scope of this document.

These instructions were created using these versions of each component.

| Component | Version |
| --- | --- |
| InfluxDB | 1.8 (OSS) |
| Telegraf | 1.16.1 |
| Kapacitor | 1.5.6 (OSS) |
| Grafana | 7.4.3 |

Prerequisites

This guide assumes that:

  1. SSH access and root-level privileges are available on each server running Ocient software.
  2. InfluxDB, Kapacitor, and Grafana can each be installed in a virtual machine or container. Run each of these systems independently; for example, do not run InfluxDB, Kapacitor, and Grafana on the same virtual machine.
  3. All Ocient software is currently deployed and running as expected.
  4. Logging configuration is set up as specified in Log Configuration.
  5. The following ports are open for InfluxDB, Kapacitor, Telegraf, and Grafana. InfluxDB is the most critical because every component communicates with it. Always refer to the latest documentation for each product as the definitive reference.



| Component | Default Ports | Components Requiring Access |
| --- | --- | --- |
| InfluxDB | 8086; 8088 (optional, used for backup utilities) | Telegraf, Kapacitor, Grafana |
| Telegraf | None | None |
| Kapacitor | 9092 | None |
| Grafana | 3000 | None |
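
To confirm the port prerequisite before continuing, a quick reachability check can help. The following is a minimal sketch, not part of the Ocient tooling: the function name `port_open` is hypothetical, and it assumes `bash` and the coreutils `timeout` command are available on the machine running the check.

```shell
#!/usr/bin/env bash
# Returns 0 if a TCP connection to HOST:PORT succeeds within 3 seconds.
# Relies on bash's built-in /dev/tcp redirection, so no extra client is needed.
port_open() {
    local host="$1" port="$2"
    timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Example (the *_HOST variables are placeholders for your environment):
#   port_open "$INFLUX_HOST" 8086    && echo "InfluxDB port reachable"
#   port_open "$KAPACITOR_HOST" 9092 && echo "Kapacitor port reachable"
#   port_open "$GRAFANA_HOST" 3000   && echo "Grafana port reachable"
```

Run the InfluxDB check from each Telegraf node as well as from the Kapacitor and Grafana systems, because all of them need to reach port 8086.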

Step 1: Deploy InfluxDB

InfluxDB is a time series database that provides persistence for the host-level and Ocient software metrics.

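1. Refer to the current instructions from InfluxDB on how to install the service:

  • https://docs.influxdata.com/influxdb/v2.0/install/
  • https://docs.influxdata.com/influxdb/v2.0/get-started/

These links point to the v2.0 documentation. If you install the 1.8 release listed above, use the installation instructions for that version.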

2. Document the IP address of the machine where InfluxDB is installed.

3. Ensure that InfluxDB is running properly.

Shell


4. Set the InfluxDB service to start on boot.

Shell
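
As a minimal sketch of steps 3 and 4, the following assumes a package installation on a systemd host where the service unit is named `influxdb` and the HTTP API listens on port 8086. Both are assumptions about your environment rather than values confirmed by this guide. The helper name `influx_is_up` is hypothetical; it uses the InfluxDB `/ping` endpoint, which answers with HTTP 204 when the server is up.

```shell
#!/usr/bin/env bash
# Assumptions: systemd host, unit named "influxdb", HTTP API on port 8086.

# Returns 0 when the /ping endpoint answers with HTTP 204.
influx_is_up() {
    local url="${1%/}" code
    code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "${url}/ping")"
    [ "$code" = "204" ]
}

# Example, run on the InfluxDB host:
#   sudo systemctl status influxdb --no-pager      # step 3: service state
#   influx_is_up http://localhost:8086 && echo "InfluxDB answers /ping"
#   sudo systemctl enable influxdb                 # step 4: start on boot
```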


5. Monitor the critical components of the monitoring infrastructure, including InfluxDB, as well. Monitoring the monitoring stack is outside the scope of this document; refer to available resources for potential solutions to monitor InfluxDB.

Step 2: Deploy and Configure Telegraf

Telegraf collects the metrics. It runs on each of the Admin, Loader, Foundation (LTS), and SQL Nodes. The following instructions cover the common configuration elements first, followed by the elements that are specific to each node type.

All Nodes

1. Install Telegraf on each of the Admin, Loader, Foundation (LTS), and SQL Nodes. Refer to the Telegraf installation instructions available from InfluxDB. At this point, do not start the Telegraf service or generate the default configuration.

2. For all nodes, create the Telegraf configuration file (typically located at /etc/telegraf/telegraf.conf). Open the file in an editor.

  • Copy and paste the following contents into the file.
  • Replace the designated placeholders (denoted with a $) with the specific values for the given environment:
    • CLUSTER_NAME = Customer selected name of a given cluster for monitoring identification
    • ROLE = One of admin, loader, lts, sql, stream_proc depending on the node type
    • INFLUX_URL = URL of the InfluxDB instance (e.g., http://10.6.0.4:8086)
Text


Save the file.
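
Replacing the placeholders by hand on every node is error-prone, so the substitution can be scripted. The following is a minimal sketch under stated assumptions: the function name `render_telegraf_conf` and the template file name in the usage note are hypothetical, and the template is whatever configuration text you were given with the `$CLUSTER_NAME`, `$ROLE`, and `$INFLUX_URL` placeholders still in place.

```shell
#!/usr/bin/env bash
# Fill in $CLUSTER_NAME, $ROLE and $INFLUX_URL in a template read from stdin.
# Values must not contain "|", "&" or "\", which sed treats specially here.
render_telegraf_conf() {
    local cluster="$1" role="$2" url="$3"
    # Reject roles other than the ones listed in this guide.
    case "$role" in
        admin|loader|lts|sql|stream_proc) ;;
        *) echo "unknown ROLE: $role" >&2; return 1 ;;
    esac
    sed -e 's|[$]CLUSTER_NAME|'"$cluster"'|g' \
        -e 's|[$]ROLE|'"$role"'|g' \
        -e 's|[$]INFLUX_URL|'"$url"'|g'
}

# Example (telegraf.conf.template is a hypothetical file name):
#   render_telegraf_conf mycluster sql http://10.6.0.4:8086 \
#       < telegraf.conf.template | sudo tee /etc/telegraf/telegraf.conf
```

Validating the role up front catches typos before they end up as tags on every metric.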

3. For all nodes, create a new file under the telegraf.d directory (typically /etc/telegraf/telegraf.d) named host.conf.

  • Copy and paste the following contents into the file.
  • Replace the designated placeholders (denoted with a $) with the specific values for the given environment:
    • BOOT_DISK = Device name of the boot disk (e.g., sda). You can determine the boot disk by running an lsblk command.
    • NET_INT_BOND0_WILDCARD = The wildcard for matching the underlying network interfaces within bond0 (e.g., eno*). If it is unclear, refer to /proc/net/bonding/bond0 and reference the interfaces noted in the Telegraf documentation.
    • NET_INT_BOND1_WILDCARD = The wildcard for matching the underlying network interfaces within bond1 (e.g., enp*). If it is unclear, refer to /proc/net/bonding/bond1 and reference the interfaces noted in the Telegraf documentation.
Text


Save the file.
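
To pick the bond wildcards, it helps to list the member interfaces that the kernel reports for each bond. The following is a minimal sketch; the function name `bond_members` is hypothetical, and it assumes the standard Linux bonding status file format, in which each member appears on a line beginning with `Slave Interface:`.

```shell
#!/usr/bin/env bash
# Print the member interfaces of a bond, one per line, from the kernel's
# bonding status file (default: /proc/net/bonding/bond0).
bond_members() {
    local status_file="${1:-/proc/net/bonding/bond0}"
    awk -F': ' '/^Slave Interface:/ { print $2 }' "$status_file"
}

# Example:
#   bond_members                               # members of bond0
#   bond_members /proc/net/bonding/bond1       # members of bond1
#   lsblk -o NAME,TYPE,MOUNTPOINT              # locate the disk holding / or /boot
```

If the members of bond0 all share a prefix such as eno, then eno* is a reasonable value for NET_INT_BOND0_WILDCARD.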

4. For all nodes, create a new file under the telegraf scripts directory (typically under /etc/telegraf/scripts) named uio_pci_generic.sh.

  • Copy and paste the following contents into the file.
Shell


Foundation and SQL Nodes

Create a new file under the telegraf.d directory (typically /etc/telegraf/telegraf.d) named rolehostd.conf.

  • Copy and paste the following contents into the file.
  • Replace the designated placeholders (denoted with a $) with the specific values for the given environment:
    • NODE_BOND0_IP_ADDRESS = IP address associated with bond0
Text


You can also use Filebeat to forward the query.json file to a monitoring platform of your choice.

Admin and Loader Nodes

Create a new file under the telegraf.d directory (typically /etc/telegraf/telegraf.d) named rolehostd.conf.

  • Copy and paste the following contents into the file.
  • Replace the designated placeholders (denoted with a $) with the specific values for the given environment:
    • NODE_BOND0_IP_ADDRESS = IP address associated with bond0
Text


Loader Nodes

Loader Nodes also run the Loading and Transformation service and require additional Telegraf configuration to capture those metrics.

  • Create a new file under the telegraf.d directory (typically /etc/telegraf/telegraf.d) named lat.conf.
  • Copy and paste the following contents into the file.
Text


Start Telegraf

After the configuration is in place on all nodes, start Telegraf.

1. On each node, start the Telegraf service.

Shell


2. Ensure the service came up properly by checking the status.

Shell


3. Set the Telegraf service to start on boot.

Shell
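
The three steps above can be sketched as one helper. This is a minimal sketch, assuming a systemd host, a service unit named `telegraf`, and the default configuration paths used in this guide; the function names `run` and `start_telegraf` are hypothetical. It first calls `telegraf --test`, which gathers metrics once and prints them without sending them to any output, as a sanity check of the configuration.

```shell
#!/usr/bin/env bash
# Assumptions: systemd host, unit named "telegraf", default config paths.
# Set DRY_RUN=1 to print each command instead of executing it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

start_telegraf() {
    # Sanity check: collect metrics once and print them, without writing to InfluxDB.
    run telegraf --config /etc/telegraf/telegraf.conf \
                 --config-directory /etc/telegraf/telegraf.d --test || return 1
    run sudo systemctl start telegraf || return 1               # step 1
    run sudo systemctl status telegraf --no-pager || return 1   # step 2
    run sudo systemctl enable telegraf                          # step 3
}

# Example:
#   DRY_RUN=1 start_telegraf    # review the commands first
#   start_telegraf              # then run them on the node
```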


Step 3: Deploy and Configure Kapacitor

Kapacitor processes time series data, detects anomalies, and triggers alerts. Because preferred alerting mechanisms vary by customer, refer to the Kapacitor documentation for how to send alerts through specific tools.

1. Install Kapacitor using the instructions from InfluxDB. https://docs.influxdata.com/kapacitor/v1.6/introduction/installation/

2. Create the Kapacitor configuration file (typically located at /etc/kapacitor/kapacitor.conf). Open the file in an editor.

  • Copy and paste the following contents into the file.
  • Replace the designated placeholders (denoted with a $) with the specific values for the given environment:
    • KAPACITOR_HOST_OR_IP = Hostname or IP address associated with the Kapacitor instance
    • INFLUX_URL = URL of the InfluxDB instance (e.g., http://10.6.0.4:8086)
Text


Add Kapacitor Alerts and Templates

Ocient provides sample Kapacitor alerts and templates upon request. Contact Ocient Support for details.

Within the bundle, relevant contents are located within the kapacitor directory.

  1. Copy the files located under kapacitor/load/tasks to the Kapacitor system in /etc/kapacitor/load/tasks.
  2. Copy the files located under kapacitor/load/templates to the Kapacitor system in /etc/kapacitor/load/templates.
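
The two copy steps above can be sketched as a small helper. This is a minimal sketch, assuming the bundle has been unpacked on the Kapacitor system; the function name `install_kapacitor_load` and the bundle path in the usage note are hypothetical.

```shell
#!/usr/bin/env bash
# Copy the tasks and templates from an unpacked bundle into Kapacitor's
# load directory (default: /etc/kapacitor/load).
install_kapacitor_load() {
    local bundle="$1" dest="${2:-/etc/kapacitor/load}" sub
    for sub in tasks templates; do
        mkdir -p "$dest/$sub" || return 1
        # The trailing "/." copies the directory contents, not the directory itself.
        cp -R "$bundle/kapacitor/load/$sub/." "$dest/$sub/" || return 1
    done
}

# Example (the bundle path is a placeholder; writing to /etc/kapacitor
# typically requires root privileges):
#   install_kapacitor_load /path/to/bundle
```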

Start Kapacitor

After the configuration is complete, start Kapacitor.

1. On the Kapacitor system, start the Kapacitor service.

Shell


2. Ensure the service came up properly by checking the status.

Shell


3. Set the Kapacitor service to start on boot.

Shell
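
As a minimal sketch of the three steps above, the following assumes a systemd host where the service unit is named `kapacitor`. The helper name `wait_until_active` is hypothetical; it polls `systemctl is-active` so that the status check in step 2 runs only after the service has had time to come up.

```shell
#!/usr/bin/env bash
# Poll "systemctl is-active" until UNIT is active or TIMEOUT seconds pass.
# Returns 0 if the unit became active, 1 otherwise.
wait_until_active() {
    local unit="$1" timeout="${2:-30}" waited=0
    while [ "$waited" -lt "$timeout" ]; do
        if systemctl is-active --quiet "$unit" 2>/dev/null; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}

# Example, run on the Kapacitor system:
#   sudo systemctl start kapacitor                      # step 1
#   wait_until_active kapacitor 30 \
#       && sudo systemctl status kapacitor --no-pager   # step 2
#   sudo systemctl enable kapacitor                     # step 3
#   kapacitor list tasks                                # confirm the copied tasks were loaded
```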


Step 4: Deploy and Configure Grafana

Grafana provides the dashboard and visualization capability for metrics. This guide uses Grafana instead of Chronograf, which traditionally completes the TICK (Telegraf/InfluxDB/Chronograf/Kapacitor) stack. Grafana is favored over Chronograf for its versatility beyond the InfluxDB ecosystem and for the higher frequency of contributions and updates from the broader community.

Install Grafana. Follow the Grafana instructions to ensure that configuration options for items such as users, permissions, and logging all correspond to the environment. https://grafana.com/docs/grafana/latest/installation/

Add InfluxDB as a Data Source

1. Open Grafana in a web browser.

2. In the Grafana UI, click the Settings tab on the left side of the screen. Select Configuration->Data Sources.

Configuration of data sources


3. Click Add data source.

4. Under the data source list, select InfluxDB. Click the Select button.

Add data source


5. Enter all of the pertinent configuration details associated with the InfluxDB instance:

  • Ensure the data source is marked as the Default.
  • Under the HTTP section, the URL should be the same as the INFLUX_URL used in previous sections.
  • The query language should be set to the default (InfluxQL).
Configuration settings


6. Under the InfluxDB Details section, specify telegraf as the database name.

InfluxDB details for database access


7. When the configuration appears to be correct, click Save and Test.

8. A green banner should appear that indicates the connection to the data source is working properly. If this message does not display, consult the Grafana documentation or contact Ocient Support.

Connection status
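
The same data source can also be created without the UI through the Grafana HTTP API (`POST /api/datasources`). The following is a minimal sketch, not a replacement for the steps above: the function name `grafana_influx_datasource`, the data source name `InfluxDB`, and the `GRAFANA_URL` and `GRAFANA_PASSWORD` variables are assumptions, and the URL is inserted into the JSON without escaping, so it must not contain double quotes or backslashes.

```shell
#!/usr/bin/env bash
# Build the JSON payload for an InfluxDB data source that mirrors the UI
# settings: default data source, InfluxQL, database "telegraf".
grafana_influx_datasource() {
    printf '{"name":"InfluxDB","type":"influxdb","access":"proxy","url":"%s","database":"telegraf","isDefault":true}\n' "$1"
}

# Example (GRAFANA_URL, GRAFANA_PASSWORD and INFLUX_URL are placeholders):
#   curl -s -u "admin:$GRAFANA_PASSWORD" -H 'Content-Type: application/json' \
#        -X POST -d "$(grafana_influx_datasource "$INFLUX_URL")" \
#        "$GRAFANA_URL/api/datasources"
```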


Import Dashboards

Ocient provides sample Grafana dashboards upon request. Contact Ocient Support for details.

Within the bundle, relevant contents are located within the grafana directory.

Grafana imports only one dashboard file at a time.

  1. Open Grafana in a web browser.
  2. In the Grafana UI, click the + ("Create") tab on the left side of the screen. Select the Import option.
  3. On the Import page, click the Upload JSON File button.
  4. Select the file representing the dashboard you wish to import.
  5. On the dashboard summary page, verify the settings are correct. Click Import to complete the process.
  6. Repeat steps 2-5 for each dashboard that needs to be added.

Filebeat is a trademark of Elasticsearch BV.