sudo privilege on the node being started or shut down. This procedure is not applicable to fully managed Ocient Systems.
Startup Procedure
The startup procedure can be used for starting the rolehostd process on a running node as well as powering on nodes in an Ocient System. In most installations, the rolehostd process is configured to start automatically on system boot.
The rolehostd process is the main executable for an Ocient System. During the startup of the rolehostd process, every non-Administrator Node reaches out to one of the Administrator Nodes. The Administrator Nodes require a “quorum” for the Ocient System to start up and operate. To establish a quorum, a majority of Administrator Nodes must be actively running the rolehostd process. For example, when an Ocient System has three Administrator Nodes, at least two must be active to establish quorum and service requests from other nodes.
Nodes in an Ocient System should be started in the following sequence for the smoothest operation:
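The majority rule can be checked with a little arithmetic. The following is a minimal sketch, not an Ocient tool: the function names quorum_size and has_quorum are hypothetical, and the only assumption is that "majority" means more than half of the Administrator Nodes.

```shell
# quorum_size TOTAL: print the smallest majority of TOTAL Administrator Nodes.
quorum_size() {
  local admins="$1"
  echo $(( admins / 2 + 1 ))
}

# has_quorum ACTIVE TOTAL: succeed when ACTIVE nodes form a majority of TOTAL.
has_quorum() {
  [ "$1" -ge "$(quorum_size "$2")" ]
}
```

For three Administrator Nodes, quorum_size 3 prints 2, which matches the example above.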
Queries are not completed until Foundation Nodes are online.
- Start the rolehostd process on all dedicated Administrator Nodes and SQL Nodes using the systemctl start rolehostd command.
- Start the rolehostd process on all Foundation Nodes using the systemctl start rolehostd command.
- Start the rolehostd process on all Loader Nodes using the systemctl start rolehostd command.
- After the rolehostd process has finished starting on all Loader Nodes, start the Loading and Transformation service using the systemctl start lat command.
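The startup sequence above could be scripted roughly as follows. This is a sketch under stated assumptions: the host names are hypothetical placeholders, the nodes are assumed to be reachable over ssh with sudo rights, and the DRY_RUN switch (on by default) is an illustration that only prints each command instead of running it. The sketch does not verify that rolehostd has finished starting before it moves to the next role.

```shell
# Hypothetical host lists; replace with the node names of your system.
ADMIN_AND_SQL_NODES="admin1 admin2 admin3 sql1"
FOUNDATION_NODES="foundation1 foundation2"
LOADER_NODES="loader1"

# Run a command on a host over ssh, or only print it when DRY_RUN=1 (default).
run_on() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "[$1] $2"
  else
    ssh "$1" "$2"
  fi
}

# start_role "<hosts>" <service>: start one service on every listed host.
start_role() {
  for host in $1; do
    run_on "$host" "sudo systemctl start $2"
  done
}

# Start roles in the documented order: Administrator and SQL Nodes,
# Foundation Nodes, Loader Nodes, then the LAT service on Loader Nodes.
start_system() {
  start_role "$ADMIN_AND_SQL_NODES" rolehostd
  start_role "$FOUNDATION_NODES" rolehostd
  start_role "$LOADER_NODES" rolehostd
  start_role "$LOADER_NODES" lat
}
```

Calling start_system with the default DRY_RUN=1 prints the planned commands for review. Setting DRY_RUN=0 runs them over ssh.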
Shutdown Procedure
The shutdown procedure can be used for stopping the rolehostd process as well as powering off nodes in the Ocient System.
Properly sequencing the shutdown of nodes in an Ocient System ensures a smooth power-down process and reduces the appearance of errors to users in loading and querying as the system goes offline. The rolehostd process must be running on nodes with the Administrator Role for Global Dictionary Compression (GDC) lookup to work during loading and querying of data. If these nodes are shut down first, this will cause GDC lookups to fail, resulting in failures for both loading and querying of data.
It is recommended to stagger the powering on of nodes by a few seconds to reduce the surge in power distribution circuits.
When you execute systemctl kill -s SIGKILL rolehostd, the rolehostd process receives the SIGKILL signal. The node stops immediately without waiting for any queries or loading that are active on the node to finish. Users might see errors as a result. To avoid these errors, use the quiescing process.
The Ocient System should be stopped in the following sequence:
- Stop the Loading and Transformation service on all Loader Nodes using the systemctl stop lat command.
- Stop the rolehostd process on all Loader Nodes using the systemctl kill -s SIGKILL rolehostd command.
- Stop the rolehostd process on all Foundation Nodes using the systemctl kill -s SIGKILL rolehostd command. Any queries issued after this point will fail.
- Stop the rolehostd process on all SQL Nodes using the systemctl kill -s SIGKILL rolehostd command.
- Stop the rolehostd process on all Administrator Nodes using the systemctl kill -s SIGKILL rolehostd command.
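As a quick reference, the shutdown order above can be kept as a small table that a helper prints as a numbered checklist. This is only a sketch: the function name shutdown_plan and the short role labels are hypothetical, and the commands are the ones listed in the steps.

```shell
# shutdown_plan: print "<step> <role>: <command>" lines in the documented order.
shutdown_plan() {
  local step=0
  while read -r role cmd; do
    step=$((step + 1))
    printf '%d %s: sudo %s\n' "$step" "$role" "$cmd"
  done <<'EOF'
loader systemctl stop lat
loader systemctl kill -s SIGKILL rolehostd
foundation systemctl kill -s SIGKILL rolehostd
sql systemctl kill -s SIGKILL rolehostd
administrator systemctl kill -s SIGKILL rolehostd
EOF
}
```

Each printed command is meant to be run on every node with the named role before moving to the next step.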
Quiescing Nodes
Quiescing nodes can be used for gracefully stopping the rolehostd process without interrupting loading or running queries. This process is most useful for maintenance purposes or for node upgrades. Quiescing refers to the graceful shutdown of a node, whereas quiesced refers to a node that has completed quiescing and is now fully shut down.
After a quiescing node shuts down, it stops interacting with the rest of the system. Only quiesce one node at a time. When you quiesce multiple nodes at the same time, the quiesce process can freeze or lose forward progress for queries and loading. At startup, the node automatically rejoins the system and participates normally.
To issue a quiesce, execute systemctl stop rolehostd. This command sends the SIGTERM signal to the rolehostd process. The node begins the quiesce process, waits for all relevant queries or loading to finish, and shuts down. If a long-running query causes the quiesce to freeze, kill the query to finish the quiesce process.
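Because only one node should be quiesced at a time, it can help to confirm that a node has fully stopped before moving on to the next one. The sketch below assumes that polling systemctl is-active is an acceptable way to observe this. The function names are hypothetical, and the 1800-second default simply mirrors the 30-minute quiesce limit.

```shell
# wait_until_stopped UNIT TIMEOUT_SECONDS: poll until UNIT has left the
# active/deactivating states. Returns 1 if the timeout is reached first.
wait_until_stopped() {
  local unit="$1" timeout="$2" waited=0 state
  while :; do
    # is-active exits non-zero for any state other than active, so read the text.
    state="$(systemctl is-active "$unit" 2>/dev/null || true)"
    case "$state" in
      active|deactivating|reloading) ;;  # still running or still quiescing
      *) return 0 ;;                     # inactive, failed, or unknown
    esac
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# quiesce_node [TIMEOUT_SECONDS]: request the stop without blocking, then wait.
quiesce_node() {
  sudo systemctl --no-block stop rolehostd &&
    wait_until_stopped rolehostd "${1:-1800}"
}
```

Run quiesce_node on the node being quiesced, and start on the next node only after it returns successfully.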
During the quiesce process, a SQL Node continues to accept new connections from clients, such as JDBC or pyocient. However, when these clients execute queries, EXPLAIN statements, or DDL updates, the Ocient System redirects the clients to another online SQL Node, if one exists. A user might connect or run a query with the force option, which overrides the redirect behavior and forces the command to run on the SQL Node in the quiesce process. In this case, the Ocient System does not guarantee the successful completion of the query.
A Foundation Node in the quiesce process waits for any segment rebuilds that are in progress to finish. If a rebuild is interrupted or fails, the quiesce process finishes shortly afterward.
When a Loader Node is in the middle of loading, and you initiate the quiesce process on this node, stop the Loading and Transformation (LAT) service that is on the same node.
Quiesce has a default time limit of 30 minutes, controlled by the systemd service. After a node quiesces for 30 minutes, the systemd service sends the SIGKILL signal to rolehostd and forcefully kills the node. To change this timeout, edit the TimeoutStopSec parameter in the rolehostd systemd service file.
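One way to change the timeout without editing the packaged service file is a systemd drop-in, which is a standard systemd mechanism rather than something this procedure prescribes. The sketch below assumes the unit is named rolehostd.service. The helper name write_timeout_dropin, the file name timeout.conf, and the 45min value are hypothetical.

```shell
# write_timeout_dropin DIR VALUE: write a drop-in file that overrides
# TimeoutStopSec for the unit whose drop-in directory is DIR.
write_timeout_dropin() {
  local dir="$1" value="$2"
  mkdir -p "$dir"
  printf '[Service]\nTimeoutStopSec=%s\n' "$value" > "$dir/timeout.conf"
}
```

Run as root, write_timeout_dropin /etc/systemd/system/rolehostd.service.d 45min followed by systemctl daemon-reload would raise the limit to 45 minutes for subsequent stops.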

