System Administration
Maintenance Overview
Expand and Rebalance System
If your system reaches capacity, you can add storage by installing additional nodes and drives. This process requires rebalancing the data so that data and query processing are distributed across all available hardware, including newly installed disks or nodes. This tutorial covers expanding a system with additional storage.

For information on how to replace nodes that have failed or are damaged, see Replace Nodes (docid b1rmucwdl8wliexfkeket). For information on replacing drives that have failed or are damaged, see Replace an NVMe Node Drive (docid 4xycwe vwmclustuoakpc).

Add Foundation Nodes

A core component of an Ocient system, Foundation Nodes store user data in Ocient and perform the bulk of query processing. By adding more Foundation Nodes, a system can support extra storage capacity as well as better performance for query processing.

System Considerations

- No individual cluster should have more than two petabytes of storage.
- Each cluster should have the same number of Foundation Nodes.

Prerequisites

- The process requires either the system administrator role or the granted UPDATE privilege on the system.
- Any new Foundation Nodes should have the same OS kernel version as the other nodes in the system.
- Any new Foundation Nodes should have the same Ocient system version as the other nodes. For details, see Ocient Application Installation (docid faulzmg iytejhucowpbh).
- Ensure that your system meets the requirements outlined in Ocient System Bootstrapping (docid ybu9emip2a7ts0jhb8r9b).
- Stop any query or LAT process running on your system.

Tutorial

1. Create the /var/opt/ocient/bootstrap.conf file on the new node using this YAML example. This YAML file should follow the same format and parameters as the bootstrap.conf files on your other nodes. Replace <first_node_address> with the IP address or hostname of a node running the Administrator role on your system. If you are not using DNS, use nodeaddress instead of adminhost. For adminusername and adminpassword, use a user account with system administrator privileges on your system.

   adminhost: <first_node_address>
   adminusername: my_admin
   adminpassword: example_password

2. Start the new node by using this command. This process takes about a minute to complete.

   sudo systemctl start rolehostd

   This step accepts the node into the system, but the node has not yet been assigned a role or a storage cluster.

3. To validate that the node is on the system, execute this query using the sys.nodes system catalog table.

   SELECT name, status FROM sys.nodes ORDER BY 1;

   Output:

   | name           | status   |
   |----------------|----------|
   | admin01        | accepted |
   | admin02        | accepted |
   | admin03        | accepted |
   | loader01       | accepted |
   | loader02       | accepted |
   | loader03       | accepted |
   | foundation01   | accepted |
   | foundation02   | accepted |
   | foundation03   | accepted |
   | foundation04   | accepted |
   | foundation05   | accepted |
   | foundation06   | accepted |
   | foundation07   | accepted |
   | foundation08   | accepted |
   | foundation09   | accepted |
   | foundation10   | accepted |
   | foundation11   | accepted |
   | foundation12   | accepted |
   | foundation_new | accepted |
   | sql01          | accepted |
   | sql02          | accepted |

   The output shows the new node, named foundation_new in this example, listed as accepted in the status column.

4. Execute this ALTER CLUSTER SQL statement to add the new node to a storage cluster. In this example, replace "storage_cluster_1" and "foundation_new" with your cluster name and node name, respectively.

   ALTER CLUSTER "storage_cluster_1" ADD PARTICIPANTS "foundation_new";

   This example adds only one new node. For information on adding multiple nodes, see Cluster and Node Management (docid csequa9yqcqaaaexspyue) and the sketch below.
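   If you are expanding with several new Foundation Nodes at once, you may be able to add them in a single statement. The following is only a sketch: it assumes that ADD PARTICIPANTS accepts a comma-separated list of node names, and foundation_new2 and foundation_new3 are hypothetical node names used for illustration. Confirm the supported syntax in Cluster and Node Management (docid csequa9yqcqaaaexspyue) before running it.

   -- Hypothetical sketch: add two new nodes to the same storage cluster in one statement.
   -- Assumes ADD PARTICIPANTS accepts a comma-separated list; the node names are placeholders.
   ALTER CLUSTER "storage_cluster_1" ADD PARTICIPANTS "foundation_new2", "foundation_new3";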
5. Restart the rolehostd process on all nodes in the system by running the following series of commands at the shell terminal.

   First, stop the rolehostd process on all nodes.

   sudo systemctl kill -s SIGKILL rolehostd

   Confirm that the rolehostd process is no longer running.

   sudo systemctl status rolehostd

   Start the rolehostd process on the system again.

   sudo systemctl start rolehostd

6. Verify that the new node is active by executing this SQL query after you connect to the database.

   SELECT n.name, ns.operational_status
   FROM sys.node_status ns
   JOIN sys.nodes n ON ns.node_id = n.id
   ORDER BY n.name;

   Output:

   | name           | operational_status |
   |----------------|--------------------|
   | admin01        | active             |
   | admin02        | active             |
   | admin03        | active             |
   | loader01       | active             |
   | loader02       | active             |
   | loader03       | active             |
   | foundation01   | active             |
   | foundation02   | active             |
   | foundation03   | active             |
   | foundation04   | active             |
   | foundation05   | active             |
   | foundation06   | active             |
   | foundation07   | active             |
   | foundation08   | active             |
   | foundation09   | active             |
   | foundation10   | active             |
   | foundation11   | active             |
   | foundation12   | active             |
   | foundation_new | active             |
   | sql01          | active             |
   | sql02          | active             |

7. Perform a rebalance task to distribute your data evenly across your newly expanded system. For details, see the Rebalance System section (Expand and Rebalance System docid o tbcfso5drrzbeobr p8).

Add More Drives to Foundation Nodes

Foundation Nodes can support extra storage through additional NVMe drives. This tutorial shows the steps to integrate new drives into an existing Ocient system.

System Considerations

- For best performance, all Foundation Nodes should have equal storage capacity.
- No individual cluster should have more than two petabytes of storage.

Prerequisites

- The process requires systemctl access on your system OS.
- Any new NVMe drives added to the system must be blank and unpartitioned.

Tutorial

1. Shut down the rolehostd process from the shell prompt.

   sudo systemctl stop rolehostd

2. Install the new storage drives in your system.

3. Restart the node from the shell prompt.

   sudo systemctl restart rolehostd

   The rolehostd process recognizes the new drive.

4. To confirm that the new drives are active and running, connect to your system and query the sys.storage_device_status system catalog table. To use this example query, replace <node_name> with the name of the node where you are adding a drive.

   SELECT n.name AS node_name, s.node_id, s.id AS serial_number, s.pci_address, s.device_status, s.device_model
   FROM sys.nodes n
   JOIN sys.storage_device_status s ON n.id = s.node_id
   WHERE n.name = '<node_name>';

   Output:

   | node_name   | node_id                              | serial_number                        | pci_address                                              | device_status | device_model                             |
   |-------------|--------------------------------------|--------------------------------------|----------------------------------------------------------|---------------|------------------------------------------|
   | foundation0 | 9330a0b3-b3b7-4503-949c-043b196c0cc4 | 6156962f-fcf6-4299-bbd7-2618fc6f1d00 | /var/opt/ocient/6156962f-fcf6-4299-bbd7-2618fc6f1d00.dat | active        | PCIe Data Center SSD INTEL SSDPE2ME800G4 |
   | foundation0 | 9330a0b3-b3b7-4503-949c-043b196c0cc4 | e6a4b24f-0d49-4704-b7ce-18af163c0701 | /var/opt/ocient/e6a4b24f-0d49-4704-b7ce-18af163c0701.dat | active        | PCIe Data Center SSD INTEL SSDPE2ME800G4 |

   This output lists the serial numbers of all drives in the foundation0 node, including their statuses.

5. Perform a rebalance task to distribute your data evenly across your newly expanded system. For details, see the Rebalance System section (Expand and Rebalance System docid o tbcfso5drrzbeobr p8).

Rebalance System

Rebalance task execution redistributes your data evenly across your segment groups and clusters. Rebalance your system if you have recently installed new hardware, particularly new nodes or drives.

Prerequisites

- Only one rebalance task can execute at a time on the system. The system logs an error if you try to start a second rebalance task (see the check sketched after this list).
- You must have the system administrator role, or be granted the UPDATE privilege on the system.
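Because only one rebalance task can execute at a time, you can check for an in-progress rebalance before creating a new one. The query below is a minimal sketch that reuses the sys.subtasks catalog table shown in the tutorial that follows; it assumes that an in-progress rebalance appears there with a status other than complete, so treat the exact status values as an assumption.

-- Sketch: list rebalance subtasks that have not yet reached the 'complete' status.
-- Assumes in-progress rebalance work is reported in sys.subtasks with a non-complete status value.
SELECT id, task_type, status, start_time
FROM sys.subtasks
WHERE task_type = 'rebalance' AND status <> 'complete'
ORDER BY start_time DESC;

If this query returns no rows, no rebalance task should be running, and you can proceed with the tutorial.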
Tutorial

1. Execute a SELECT SQL statement to view how the system has distributed the existing data, using the sys.tables and sys.nodes system catalog tables.

   SELECT t.name AS table_name, n.name AS node_name, su.used_bytes
   FROM sys.storage_used su
   JOIN sys.tables t ON t.id = su.table_id
   JOIN sys.nodes n ON n.id = su.node_id
   ORDER BY t.name, n.name;

   Output:

   | table_name | node_name | used_bytes |
   |------------|-----------|------------|
   | table0     | lts0      | 48000000   |
   | table0     | lts1      | 11000000   |
   | table0     | lts2      | 79000000   |
   | table0     | lts3      | 73000000   |
   | table0     | lts4      | 85000000   |
   | table0     | lts5      | 50000000   |
   | table0     | lts6      | 51000000   |
   | table0     | lts7      | 41000000   |
   | table1     | lts0      | 66000000   |
   | table1     | lts1      | 77000000   |
   | table1     | lts2      | 73000000   |
   | table1     | lts3      | 43000000   |
   | table1     | lts4      | 76000000   |
   | table1     | lts5      | 51000000   |
   | table1     | lts6      | 71000000   |
   | table1     | lts7      | 105000000  |

   The query output shows how data is distributed across your nodes.

2. Execute the rebalance task named rebalance_task to reorganize all data into a balanced state using the CREATE TASK SQL statement.

   CREATE TASK rebalance_task TYPE REBALANCE;

   For details about the syntax for creating tasks, see Distributed Tasks (docid 9rqvzquratdsyzj8d3fez).

3. View the status of the tasks as the rebalance operation executes using the sys.subtasks system catalog table.

   SELECT id, name, start_time, task_type, execution_type, status, details
   FROM sys.subtasks
   WHERE task_type = 'rebalance'
   ORDER BY start_time DESC;

   Output:

   | id                                   | name | start_time                 | task_type | execution_type          | status   | details                                         |
   |--------------------------------------|------|----------------------------|-----------|-------------------------|----------|-------------------------------------------------|
   | ed68a070-d7f7-4c50-9b94-84b04b4503c2 |      | 2024-04-17 21:44:20.268000 | rebalance | intra-cluster rebalance | complete |                                                 |
   | 9e6c3374-6e9c-4a84-9bf8-ba1db3d866eb |      | 2024-04-17 21:44:20.268000 | rebalance | intra-cluster rebalance | complete |                                                 |
   | aebb75d8-e066-4813-b7e8-c272e5a3b8e9 |      | 2024-04-17 21:44:14.768000 | rebalance | inter-cluster rebalance | complete |                                                 |
   | 5cec5095-8b3a-4868-a10d-35fe6a32e998 |      | 2024-04-17 21:44:14.564000 | rebalance | null                    | complete | task finalizing due to terminal status complete |

   The output shows when the rebalance task is finished. When the rebalance task runs, segments are in the rebuilding state. After the Ocient system completes this task, all segments should transition to the intact state.

4. Execute the query from step 1 again to see how the system reorganized the data across nodes.

   SELECT t.name, n.name, su.used_bytes
   FROM sys.storage_used su
   JOIN sys.tables t ON t.id = su.table_id
   JOIN sys.nodes n ON n.id = su.node_id
   ORDER BY t.name, n.name;

   Output:

   | name   | name_1 | used_bytes |
   |--------|--------|------------|
   | table0 | lts0   | 48000000   |
   | table0 | lts1   | 39000000   |
   | table0 | lts2   | 71000000   |
   | table0 | lts3   | 53000000   |
   | table0 | lts4   | 85000000   |
   | table0 | lts5   | 50000000   |
   | table0 | lts6   | 51000000   |
   | table0 | lts7   | 41000000   |
   | table1 | lts0   | 66000000   |
   | table1 | lts1   | 77000000   |
   | table1 | lts2   | 73000000   |
   | table1 | lts3   | 43000000   |
   | table1 | lts4   | 76000000   |
   | table1 | lts5   | 68000000   |
   | table1 | lts6   | 71000000   |
   | table1 | lts7   | 88000000   |

   The output shows the redistributed data. The rebalance task redistributes data across segment groups, so there can be some variance in data across nodes.

Related Links

- Replace Nodes (docid b1rmucwdl8wliexfkeket)
- Replace an NVMe Node Drive (docid 4xycwe vwmclustuoakpc)