System Administration
Maintenance Overview
Expand and Rebalance System
If your system needs more resources, you can expand it by installing additional nodes and drives. The steps for expanding a system depend on the type of node you are adding. These sections explain the steps for adding each {{ocient}} node type. Foundation Nodes include additional steps for adding more storage drives and rebalancing the system.

For information on how to replace nodes that have failed or are damaged, see docid\ xf o62ftkx r0jiotjozs. For information on replacing drives that have failed or are damaged, see docid\ movujbt1f 8kwrrdd qg9.

Prerequisites

These prerequisites apply to adding any node type.

- The process requires either the System Administrator role or the UPDATE privilege granted on SYSTEM.
- Any new nodes should have the same OS kernel version as the other nodes in the system.
- Any new nodes should have the same Ocient system version as the other nodes. For details, see docid\ l4to0wifbytuosh5nscob.
- Ensure that your system meets the requirements outlined in docid 4005nflvguw4fqfqa1spu.
- Before adding nodes, stop any query or loading process running on your system.

Add Foundation Nodes

A core component of an {{ocient}} system, the Foundation Nodes store user data in Ocient and perform the bulk of query processing. By adding more Foundation Nodes, a system can support extra storage capacity as well as better performance for query processing. This process requires rebalancing the data to distribute the query processing across all available hardware, including newly installed disks or nodes.

System Considerations

- No individual cluster should have more than two petabytes of storage.
- Each cluster should have the same number of Foundation Nodes.

Tutorial

1. Create the /var/opt/ocient/bootstrap.conf file on the new node using this YAML example. This YAML file should follow a similar format and parameters as the bootstrap.conf files on your other nodes. Replace <first node address> with the IP address or hostname of a node running the Administrator role on your system. If you are not using DNS, use nodeAddress instead of adminHost. Use a user account with System Administrator privileges for your system for the adminUsername and adminPassword.

   adminHost: <first node address>
   adminUsername: my_admin
   adminPassword: example_password

2. Start the new node by using this command. This process takes about a minute to complete. This step accepts the node into the system, but the node has not yet been assigned a role or a storage cluster.

   sudo systemctl start rolehostd

3. To validate that the node is on the system, execute this query using the sys.nodes system catalog table.

   SELECT name, status FROM sys.nodes ORDER BY 1;

   Output:

   name            status
   --------------  --------
   admin01         accepted
   admin02         accepted
   admin03         accepted
   loader01        accepted
   loader02        accepted
   loader03        accepted
   foundation01    accepted
   foundation02    accepted
   foundation03    accepted
   foundation04    accepted
   foundation05    accepted
   foundation06    accepted
   foundation07    accepted
   foundation08    accepted
   foundation09    accepted
   foundation10    accepted
   foundation11    accepted
   foundation12    accepted
   foundation_new  accepted
   sql01           accepted
   sql02           accepted

   The output shows the new node, named foundation_new in this example, listed as accepted in the status column.

4. Execute this ALTER CLUSTER SQL statement to add the new node to a storage cluster. In this example, replace storage_cluster_1 and foundation_new with your cluster name and node name, respectively.

   ALTER CLUSTER "storage_cluster_1" ADD PARTICIPANTS "foundation_new";

   This example adds only one new node. For information on adding multiple nodes, see docid\ xga0pas8wadtq33 a x7v.
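   If you are adding several Foundation Nodes at once, the linked reference covers the exact multi-node syntax. As a minimal sketch, and assuming that the ADD PARTICIPANTS clause accepts a comma-separated list of node names (the node names below are hypothetical), the statement might look like this:

   -- Hedged sketch: add several accepted nodes to the same storage cluster in one statement.
   -- The node names foundation_new1 through foundation_new3 are hypothetical, and the
   -- comma-separated participant list is an assumption; confirm the multi-node syntax
   -- against the linked reference before running it.
   ALTER CLUSTER "storage_cluster_1" ADD PARTICIPANTS "foundation_new1", "foundation_new2", "foundation_new3";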
5. Restart the rolehostd process on all nodes in the system by running the following series of commands at the shell terminal.

   First, stop the rolehostd process on all nodes.

   sudo systemctl kill -s SIGKILL rolehostd

   Confirm that the rolehostd process is no longer running.

   sudo systemctl status rolehostd

   Start the rolehostd process on the system again.

   sudo systemctl start rolehostd

6. You can verify that the new node is active by executing this SQL query after you connect to the database.

   SELECT n.name, ns.operational_status
   FROM sys.node_status ns
   JOIN sys.nodes n ON ns.node_id = n.id
   ORDER BY n.name;

   Output:

   name            operational_status
   --------------  ------------------
   admin01         active
   admin02         active
   admin03         active
   loader01        active
   loader02        active
   loader03        active
   foundation01    active
   foundation02    active
   foundation03    active
   foundation04    active
   foundation05    active
   foundation06    active
   foundation07    active
   foundation08    active
   foundation09    active
   foundation10    active
   foundation11    active
   foundation12    active
   foundation_new  active
   sql01           active
   sql02           active

7. Perform a rebalance task to distribute your data evenly across your newly expanded system. For details, see the Rebalance System section.

Add More Drives to Foundation Nodes

Foundation Nodes can support extra storage through additional NVMe drives. This tutorial shows the steps to integrate new drives into an existing Ocient system.

System Considerations

- For best performance, all Foundation Nodes should have equal storage capacity.
- No individual cluster should have more than two petabytes of storage.

Prerequisites

- The process requires systemctl access on your system OS.
- Any new NVMe drives added to the system must be blank and unpartitioned.

Tutorial

1. Shut down the rolehostd process from the shell prompt.

   sudo systemctl stop rolehostd

2. Install the new storage drives in your system.

3. Restart the node from the shell prompt. The rolehostd process recognizes the new drives.

   sudo systemctl restart rolehostd

4. To confirm the new drives are active and running, you can connect to your system and query the sys.storage_device_status system catalog table. To use this example query, replace <node name> with the name of the node where you are adding a drive.

   SELECT n.name AS node_name, s.node_id, s.id AS serial_number, s.pci_address, s.device_status, s.device_model
   FROM sys.nodes n
   JOIN sys.storage_device_status s ON n.id = s.node_id
   WHERE n.name = '<node name>';

   Output:

   | node_name   | node_id                              | serial_number                        | pci_address                                               | device_status | device_model                             |
   |-------------|--------------------------------------|--------------------------------------|-----------------------------------------------------------|---------------|------------------------------------------|
   | foundation0 | 9330a0b3-b3b7-4503-949c-043b196c0cc4 | 6156962f-fcf6-4299-bbd7-2618fc6f1d00 | /var/opt/ocient/6156962f-fcf6-4299-bbd7-2618fc6f1d00.dat | active        | PCIe Data Center SSD INTEL SSDPE2ME800G4 |
   | foundation0 | 9330a0b3-b3b7-4503-949c-043b196c0cc4 | e6a4b24f-0d49-4704-b7ce-18af163c0701 | /var/opt/ocient/e6a4b24f-0d49-4704-b7ce-18af163c0701.dat | active        | PCIe Data Center SSD INTEL SSDPE2ME800G4 |

   This output lists the serial numbers of all drives in the foundation0 node, including their statuses.

5. Perform a rebalance task to distribute your data evenly across your newly expanded system. For details, see the Rebalance System section.
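To check the System Consideration above that all Foundation Nodes have equal storage, you can also aggregate the same catalog tables across every node instead of filtering to a single one. This is a minimal sketch that uses only the tables and joins shown in the query above; it counts drives per node as a rough proxy for capacity rather than an exact byte total.

   -- Sketch: count the drives reported for each node using the catalog tables shown above.
   -- Comparing counts across Foundation Nodes is a quick way to spot unequal storage
   -- after installing new NVMe drives.
   SELECT n.name AS node_name, COUNT(*) AS drive_count
   FROM sys.nodes n
   JOIN sys.storage_device_status s ON n.id = s.node_id
   GROUP BY n.name
   ORDER BY n.name;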
Rebalance System

Rebalance task execution redistributes your data evenly across your segment groups and clusters. Rebalance your system if you have recently installed new hardware, particularly new nodes or drives.

Prerequisites

- Only one rebalance task can execute at a time on the system. The system logs an error if you try to start a second rebalance task.
- You must have the System Administrator role, or be granted the UPDATE privilege on SYSTEM.

Tutorial

1. Execute a SELECT SQL statement to view how the system has distributed the existing data, using the sys.tables and sys.nodes system catalog tables.

   SELECT t.name AS table_name, n.name AS node_name, su.used_bytes
   FROM sys.storage_used su
   JOIN sys.tables t ON t.id = su.table_id
   JOIN sys.nodes n ON n.id = su.node_id
   ORDER BY t.name, n.name;

   Output:

   | table_name | node_name | used_bytes |
   |------------|-----------|------------|
   | table0     | lts0      |   48000000 |
   | table0     | lts1      |   11000000 |
   | table0     | lts2      |   79000000 |
   | table0     | lts3      |   73000000 |
   | table0     | lts4      |   85000000 |
   | table0     | lts5      |   50000000 |
   | table0     | lts6      |   51000000 |
   | table0     | lts7      |   41000000 |
   | table1     | lts0      |   66000000 |
   | table1     | lts1      |   77000000 |
   | table1     | lts2      |   73000000 |
   | table1     | lts3      |   43000000 |
   | table1     | lts4      |   76000000 |
   | table1     | lts5      |   51000000 |
   | table1     | lts6      |   71000000 |
   | table1     | lts7      |  105000000 |

   The query output shows how data is distributed across your nodes.

2. Execute the rebalance task named rebalance_task to reorganize all data into a balanced state using the CREATE TASK SQL statement.

   CREATE TASK rebalance_task TYPE REBALANCE;

   For details about the syntax for creating tasks, see docid\ wyzmz0s4turygt1pbasjs.

3. View the status of the tasks as the rebalance operation executes using the sys.subtasks system catalog table.

   SELECT id, name, start_time, task_type, execution_type, status, details
   FROM sys.subtasks
   WHERE task_type = 'rebalance'
   ORDER BY start_time DESC;

   Output:

   | id                                   | name | start_time                 | task_type | execution_type          | status   | details                                         |
   |--------------------------------------|------|----------------------------|-----------|-------------------------|----------|-------------------------------------------------|
   | ed68a070-d7f7-4c50-9b94-84b04b4503c2 |      | 2024-04-17 21:44:20.268000 | rebalance | intra cluster rebalance | complete |                                                 |
   | 9e6c3374-6e9c-4a84-9bf8-ba1db3d866eb |      | 2024-04-17 21:44:20.268000 | rebalance | intra cluster rebalance | complete |                                                 |
   | aebb75d8-e066-4813-b7e8-c272e5a3b8e9 |      | 2024-04-17 21:44:14.768000 | rebalance | inter cluster rebalance | complete |                                                 |
   | 5cec5095-8b3a-4868-a10d-35fe6a32e998 |      | 2024-04-17 21:44:14.564000 | rebalance | null                    | complete | task finalizing due to terminal status complete |

   The output shows when the rebalance task is finished. While the rebalance task runs, segments are in the rebuilding state. After the Ocient system completes this task, all segments should transition to the intact state.

4. Execute the query from step 1 again to see how the system reorganized the data across nodes.

   SELECT t.name, n.name, su.used_bytes
   FROM sys.storage_used su
   JOIN sys.tables t ON t.id = su.table_id
   JOIN sys.nodes n ON n.id = su.node_id
   ORDER BY t.name, n.name;

   Output:

   | name   | name_1 | used_bytes |
   |--------|--------|------------|
   | table0 | lts0   |   48000000 |
   | table0 | lts1   |   39000000 |
   | table0 | lts2   |   71000000 |
   | table0 | lts3   |   53000000 |
   | table0 | lts4   |   85000000 |
   | table0 | lts5   |   50000000 |
   | table0 | lts6   |   51000000 |
   | table0 | lts7   |   41000000 |
   | table1 | lts0   |   66000000 |
   | table1 | lts1   |   77000000 |
   | table1 | lts2   |   73000000 |
   | table1 | lts3   |   43000000 |
   | table1 | lts4   |   76000000 |
   | table1 | lts5   |   68000000 |
   | table1 | lts6   |   71000000 |
   | table1 | lts7   |   88000000 |

   The output shows the redistributed data. The rebalance task redistributes data across segment groups, so there can be some variance in data across nodes.
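If you want a more compact before-and-after comparison than the per-table listings in steps 1 and 4, a per-node total can help. This is a minimal sketch that aggregates the same catalog tables used in the tutorial queries above; it introduces no new tables or columns.

   -- Sketch: total used bytes per node, summed over all tables, using the same
   -- sys.storage_used and sys.nodes joins as the tutorial queries above.
   -- Run it before and after the rebalance task to compare per-node totals.
   SELECT n.name AS node_name, SUM(su.used_bytes) AS total_used_bytes
   FROM sys.storage_used su
   JOIN sys.nodes n ON n.id = su.node_id
   GROUP BY n.name
   ORDER BY n.name;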
Add Loader Nodes

Adding more Loader Nodes to your system can improve the throughput of data loading and resiliency against loading failures. By having extra Loader Nodes, you can also dedicate sets of nodes to specific pipelines.

System Considerations

Before starting this process, ensure that you meet the requirements in the Prerequisites section.

Tutorial

1. Stop any active data pipelines. Execute this SQL statement, replacing <pipeline name> with the name of your data pipeline.

   STOP PIPELINE <pipeline name>;

2. Create the /var/opt/ocient/bootstrap.conf file on the new Loader Node using this YAML example. This YAML file should follow a similar format and parameters to the bootstrap.conf file on your other nodes. Replace <first node address> with the IP address or hostname of a node running the Administrator role on your system. If you are not using DNS, use nodeAddress instead of adminHost. Specify a user account with System Administrator privileges for your system for the username adminUsername and password adminPassword.

   adminHost: <first node address>
   adminUsername: my_admin
   adminPassword: example_password

3. Start the new node by using this command. This process takes about a minute to complete. This step accepts the node into the system, but does not assign a role.

   sudo systemctl start rolehostd

4. Add the streamloader role to the new node. Execute this SQL statement, replacing <new node name> with the new node name.

   ALTER NODE <new node name> ADD ROLE streamloader;

5. Restart the rolehostd process on the new node by running this command at the shell terminal on that node.

   sudo systemctl restart rolehostd

6. Restart the pipeline. Execute this SQL statement, replacing <pipeline name> with the name of your data pipeline. Optionally, specify the Loader Node by name with the USING LOADERS keywords to prioritize it for this pipeline.

   START PIPELINE <pipeline name> USING LOADERS <new node name>;

If you use the legacy LAT service, you must stop loading and copy the LAT configuration files to any new Loader Nodes. For more information, see docid\ xf o62ftkx r0jiotjozs. This step is unnecessary for systems that use Ocient data pipelines for loading.
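Because extra Loader Nodes let you dedicate sets of nodes to specific pipelines, you might want to prioritize several loaders when restarting a pipeline. The tutorial above shows only a single node name after USING LOADERS; as a minimal sketch, and assuming the clause accepts a comma-separated list of node names (the pipeline and node names below are hypothetical), the statement might look like this:

   -- Hedged sketch: restart a pipeline and prioritize a set of Loader Nodes for it.
   -- The pipeline name my_pipeline and the node names are hypothetical, and the
   -- comma-separated list after USING LOADERS is an assumption; confirm the
   -- multi-loader syntax for your release before using it.
   START PIPELINE my_pipeline USING LOADERS loader01, loader02, loader_new;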
Add SQL Nodes

Adding more SQL Nodes to your system can improve query optimization and processing, particularly for aggregation and join operations. Extra SQL Nodes also provide system resiliency, especially when assigned the admin role.

System Considerations

Before starting this process, ensure that you meet the requirements in the Prerequisites section.

Tutorial

1. Create the /var/opt/ocient/bootstrap.conf file on the new SQL Node using this YAML example. This YAML file should follow a similar format and parameters to the bootstrap.conf file on your other nodes. Replace <first node address> with the IP address or hostname of a node running the Administrator role on your system. If you are not using DNS, use nodeAddress instead of adminHost. Specify a user account with System Administrator privileges for your system for the username adminUsername and password adminPassword.

   adminHost: <first node address>
   adminUsername: my_admin
   adminPassword: example_password

2. Start the new node by using this command. This process takes about a minute to complete. This step accepts the node into the system, but does not assign a role.

   sudo systemctl start rolehostd

3. Add the sql role to the new node. Execute this SQL statement, replacing <new node name> with the new node name.

   ALTER NODE <new node name> ADD ROLE sql;

4. Optionally, you can also assign the admin role to the SQL Node. At least one SQL Node must always fulfill this role. By default, the system assigns the admin role to the first SQL Node in the system. For more information about the admin role, see docid twgeqk yljyfa h iaj. Execute this SQL statement, replacing <new node name> with the name of your new node.

   ALTER NODE <new node name> ADD ROLE admin;

5. Restart the rolehostd process on the new node by running this command at the shell terminal on that node.

   sudo systemctl restart rolehostd

6. Assign the new SQL Node to a connectivity pool with the docid\ xga0pas8wadtq33 a x7v statement. In this example, the statement assigns the SQL Node sql2 to the connectivity pool cp1 with the IP address 111.1.1.1 and port number 4050 for listening, and specifies the local IP address and port number 4050 to return to the client.

   ALTER CONNECTIVITY POOL cp1 ADD PARTICIPANTS(
     NODE sql2
     LISTEN ADDRESS '111.1.1.1'
     LISTEN PORT 4050
     ADVERTISED ADDRESS 'localhost'
     ADVERTISED PORT 4050);

Related Links

- docid\ xf o62ftkx r0jiotjozs
- docid\ movujbt1f 8kwrrdd qg9