Replace an NVMe Node Drive
Ocient software uses NVMe drives for storage because they provide higher I/O throughput than spinning media drives. These drives are referred to as data disks/drives.
In Ocient, these drives are accessed through a User Mode I/O (UIO) driver, so they cannot be accessed with the common set of Linux commands (e.g., df, du, ls, file, parted).
Working on these drives requires the rolehostd process to be stopped on the node, because the process holds open file handles on the drives, and making changes to a drive while it runs can destabilize the process.
Execution of this procedure requires sudo privileges on the Foundation Node on which the drive is being replaced.
There are two approaches for detecting issues with NVMe drives on a Foundation Node. First, the device status of all drives on a given Foundation Node can be checked with a SQL query of the system catalog to ensure all are "active." Second, the presence of NVMe drives on the PCI bus can be inspected. You can run each command on a Foundation Node to assess the health of the drives on that node. Depending on the failure mode of a device, either method can reveal an issue with an NVMe drive.
SQL queries on the sys.storage_device_status system catalog table provide the status of all payload drives. Connect to any SQL Node on the system and run the following command, replacing <NODE_NAME> with the name of your node:
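A query along these lines can be used. The sys.storage_device_status table name comes from this document, but the node_name filter column is an assumption and may differ on your system:

```sql
-- Sketch only: the node_name column is an assumption; check the actual
-- schema of sys.storage_device_status on your system.
SELECT *
FROM sys.storage_device_status
WHERE node_name = '<NODE_NAME>';
```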
Sample output:
The possible storage device statuses are ACTIVE or FAILED. In this sample output, the device at PCI Address 0000:89:00.0 needs to be replaced.
If an NVMe drive is completely unavailable, it might not appear in the sys.storage_device_status table, for example, if the drive is not seated properly or if the node has rebooted and Ocient cannot detect the drive. In this case, see the Alternative Methods to Identify Failed Drives section below to identify the serial number and PCI address of the drive.
When a drive has failed or is experiencing issues, errors are logged in the /var/log/messages and /var/opt/ocient/log/rolehostd.log* files. Inspecting these log files can assist with root cause analysis of the drive failure.
- To locate the failed drive in the chassis, use the query results from the Check Drive Status via System Catalog Tables query to identify the PCI Address of the failed drive.
- Refer to the chassis diagram for your system to identify the slot of the failed drive based on the PCI address. Three common chassis and drive bay mappings are listed below:
Supermicro SYS-1029U-TN10RT Hot swap bays
| NVMe 3c:00.0 | NVMe 5f:00.0 | NVMe 61:00.0 | NVMe 87:00.0 | NVMe 89:00.0 |
|---|---|---|---|---|
| NVMe 3b:00.0 | NVMe 5e:00.0 | NVMe 60:00.0 | NVMe 86:00.0 | NVMe 88:00.0 |
- An NVMe drive of the same model and the same or larger capacity as the failed drive.
- The drive firmware is upgraded to the latest version using the NVMe Drive Firmware Upgrade Process.
1. Log in to the Foundation Node that needs a drive replacement as an administrator user, and stop the rolehostd process on this node to ensure that the drive replacement does not impact the stability of the process:
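On systems where rolehostd runs as a systemd service (an assumption about your deployment), stopping it might look like:

```
# Assumes rolehostd is managed by systemd; adjust to your service manager.
sudo systemctl stop rolehostd
```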
2. Ensure that the rolehostd process has stopped. The result of this command should indicate that the process is Stopped:
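Assuming systemd management as above, the status check might look like:

```
# The service should report as stopped/inactive before you proceed.
sudo systemctl status rolehostd
```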
3. If using encrypted OPAL drives, back up the localKeyStore directory. This directory can be empty or absent on systems that do not have OPAL drives.
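The location of the localKeyStore directory is an assumption here; substitute the actual path from your installation. A backup sketch:

```
# Path is an assumption; adjust to your installation's localKeyStore location.
sudo tar -czf /tmp/localKeyStore-backup.tar.gz /var/opt/ocient/localKeyStore
```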
4. Physically remove the failed drive from the host and execute nvme-driver-util.sh to confirm that the failed drive no longer appears in the output:
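As a supplementary check alongside the script, the PCI bus can be inspected directly to confirm the removed drive's controller is gone:

```
# Lists NVMe controllers visible on the PCI bus; the removed drive's
# address (e.g., 89:00.0) should no longer appear.
lspci -nn | grep -i 'non-volatile'
```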
Example output:
5. Insert the replacement drive in the chassis and ensure that it shows up in output of the nvme-driver-util.sh script.
Example output:
6. Run this command, replacing X and Y to match the device name of the replacement drive.
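If the nvme-cli package is installed (an assumption about your OS image), the format step typically looks like the following:

```
# DESTRUCTIVE: erases all data on the specified namespace.
# Replace X and Y with the device and namespace numbers of the
# replacement drive -- never the OS drive.
sudo nvme format /dev/nvmeXnY
```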
This command formats the specified drive and erases all data on the drive. If the OS drive is an NVMe drive, ensure that it is not formatted.
7. Execute the nvme-driver-util.sh script to bind drives to the UIO driver:
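The script automates the binding; conceptually, binding a single device to uio_pci_generic through sysfs looks like this (the PCI address is illustrative, and the script should be preferred on real systems):

```
# Illustrative only -- nvme-driver-util.sh performs the equivalent steps.
modprobe uio_pci_generic
echo uio_pci_generic | sudo tee /sys/bus/pci/devices/0000:89:00.0/driver_override
echo '0000:89:00.0' | sudo tee /sys/bus/pci/drivers/uio_pci_generic/bind
```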
8. Execute the nvme-driver-util.sh script and confirm that the new drive is bound to the uio_pci_generic driver.
Example output:
9. Start the rolehostd process:
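Assuming systemd management as in the stop step, starting the process might look like:

```
# Assumes rolehostd is managed by systemd; adjust to your service manager.
sudo systemctl start rolehostd
```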
10. Check the output of the sys.storage_device_status catalog table by connecting to a SQL Node and running the following query after replacing <NODE_NAME> with your node's name. Ensure that all drives appear on the node with a status of ACTIVE:
As shown in the sample output, the device replaced at PCI Address 0000:89:00.0 is now active. This indicates that it has been successfully replaced.
If the failed drive had data on it prior to failure, a rebuild of data segments stored on the replaced drive is required after the drive is replaced. Refer to the Guide to Rebuilding Segments for instructions on how to rebuild the missing segments.
If the results of querying the sys.storage_device_status table do not indicate which drive has failed, alternative methods can be used to detect the failing drive. These methods require the command-line utility jq, but you can view the full results of the curl commands without it.
You can cross-reference the results of two API commands to determine the PCI address of a failed drive that is not appearing. A drive that does not appear in the stats API or the catalog table can still be present in the sysconfig API output.
The output of the following API will show NVMe storage device statuses. Any device status other than 10 indicates a potential issue. Replace <IP_ADDRESS_OF_THE_NODE> with the IP Address of your node and inspect the results.
Sample output:
The possible device status values are listed in the table below:
Value | Device Status |
---|---|
0 | INVALID_UNKNOWN |
10 | ACTIVE |
20 | UNINITIALIZED |
61 | CORRUPT |
80 | FAILED |
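Because the statuses are numeric, a jq filter can surface every device whose status is not 10 (ACTIVE). The JSON shape below is illustrative sample data, not the real API output, and the field names are assumptions:

```shell
# Illustrative JSON shape; field names in the real stats API output may differ.
json='{"devices":[{"pci_address":"0000:88:00.0","status":10},{"pci_address":"0000:89:00.0","status":80}]}'

# Print the PCI address of every device whose status is not 10 (ACTIVE).
printf '%s' "$json" | jq -r '.devices[] | select(.status != 10) | .pci_address'
# -> prints 0000:89:00.0
```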
The output of the following API will show the configured NVMe drives on the Foundation Node. Depending on the failure mode, the output of this command can show a reduced number of drives. For example, if the Foundation Node chassis has 12 drive bays populated and only 11 appear in this result, it would indicate that one drive is not visible to Ocient. Run the following command by replacing <IP_ADDRESS_OF_THE_NODE> with the IP Address of your node:
Sample output:
If the address of the failed drive cannot be determined in this way, proceed to Option 2 to identify the location of the drive.
If drives are not appearing in the Ocient outputs, they might be unavailable to the Ocient service. In this case, directly examine the drives on the system with standard disk utilities and operating system commands to find the serial number of drives that meet the following criteria:
- The drive is not the operating system boot drive
- The drive is not listed in the sys.storage_device_status table
Any drives that meet these criteria are not in use by Ocient and might need to be replaced or repaired. Identify the serial number and PCI address of these drives in order to complete the replacement procedure.
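If the nvme-cli package and standard sysfs paths are available (assumptions about your OS image), commands like these can map serial numbers to PCI addresses for kernel-visible drives:

```
# List NVMe devices with model and serial number. Drives already bound to
# the UIO driver will not appear here, which itself narrows the candidates.
sudo nvme list

# Map each kernel-visible NVMe controller to its PCI address and serial.
for d in /sys/class/nvme/nvme*; do
  echo "$d: $(basename "$(readlink -f "$d/device")") serial=$(cat "$d/serial")"
done
```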