
Replace an NVMe Node Drive

Ocient software uses NVMe drives for storage because they provide higher I/O throughput than spinning media drives. These drives are referred to as data disks or data drives.

In Ocient, these drives are accessed through a User Mode I/O (UIO) driver, so they cannot be accessed with the common set of Linux commands (e.g., df, du, ls, file, or parted).

Before working on these drives, stop the rolehostd process on the node: the process holds open file handles on the drives, and changing a drive while the process is running can make it unstable.

This procedure requires sudo privileges on the Foundation Node on which the drive is being replaced.

Detection and Alerting

There are two approaches for detecting issues with NVMe drives on a Foundation Node. First, the device status of all drives on a given Foundation Node can be checked with a SQL query of the system catalog to ensure all are "active." Second, the presence of NVMe drives on the PCI bus can be inspected. You can run each command on a Foundation Node to assess the health of the drives on that node. Depending on the failure mode of a device, either method can reveal an issue with an NVMe drive.
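The PCI-bus approach can lean on standard tooling. As a hedged illustration (pciutils output format assumed), this snippet pulls the PCI addresses of NVMe controllers out of lspci-style text; a two-line captured sample stands in for live output:

```shell
# Extract PCI addresses of NVMe controllers from lspci-style output.
# On a live node, replace the printf with:  lspci | awk '...'
lspci_sample='3b:00.0 Non-Volatile memory controller: Intel Corporation NVMe SSD
3c:00.0 Non-Volatile memory controller: Intel Corporation NVMe SSD'
printf '%s\n' "$lspci_sample" | awk '/Non-Volatile memory controller/ {print $1}'
```

Comparing the resulting address count against the number of populated drive bays is a quick first check before digging into the Ocient APIs.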

Check Drive Status via System Catalog Tables

A SQL query on the sys.storage_device_status system catalog table provides the status of all payload drives. Connect to any SQL Node on the system and run the following command, replacing <NODE_NAME> with the name of your node:

SQL


Sample output:

Text


The possible storage device statuses are ACTIVE or FAILED. In this sample output, the device in PCI Address 0000:89:00.0 needs to be replaced.
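As a hedged sketch, a query of the following shape narrows the output to devices needing attention; the column names node_name and status are assumptions and should be checked against your system catalog before use:

```sql
-- Hypothetical sketch: list non-active payload drives on one node.
-- Column names are assumed, not verified against the Ocient catalog.
SELECT *
FROM sys.storage_device_status
WHERE node_name = '<NODE_NAME>'
  AND status <> 'ACTIVE';
```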

If an NVMe drive is completely unavailable, it might not appear in the sys.storage_device_status table; for example, if a drive is not seated properly, or if the node has rebooted and Ocient cannot detect the drive. In this case, see Alternative Methods to Identify Failed Drives below to identify the serial number and PCI address of the drive.

When a drive has failed or is experiencing issues, errors are logged in the /var/log/messages and /var/opt/ocient/log/rolehostd.log* files. Inspecting these log files can assist with root cause analysis of the drive failure.
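A generic way to pull candidate error lines out of those logs is a pattern search. The patterns below are illustrative only, not an authoritative list of Ocient error strings, and a throwaway file stands in for the real logs:

```shell
# Search a log file for NVMe-related error lines. In practice, run this
# against /var/log/messages and /var/opt/ocient/log/rolehostd.log*.
scan_log() { grep -iE 'nvme|i/o error|device.*fail' "$1" || true; }

tmp=$(mktemp)
printf 'kernel: nvme nvme3: I/O error, dev nvme3n1\nunrelated line\n' > "$tmp"
scan_log "$tmp"    # prints only the NVMe error line
rm -f "$tmp"
```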

Troubleshooting and identification:

  1. To locate the failed drive in the chassis, use the query results from the Check Drive Status via System Catalog Tables query to identify the PCI Address of the failed drive.
  2. Refer to the chassis diagram for your system to identify the slot of the failed drive based on the PCI address. Three common chassis and drive bay mappings are listed below:

Supermicro SYS-1029U-TN10RT hot-swap bays:

  • NVMe 3c:00.0
  • NVMe 5f:00.0
  • NVMe 61:00.0
  • NVMe 87:00.0
  • NVMe 89:00.0
  • NVMe 3b:00.0
  • NVMe 5e:00.0
  • NVMe 60:00.0
  • NVMe 86:00.0
  • NVMe 88:00.0

Recovery

Prerequisites

  • An NVMe drive of the same model and the same or larger capacity as the failed drive.
  • The drive firmware must be upgraded to the latest version using the NVMe Drive Firmware Upgrade Process.

Replacement Procedure

1. Log in to the Foundation Node that needs a drive replacement as an administrator user, and stop the rolehostd process on this node to ensure that the drive replacement does not impact the stability of the process:

Shell


2. Ensure that the rolehostd process has stopped. The result of this command should indicate that the process is Stopped:

Shell
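As a generic cross-check (independent of whichever status command your deployment uses, and assuming rolehostd runs as an ordinary process), pgrep can confirm that no rolehostd process remains:

```shell
# pgrep exits non-zero when no process with the exact name exists.
# This complements, not replaces, the status command for this step.
if pgrep -x rolehostd >/dev/null 2>&1; then
  echo "rolehostd still running"
else
  echo "rolehostd stopped"
fi
```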


3. If using encrypted OPAL drives, back up the localKeyStore directory. This directory can be empty or absent on systems that do not have OPAL drives.

Shell
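The backup itself can be as simple as a tar archive. The helper below is a generic sketch; the real localKeyStore path is deployment-specific, so it is passed in as a parameter, and a throwaway directory is used for illustration:

```shell
# Archive a directory to <dest>/<name>-backup.tar.gz and print the
# archive path. Point src at your localKeyStore directory.
backup_dir() {
  src=$1 dest=$2
  out="$dest/$(basename "$src")-backup.tar.gz"
  tar -czf "$out" -C "$(dirname "$src")" "$(basename "$src")" && echo "$out"
}

# Illustration with a throwaway directory:
d=$(mktemp -d)
mkdir "$d/localKeyStore" && touch "$d/localKeyStore/key1"
backup_dir "$d/localKeyStore" "$d"
```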


4. Physically remove the failed drive from the host and execute nvme-driver-util.sh to ensure that the failed drive no longer appears in the output:

Shell


Example output:

Text


5. Insert the replacement drive in the chassis and ensure that it appears in the output of the nvme-driver-util.sh script:

Shell


Example output:

Text


6. Run this command, replacing X and Y to match the device name of the replacement drive:

Text


This command formats the specified drive and erases all data on it. If the OS drive is an NVMe drive, ensure that you do not format it.
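The format is typically performed with the standard nvme-cli utility. Because the exact command is not reproduced here, the sketch below only echoes a hypothetical invocation; remove the echo (after triple-checking the device name) to run it for real:

```shell
# Hypothetical illustration only: X and Y are the controller and
# namespace numbers of the REPLACEMENT drive (never the OS drive).
# Echoed rather than executed as a safety measure.
X=4 Y=1
echo sudo nvme format "/dev/nvme${X}n${Y}"
```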

7. Execute the nvme-driver-util.sh script to bind drives to the UIO driver:

Shell


8. Execute the nvme-driver-util.sh script and confirm that the new drive is bound to the uio_pci_generic driver:

Shell


Example output:

Text


9. Start the rolehostd process:

Shell


10. Check the output of the sys.storage_device_status catalog table by connecting to a SQL Node and running the following query, replacing <NODE_NAME> with your node's name. Ensure that all drives appear on the node with a status of ACTIVE:

SQL


As shown in the sample output, the device replaced at PCI Address 0000:89:00.0 is now active. This indicates that it has been successfully replaced.

Text


Next Steps: Rebuild Segments

If the failed drive had data on it prior to failure, a rebuild of data segments stored on the replaced drive is required after the drive is replaced. Refer to the Guide to Rebuilding Segments for instructions on how to rebuild the missing segments.

Alternative Methods to Identify Failed Drives

If the results of querying the sys.storage_device_status table do not indicate which drive has failed, alternative methods can be used to detect the failing drive. These methods require the command-line utility jq, but you can view the full results of the curl commands without it.

Option 1: Cross Reference API results to identify the PCI Address

You can cross-reference the results of two API commands to determine the PCI address of a failed drive that does not appear in the catalog table. A drive that does not appear in the stats API or catalog table can still be present in the sysconfig API output.
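Mechanically, the cross-reference is a set difference between the two address lists. The sample lists below are illustrative stand-ins for the output of the two API calls described in this section:

```shell
# Addresses reported by the sysconfig API (all configured drives) versus
# the stats API (drives Ocient currently sees); both lists are sorted.
sysconfig_devs='0000:3b:00.0
0000:3c:00.0
0000:89:00.0'
stats_devs='0000:3b:00.0
0000:3c:00.0'

# comm -23 prints lines present only in the first list: the suspect drive.
a=$(mktemp); b=$(mktemp)
printf '%s\n' "$sysconfig_devs" > "$a"
printf '%s\n' "$stats_devs" > "$b"
comm -23 "$a" "$b"
rm -f "$a" "$b"
```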

The output of the following API shows NVMe storage device statuses. Any device status other than 10 indicates a potential issue. Replace <IP_ADDRESS_OF_THE_NODE> with the IP address of your node and inspect the results.

Curl


Sample output:

Text


The possible device status values are listed in the table below:

Value | Device Status
------+----------------
0     | INVALID_UNKNOWN
10    | ACTIVE
20    | UNINITIALIZED
61    | CORRUPT
80    | FAILED
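When scripting against the stats API, the numeric values can be translated with a small helper mirroring the table above:

```shell
# Map a numeric device status from the stats API to its name,
# per the status table in this guide.
status_name() {
  case "$1" in
    0)  echo INVALID_UNKNOWN ;;
    10) echo ACTIVE ;;
    20) echo UNINITIALIZED ;;
    61) echo CORRUPT ;;
    80) echo FAILED ;;
    *)  echo "UNRECOGNIZED($1)" ;;
  esac
}

status_name 80    # a drive reporting 80 has failed
```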

Check Device Presence

The output of the following API shows the configured NVMe drives on the Foundation Node. Depending on the failure mode, the output can show a reduced number of drives. For example, if the Foundation Node chassis has 12 drive bays populated and only 11 appear in this result, one drive is not visible to Ocient. Run the following command, replacing <IP_ADDRESS_OF_THE_NODE> with the IP address of your node:

Curl


Sample output:

Text


If the address of the failed drive cannot be determined in this way, proceed to Option 2 to identify the location of the drive.

Option 2: Inspect Drives via Operating System Commands

If drives are not appearing in the Ocient outputs, they might be unavailable to the Ocient service. In this case, directly examine the drives on the system with standard disk utilities and operating system commands to find the serial numbers of drives that meet the following criteria:

  1. The drive is not the operating system boot drive
  2. The drive is not listed in the sys.storage_device_status table

Any drives that meet these criteria are not in use by Ocient and might need to be replaced or repaired. Identify the serial number and PCI address of these drives in order to complete the replacement procedure.
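One generic way to tie an OS-visible NVMe device back to a PCI address is through its sysfs path (for example, readlink -f /sys/class/nvme/nvme4). The helper below extracts the address from such a path; the sample path mirrors the usual sysfs layout and is an assumption, not output captured from an Ocient system:

```shell
# Extract the last PCI-style address (domain:bus:device.function)
# from a sysfs device path.
pci_addr_of() {
  printf '%s\n' "$1" \
    | grep -oE '[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]' \
    | tail -n 1
}

pci_addr_of '/sys/devices/pci0000:80/0000:80:03.0/0000:89:00.0/nvme/nvme4'
```

Pair the extracted address with the drive's serial number (from the drive label or a standard utility such as smartctl) to locate the physical bay.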