System Administration
NVMe Drive Maintenance
# Replace an NVMe Node Drive
{{ocient}} software uses NVMe drives for storage because of their higher I/O throughput compared to spinning-media drives. These drives are referred to as data disks or drives in Ocient. They are accessed through a user-mode I/O (UIO) driver, so they cannot be accessed using the common set of {{linux}} commands (e.g., `df`, `du`, `ls`, `file`, `parted`, etc.). Working on these drives requires the `rolehostd` process to be stopped on the node, because the process has open file handles on the drives, and making changes to a drive can make the process unstable. Executing this procedure requires `sudo` privilege on the Foundation Node on which the drive is being replaced.

## Detection and Alerting

There are two approaches for detecting issues with NVMe drives on a Foundation Node. First, the device status of all drives on a specific Foundation Node can be checked using a SQL query of the system catalog to ensure all are active. Second, the presence of NVMe drives on the PCI bus can be inspected. You can run each command on a Foundation Node to assess the health of the drives on that node. Depending on the failure mode of a device, either of these methods can reveal an issue with an NVMe drive. In the case of an operating system drive failure, contact Ocient Support.

### Check Drive Status Using System Catalog Tables

SQL queries on the `sys.storage_device_status` system catalog table provide the status of all payload drives. Connect to any SQL Node on the system and execute this query, replacing `<node_name>` with the name of your node.

```sql
SELECT n.name AS node_name,
       s.node_id,
       s.id AS serial_number,
       s.pci_address,
       s.device_status,
       s.device_model
FROM sys.nodes n
JOIN sys.storage_device_status s ON n.id = s.node_id
WHERE n.name = '<node_name>';
```

Output:

```
node_name |node_id                             |serial_number     |pci_address |device_status |device_model
----------+------------------------------------+------------------+------------+--------------+-----------------------------------------
lts2      |f4845f46-dd0a-48bd-a2ed-614d5029e5df|cvmd6356009m800ggn|0000:3b:00.0|active        |pcie data center ssd intel ssdpe2me800g4
lts2      |f4845f46-dd0a-48bd-a2ed-614d5029e5df|cvmd636400bz800ggn|0000:3c:00.0|active        |pcie data center ssd intel ssdpe2me800g4
lts2      |f4845f46-dd0a-48bd-a2ed-614d5029e5df|cvmd636400ja800ggn|0000:5e:00.0|active        |pcie data center ssd intel ssdpe2me800g4
lts2      |f4845f46-dd0a-48bd-a2ed-614d5029e5df|cvmd6365003c800ggn|0000:5f:00.0|active        |pcie data center ssd intel ssdpe2me800g4
lts2      |f4845f46-dd0a-48bd-a2ed-614d5029e5df|cvmd6365003p800ggn|0000:86:00.0|active        |pcie data center ssd intel ssdpe2me800g4
lts2      |f4845f46-dd0a-48bd-a2ed-614d5029e5df|cvmd63650092800ggn|0000:87:00.0|active        |pcie data center ssd intel ssdpe2me800g4
lts2      |f4845f46-dd0a-48bd-a2ed-614d5029e5df|cvmd6365009h800ggn|0000:88:00.0|active        |pcie data center ssd intel ssdpe2me800g4
lts2      |f4845f46-dd0a-48bd-a2ed-614d5029e5df|cvmd6371005a800ggn|0000:89:00.0|failed        |pcie data center ssd intel ssdpe2me800g4
```

The possible storage device statuses are `active` and `failed`. In this sample output, the device at PCI address `0000:89:00.0` has failed and needs to be replaced.
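On nodes with many drives, it can be convenient to filter the same query down to drives that are not reporting `active`. The following is a minimal sketch that uses the same catalog columns as the query above; match the `active` literal to the case shown in your own output.

```sql
-- Sketch: list only non-active payload drives for one node.
-- Uses the sys.nodes and sys.storage_device_status columns shown above;
-- adjust the 'active' literal if your system reports statuses in a different case.
SELECT n.name AS node_name,
       s.id AS serial_number,
       s.pci_address,
       s.device_status
FROM sys.nodes n
JOIN sys.storage_device_status s ON n.id = s.node_id
WHERE n.name = '<node_name>'
  AND s.device_status <> 'active';
```

An empty result means every payload drive on the node is reporting `active`.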
If an NVMe drive is completely unavailable, it might not appear in the `sys.storage_device_status` table at all, for example, if the drive is not seated properly or if the node has rebooted and the drive is undetectable by Ocient. In this case, see the Alternative Methods to Identify Failed Drives section to identify the serial number and PCI address of the drive.

When a drive has failed or is experiencing issues, errors are logged in the `/var/log/messages` and `/var/opt/ocient/log/rolehostd.log` files. Inspection of these log files can assist with root cause analysis of the drive failure.

## Troubleshooting

To locate the failed drive in the chassis, use the query results from the Check Drive Status Using System Catalog Tables section to identify the PCI address of the failed drive, then refer to the chassis diagram for your system to identify the slot of the failed drive based on that PCI address. One common mapping, for the Supermicro SYS-1029U-TN10RT hot-swap bays, is shown below.

```
NVMe 3c:00.0 | NVMe 5f:00.0 | NVMe 61:00.0 | NVMe 87:00.0 | NVMe 89:00.0
NVMe 3b:00.0 | NVMe 5e:00.0 | NVMe 60:00.0 | NVMe 86:00.0 | NVMe 88:00.0
```

## Recovery

### Prerequisites

- An NVMe drive of the same model and the same or larger capacity as the failed drive.
- The drive firmware is upgraded to the latest version using the NVMe Drive Firmware Upgrade process (docid: sjlhmnwjgx1ygn8kpotmo).

### Replacement Procedure

1. Log in to the Foundation Node that needs a drive replacement as an administrator user, and stop the `rolehostd` process on this node so that the drive replacement does not impact the stability of the process.

    ```shell
    sudo systemctl stop rolehostd
    ```

2. Ensure that the `rolehostd` process has stopped. The result of this command should indicate that the process is stopped.

    ```shell
    sudo systemctl status rolehostd
    ```

3. If you are using encrypted Opal drives, back up the `localkeystore` directory. This directory can be empty or absent on systems that do not have Opal drives.

    ```shell
    sudo cp -pr /var/opt/ocient/localkeystore /root
    ```

4. Physically remove the failed drive from the host and execute `nvme_driver_util.sh` to confirm that the failed drive no longer appears in the output.

    ```shell
    /opt/ocient/scripts/nvme_driver_util.sh
    ```

    Output:

    ```
    [admin@go-lts9 ~]$ /opt/ocient/scripts/nvme_driver_util.sh
    NVMe device status
    Use option 'bind_uio' to bind uio_pci_generic driver to Ocient payload/unpartitioned drives
    Use option 'bind_nvme' to bind nvme driver to drives
    BDF           NUMA node  Driver name      Device name
    0000:3b:00.0  0          uio_pci_generic
    0000:3c:00.0  0          uio_pci_generic
    0000:5e:00.0  0          uio_pci_generic
    0000:5f:00.0  0          uio_pci_generic
    0000:86:00.0  1          uio_pci_generic
    0000:87:00.0  1          uio_pci_generic
    0000:88:00.0  1          uio_pci_generic
    [admin@go-lts9 ~]$
    ```

5. Insert the replacement drive in the chassis and ensure that it shows up in the output of the `nvme_driver_util.sh` script.

    ```shell
    /opt/ocient/scripts/nvme_driver_util.sh
    ```

    Output:

    ```
    [admin@go-lts9 ~]$ /opt/ocient/scripts/nvme_driver_util.sh
    NVMe device status
    Use option 'bind_uio' to bind uio_pci_generic driver to Ocient payload/unpartitioned drives
    Use option 'bind_nvme' to bind nvme driver to drives
    BDF           NUMA node  Driver name      Device name
    0000:3b:00.0  0          uio_pci_generic
    0000:3c:00.0  0          uio_pci_generic
    0000:5e:00.0  0          uio_pci_generic
    0000:5f:00.0  0          uio_pci_generic
    0000:86:00.0  1          uio_pci_generic
    0000:87:00.0  1          uio_pci_generic
    0000:88:00.0  1          uio_pci_generic
    0000:89:00.0  1          nvme
    [admin@go-lts9 ~]$
    ```

6. Run this command, replacing `X` and `Y` to match the device name of the replacement drive.

    ```shell
    sudo nvme format /dev/nvmeXnY --ses=1
    ```

    This command formats the specified drive and erases all data on it. If the OS drive is an NVMe drive, ensure that it is not formatted.

7. Execute the `nvme_driver_util.sh` script to bind drives to the UIO driver.

    ```shell
    sudo /opt/ocient/scripts/nvme_driver_util.sh bind_uio
    ```

8. Execute the `nvme_driver_util.sh` script again and confirm that the new drive is bound to the `uio_pci_generic` driver.

    ```shell
    /opt/ocient/scripts/nvme_driver_util.sh
    ```

    Output:

    ```
    [admin@go-lts9 ~]$ /opt/ocient/scripts/nvme_driver_util.sh
    NVMe device status
    Use option 'bind_uio' to bind uio_pci_generic driver to Ocient payload/unpartitioned drives
    Use option 'bind_nvme' to bind nvme driver to drives
    BDF           NUMA node  Driver name      Device name
    0000:3b:00.0  0          uio_pci_generic
    0000:3c:00.0  0          uio_pci_generic
    0000:5e:00.0  0          uio_pci_generic
    0000:5f:00.0  0          uio_pci_generic
    0000:86:00.0  1          uio_pci_generic
    0000:87:00.0  1          uio_pci_generic
    0000:88:00.0  1          uio_pci_generic
    0000:89:00.0  1          uio_pci_generic
    [admin@go-lts9 ~]$
    ```

9. Start the `rolehostd` process.

    ```shell
    sudo systemctl start rolehostd
    ```

10. Check the output of the `sys.storage_device_status` catalog table by connecting to a SQL Node and running the following query, replacing `<node_name>` with the name of your node. Ensure that all drives appear on the node with a status of `active`.

    ```sql
    SELECT n.name AS node_name,
           s.node_id,
           s.id AS serial_number,
           s.pci_address,
           s.device_status,
           s.model
    FROM sys.nodes n
    JOIN sys.storage_device_status s ON n.id = s.node_id
    WHERE n.name = '<node_name>';
    ```

    Output:

    ```
    node_name |node_id                             |serial_number     |pci_address |device_status |model
    ----------+------------------------------------+------------------+------------+--------------+-----------------------------------------
    lts2      |f4845f46-dd0a-48bd-a2ed-614d5029e5df|cvmd6356009m800ggn|0000:3b:00.0|active        |pcie data center ssd intel ssdpe2me800g4
    lts2      |f4845f46-dd0a-48bd-a2ed-614d5029e5df|cvmd636400bz800ggn|0000:3c:00.0|active        |pcie data center ssd intel ssdpe2me800g4
    lts2      |f4845f46-dd0a-48bd-a2ed-614d5029e5df|cvmd636400ja800ggn|0000:5e:00.0|active        |pcie data center ssd intel ssdpe2me800g4
    lts2      |f4845f46-dd0a-48bd-a2ed-614d5029e5df|cvmd6365003c800ggn|0000:5f:00.0|active        |pcie data center ssd intel ssdpe2me800g4
    lts2      |f4845f46-dd0a-48bd-a2ed-614d5029e5df|cvmd6365003p800ggn|0000:86:00.0|active        |pcie data center ssd intel ssdpe2me800g4
    lts2      |f4845f46-dd0a-48bd-a2ed-614d5029e5df|cvmd63650092800ggn|0000:87:00.0|active        |pcie data center ssd intel ssdpe2me800g4
    lts2      |f4845f46-dd0a-48bd-a2ed-614d5029e5df|cvmd6365009h800ggn|0000:88:00.0|active        |pcie data center ssd intel ssdpe2me800g4
    lts2      |f4845f46-dd0a-48bd-a2ed-614d5029e5df|cvmd637100ga800ggn|0000:89:00.0|active        |pcie data center ssd intel ssdpe2me800g4
    ```

As shown in the sample output, the device replaced at PCI address `0000:89:00.0` is now `active`, which indicates that it has been replaced successfully.
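If you replace drives regularly, the pre-replacement steps (1 through 3) can be collected into a small helper script so they are always performed in the same order. The following is only a sketch assembled from the commands in the procedure above; the service name and keystore path are taken from the procedure, and the script itself is not part of the Ocient software.

```bash
#!/usr/bin/env bash
# Sketch: pre-replacement steps from the procedure above.
# Stops rolehostd, verifies that it stopped, and backs up the Opal keystore.
# Run on the Foundation Node whose drive is being replaced (requires sudo).
set -euo pipefail

echo "Stopping rolehostd..."
sudo systemctl stop rolehostd

# systemctl is-active returns non-zero once the unit is no longer active.
if sudo systemctl is-active --quiet rolehostd; then
    echo "rolehostd is still running; aborting." >&2
    exit 1
fi
echo "rolehostd is stopped."

# Back up the local keystore. The directory can be empty or absent on
# systems without Opal drives, so a missing directory is not an error.
if [ -d /var/opt/ocient/localkeystore ]; then
    sudo cp -pr /var/opt/ocient/localkeystore /root
    echo "Backed up /var/opt/ocient/localkeystore to /root."
else
    echo "No /var/opt/ocient/localkeystore directory found; skipping backup."
fi
```

The `systemctl is-active` check mirrors step 2: the script refuses to continue while `rolehostd` is still running, which protects the drives from being touched while the process holds open file handles.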
## Next Steps

### Rebuild Segments

If the failed drive had data on it prior to the failure, a rebuild of the data segments stored on the replaced drive is required after the drive is replaced. Refer to the Guide to Rebuilding Segments (docid: 12toghwotdgw2 1td9g3m) for instructions on how to rebuild the missing segments.

## Alternative Methods to Identify Failed Drives

If the results of querying the `sys.storage_device_status` table do not indicate which drive has failed, you can use alternative methods to detect the failing drive. These methods require the command-line utility `jq`, but you can view the full results of the `curl` commands without it.

### Option 1: Cross-Reference API Results to Identify the PCI Address

You can cross-reference the results of two API commands to determine the PCI address of a failed drive that is not appearing. A drive that does not appear in the stats API or the catalog table can still be present in the sysconfig API output.

The output of the following API shows NVMe storage device statuses. Any device status other than 10 indicates a potential issue. Replace `<IP address of the node>` with the IP address of your node and inspect the results.

```shell
curl -s http://<IP address of the node>:9090/v1/stats | jq -r '.[] | select(.name == "localstorageservice_device_status") | "\(.device) \(.value)"'
```

Output:

```
$ curl -s http://1.2.3.4:9090/v1/stats | jq -r '.[] | select(.name == "localstorageservice_device_status") | "\(.device) \(.value)"'
btlj913605yr2p0bgn 10
btlj913608ay2p0bgn 10
btlj913608fr2p0bgn 10
btlj91360d6q2p0bgn 10
btlj91360d6y2p0bgn 10
btlj916306ls2p0bgn 10
btlj92020kp22p0bgn 10
btlj92030ajm2p0bgn 10
phlj9403002n2p0bgn 10
phlj941100rw2p0bgn 10
```

The possible device status values are listed in the table below.

| Value | Device Status |
|-------|---------------|
| 0     | Invalid / Unknown |
| 10    | Active |
| 20    | Uninitialized |
| 61    | Corrupt |
| 80    | Failed |

#### Check Device Presence

The output of the following API shows the configured NVMe drives on the Foundation Node. Depending on the failure mode, the output of this command can show a reduced number of drives. For example, if the Foundation Node chassis has 12 drive bays populated and only 11 appear in this result, one drive is not visible to Ocient. Run the following command, replacing `<IP address of the node>` with the IP address of your node.

```shell
curl -s http://<IP address of the node>:9090/v1/sysconfig | jq -r '.storage_devices[] | select(.use != "local file system") | "\(.id) \(.pci_address)"'
```

Output:

```
$ curl -s http://1.2.3.4:9090/v1/sysconfig | jq -r '.storage_devices[] | select(.use != "local file system") | "\(.id) \(.pci_address)"'
btlj913608ay2p0bgn 0000:3b:00.0
btlj913605yr2p0bgn 0000:3c:00.0
btlj913608fr2p0bgn 0000:5e:00.0
phlj9403002n2p0bgn 0000:5f:00.0
btlj92020kp22p0bgn 0000:60:00.0
btlj92030ajm2p0bgn 0000:61:00.0
phlj941100rw2p0bgn 0000:86:00.0
btlj91360d6y2p0bgn 0000:87:00.0
btlj916306ls2p0bgn 0000:88:00.0
btlj91360d6q2p0bgn 0000:89:00.0
```
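To mechanize the cross-reference, the two API results can be joined in a short script: any drive that the sysconfig API lists but that the stats API reports with a status other than 10 (or does not report at all) is a candidate for replacement. The following is a sketch that assumes the endpoints, stat name, and JSON field names shown in the commands above, and that `jq` is installed; the script name is illustrative.

```bash
#!/usr/bin/env bash
# Sketch: flag configured drives that are missing from the stats API or that
# report a device status other than 10 (active). Assumes the endpoints and
# field names used in the commands above.
# Usage: ./cross_reference_drives.sh <IP address of the node>
set -euo pipefail
NODE_IP="$1"

# Serial number -> status, from the stats API.
STATS=$(curl -s "http://${NODE_IP}:9090/v1/stats" |
  jq -r '.[] | select(.name == "localstorageservice_device_status") | "\(.device) \(.value)"')

# Serial number and PCI address of every configured payload drive, from the
# sysconfig API, checked against the stats output.
curl -s "http://${NODE_IP}:9090/v1/sysconfig" |
  jq -r '.storage_devices[] | select(.use != "local file system") | "\(.id) \(.pci_address)"' |
while read -r serial pci; do
    status=$(awk -v s="$serial" '$1 == s {print $2}' <<< "$STATS")
    if [ -z "$status" ]; then
        echo "MISSING   ${serial}  ${pci}  (configured but not reported by the stats API)"
    elif [ "$status" != "10" ]; then
        echo "STATUS ${status}  ${serial}  ${pci}"
    fi
done
```

Each line this sketch prints identifies a serial number and PCI address to investigate further.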
If the address of the failed drive cannot be determined in this way, proceed to Option 2 to identify the location of the drive.

### Option 2: Inspect Drives Using Operating System Commands

If drives are not appearing in the Ocient outputs, they might be unavailable to the Ocient service. In this case, directly examine the drives on the system with standard disk utilities and operating system commands to find the serial numbers of drives that meet the following criteria:

- The drive is not the operating system boot drive.
- The drive is not listed in the `sys.storage_device_status` table.

Any drives that meet these criteria are not in use by Ocient and might need to be replaced or repaired. Identify the serial numbers and PCI addresses of these drives in order to complete the replacement procedure.

## Related Links

- System Catalog

{{linux}} is the registered trademark of Linus Torvalds in the U.S. and other countries.