NVMe Drive Firmware Upgrade Process
The {{ocient}} software uses NVMe drives for storage because they provide higher I/O throughput than spinning media drives. These drives are referred to as data disks or data drives. During normal operation, the Ocient system accesses these data drives using the user mode I/O (UIO) driver.

This procedure guides a system administrator through binding the drives back to the NVMe driver and upgrading the drive firmware. The procedure includes steps for upgrading the firmware of {{intel}} NVMe drives only. For other drive models, follow a similar process using the firmware upgrade process from the manufacturer.

Stop the rolehostd process before working on the NVMe drives. Otherwise, the running rolehostd process becomes unstable because it has open file handles on the drives.

Executing this process requires sudo privileges on the node where you update the drive firmware.

Prerequisites

Upgrade the firmware on only one node of a given type (Foundation, Loader, or SQL) at a time. You can execute the firmware upgrade in parallel on nodes of different types. Perform the upgrade during a maintenance window because issues with the upgrade might prevent the node from returning to service.

The prerequisites for this procedure are:

- These packages are installed on the node undergoing the firmware upgrade:
  - smartmontools
  - nvme-cli
  - intelmas
  - sedutil
- The target firmware version for the upgrade is known and documented.
- If you upgrade the data drive firmware on a Loader node, take the node out of the loading path by updating the LAT configuration before you begin the procedure. Before you shut down the rolehostd process, drain the pages that are held by the node.

The following example illustrates the firmware update commands for an Intel NVMe SSD. Substitute the appropriate command for your drive manufacturer in each step that uses the intelmas command.

Firmware Upgrade Procedure

The following steps require sudo
permissions on the node.

Prepare for the Firmware Upgrade

1. Log in to the node as an administrator user.
2. If you upgrade the drive firmware on a Foundation node or Loader node, save the status of the drives for use later in the procedure. Replace <ip address of the node> with the IP address of your node, then run this command and save its output.

       curl -s http://<ip address of the node>:9090/v1/stats | jq -r '.[] | select(.name == "localstorageservice device status") | "\(.device) \(.value)"'

   Alternatively, you can run a query on a SQL node to display the status and firmware version of each storage device. Replace <node name> with the name of your node, connect to any SQL node on the system, and run this query.

       select s.id as serial_number, s.pci_address, s.device_status, s.firmware_version
       from sys.nodes n
       join sys.storage_device_status s on n.id = s.node_id
       where n.name = '<node name>';

   Sample output:

       serial_number     |pci_address |device_status|firmware_version
       ------------------+------------+-------------+----------------
       CVMD6356009M800GGN|0000:3b:00.0|active       |VDV10184
       CVMD636400BZ800GGN|0000:3c:00.0|active       |VDV10184
       CVMD636400JA800GGN|0000:5e:00.0|active       |VDV10184
       CVMD6365003C800GGN|0000:5f:00.0|active       |VDV10184
       CVMD6365003P800GGN|0000:86:00.0|active       |VDV10184
       CVMD63650092800GGN|0000:87:00.0|active       |VDV10184
       PHLJ0431019J8P0HGN|0000:88:00.0|active       |VDV10184
       PHLJ106500018P0HGN|0000:89:00.0|active       |VDV10170

3. If you upgrade the drive firmware on a SQL node, save the output of the df command.

Stop the rolehostd Process and Attach the Data Drives

1. Stop the rolehostd process to ensure that the firmware upgrade does not impact system stability.

       sudo systemctl stop rolehostd

2. Prevent the rolehostd process from starting after a reboot.

       sudo systemctl disable rolehostd

3. Confirm that the output of this command indicates that the rolehostd process is not running and that its automatic start on reboot is disabled.

       sudo systemctl status rolehostd

   Note: Upgrading the drive firmware can impact
   data reliability if the rolehostd process is running on the node.

4. Attach the data drives to the NVMe driver.

       sudo /opt/ocient/scripts/nvme_driver_util.sh bind nvme

5. In the output of this command, ensure that all drives other than the operating system drive are attached to the nvme driver. Drives that did not bind to the nvme driver in the previous step are not upgraded, so you must investigate any failure to bind to the nvme driver.

       /opt/ocient/scripts/nvme_driver_util.sh

   Incorrect: Eight data drives are attached to the UIO driver. This is the incorrect system state for performing a firmware upgrade.

       /opt/ocient/scripts/nvme_driver_util.sh
       NVMe device status
       Use option 'bind uio' to bind uio_pci_generic driver to Ocient payload/unpartitioned drives
       Use option 'bind nvme' to bind nvme driver to drives
       BDF           NUMA Node  Driver Name      Device Name
       0000:01:00.0  0          nvme             nvme0n1
       0000:3b:00.0  0          uio_pci_generic
       0000:3c:00.0  0          uio_pci_generic
       0000:5e:00.0  0          uio_pci_generic
       0000:5f:00.0  0          uio_pci_generic
       0000:86:00.0  1          uio_pci_generic
       0000:87:00.0  1          uio_pci_generic
       0000:88:00.0  1          uio_pci_generic
       0000:89:00.0  1          uio_pci_generic

   Correct: The data drives are attached to the NVMe driver. This is the correct system state for performing a firmware upgrade.

       /opt/ocient/scripts/nvme_driver_util.sh
       NVMe device status
       Use option 'bind uio' to bind uio_pci_generic driver to Ocient payload/unpartitioned drives
       Use option 'bind nvme' to bind nvme driver to drives
       BDF           NUMA Node  Driver Name      Device Name
       0000:01:00.0  0          nvme             nvme0n1
       0000:3b:00.0  0          nvme             nvme1n1
       0000:3c:00.0  0          nvme             nvme2n1
       0000:5e:00.0  0          nvme             nvme3n1
       0000:5f:00.0  0          nvme             nvme4n1
       0000:86:00.0  1          nvme             nvme5n1
       0000:87:00.0  1          nvme             nvme6n1
       0000:88:00.0  1          nvme             nvme7n1
       0000:89:00.0  1          nvme             nvme8n1

Perform Firmware Upgrade

1. Run this command and review the FirmwareUpdateAvailable field to identify the drives that need a firmware upgrade. Save the results of this step for comparison later in the process.

       sudo intelmas show -d Index, SerialNumber, DeviceStatus, Capacity, FirmwareUpdateAvailable -intelssd

   Example: Drive 8 needs the firmware update, but drive 7 does not.

       - 7 Intel SSD DC P4510 Series PHLJ0431019J8P0HGN -

       Capacity : 8001.56 GB
       DeviceStatus : Healthy
       FirmwareUpdateAvailable : The selected drive contains current firmware as of this tool release.
       Index : 7
       SerialNumber : PHLJ0431019J8P0HGN

       - 8 Intel SSD DC P4510 Series PHLJ106500018P0HGN -

       Capacity : 8001.56 GB
       DeviceStatus : Healthy
       FirmwareUpdateAvailable : Firmware=VDV10170 Bootloader=VB1B0172
       Index : 8
       SerialNumber : PHLJ106500018P0HGN

2. Run this command for every drive in the previous step whose FirmwareUpdateAvailable field indicates old firmware. Replace <drive index> with the number shown in the Index field of the previous output.

       sudo intelmas load -intelssd <drive index>

3. Ensure that the output shows that the firmware updated successfully. This is sample log output for the upgrade:

       intelmas load -intelssd 8
       WARNING! You have selected to update the drives firmware!
       Proceed with the update? (Y|N): y
       Checking for firmware update

       - Intel SSD DC P4510 Series PHLJ106500018P0HGN -

       Status : Firmware updated successfully. Please reboot the system.

4. After you upgrade all drives that require the firmware upgrade, reboot the node using the reboot command. After the node completes the reboot, log in to the node as an administrator.
5. Attach the data drives to the NVMe driver using this command.

       sudo /opt/ocient/scripts/nvme_driver_util.sh bind nvme

6. Compare the output of this command with the output of the same command captured earlier in this process. Ensure that the reported drive status and capacity are the same as before the upgrade and that the firmware shown is current.

       sudo intelmas show -d Index, SerialNumber, DeviceStatus, Capacity, FirmwareUpdateAvailable -intelssd

7. Reattach the data drives to the UIO driver.

       sudo /opt/ocient/scripts/nvme_driver_util.sh bind uio

Enable the rolehostd Process and Perform Final Checks

1. Enable the rolehostd process to start after reboot, and then start the process.

       sudo systemctl enable rolehostd && sudo systemctl start rolehostd

2. Ensure that the rolehostd process is enabled and started.

       sudo systemctl status rolehostd

3. If you upgrade the drive firmware on a Foundation node or Loader node, ensure that the output of this command matches the output of the same command captured earlier in this process.

       curl -s http://<ip address of the node>:9090/v1/stats | jq -r '.[] | select(.name == "localstorageservice device status") | "\(.device) \(.value)"'

   Alternatively, you can run a query on a SQL node to capture the status and firmware version of each device. Ensure that the drives and statuses in this query match the results from the query you executed earlier in this process. Replace <node name> with the name of your node, connect to any SQL node on the system, and run this query.

       select s.id as serial_number, s.pci_address, s.device_status, s.firmware_version
       from sys.nodes n
       join sys.storage_device_status s on n.id = s.node_id
       where n.name = '<node name>';

   Sample output:

       serial_number     |pci_address |device_status|firmware_version
       ------------------+------------+-------------+----------------
       CVMD6356009M800GGN|0000:3b:00.0|active       |VDV10184
       CVMD636400BZ800GGN|0000:3c:00.0|active       |VDV10184
       CVMD636400JA800GGN|0000:5e:00.0|active       |VDV10184
       CVMD6365003C800GGN|0000:5f:00.0|active       |VDV10184
       CVMD6365003P800GGN|0000:86:00.0|active       |VDV10184
       CVMD63650092800GGN|0000:87:00.0|active       |VDV10184
       PHLJ0431019J8P0HGN|0000:88:00.0|active       |VDV10184
       PHLJ106500018P0HGN|0000:89:00.0|active       |VDV10184

4. If you upgrade the drive firmware on a SQL node, ensure that the output of the df command shows the same mount points as those captured earlier in this process.

Related Links

System Catalog
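The binding check in step 5 of Stop the rolehostd Process and Attach the Data Drives is easy to misread on a node with many drives. The following minimal Python sketch flags data drives that are still bound to a driver other than nvme. It assumes the whitespace-separated column layout (BDF, NUMA node, driver name, optional device name) shown in that step's sample output, and the real nvme_driver_util.sh output might be formatted differently. The function names and the os_drive_bdf parameter are hypothetical and are not part of the Ocient tooling.

```python
import re

# A PCI address (bus:device.function) such as 0000:3b:00.0.
BDF_RE = re.compile(r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$")


def parse_driver_status(text):
    """Map each BDF to (driver_name, device_name or None).

    Assumes one drive per line with whitespace-separated columns:
    BDF, NUMA node, driver name, and an optional device name.
    """
    drives = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or not BDF_RE.match(fields[0]):
            continue  # banner and header lines do not start with a BDF
        driver = fields[2] if len(fields) > 2 else None
        device = fields[3] if len(fields) > 3 else None
        drives[fields[0]] = (driver, device)
    return drives


def unbound_data_drives(text, os_drive_bdf):
    """Return the sorted BDFs of data drives that are not bound to nvme.

    os_drive_bdf (hypothetical) is the PCI address of the operating system
    drive, which is excluded from the check.
    """
    return sorted(
        bdf
        for bdf, (driver, _device) in parse_driver_status(text).items()
        if bdf != os_drive_bdf and driver != "nvme"
    )


# Abbreviated copy of the "incorrect" sample output from step 5.
SAMPLE_INCORRECT = """\
NVMe device status
BDF           NUMA Node  Driver Name      Device Name
0000:01:00.0  0          nvme             nvme0n1
0000:3b:00.0  0          uio_pci_generic
0000:3c:00.0  0          uio_pci_generic
"""

if __name__ == "__main__":
    # Any BDF printed here did not bind to nvme and would not be upgraded.
    print(unbound_data_drives(SAMPLE_INCORRECT, os_drive_bdf="0000:01:00.0"))
```

To use the sketch, save the output of the status command to a file and pass its text to unbound_data_drives. An empty list means every data drive is attached to the nvme driver.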
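Step 6 of Perform Firmware Upgrade asks you to compare two runs of the intelmas show command by eye. The following Python sketch automates that comparison under stated assumptions. It parses the "Key : Value" layout from the sample output in step 1 of that section and keys drives by SerialNumber. It also treats a drive as current only when FirmwareUpdateAvailable starts with the message shown for drive 7. The function names are hypothetical. The AFTER text is an illustrative edit of the documented sample, not captured tool output.

```python
import re

# Header line of each drive block, for example "- 8 Intel SSD ... PHLJ106500018P0HGN -".
HEADER_RE = re.compile(r"^- (\d+) .* -$")
# Prefix printed for the up-to-date drive in the sample output (assumption).
CURRENT_MSG = "The selected drive contains current firmware"


def parse_intelmas_show(text):
    """Map SerialNumber -> {field: value} for each drive block in the output."""
    records, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if HEADER_RE.match(line):
            current = {}  # start a new drive block
            continue
        if current is None or " : " not in line:
            continue  # skip blank lines and text outside a drive block
        key, value = (part.strip() for part in line.split(" : ", 1))
        current[key] = value
        if key == "SerialNumber":
            records[value] = current
    return records


def needs_update(record):
    """True when FirmwareUpdateAvailable does not report current firmware."""
    return not record.get("FirmwareUpdateAvailable", "").startswith(CURRENT_MSG)


def compare_snapshots(before, after):
    """List problems between two parsed snapshots; empty means the checks passed."""
    problems = []
    for serial, old in sorted(before.items()):
        new = after.get(serial)
        if new is None:
            problems.append(f"{serial}: missing after upgrade")
            continue
        for field in ("DeviceStatus", "Capacity"):
            if old.get(field) != new.get(field):
                problems.append(
                    f"{serial}: {field} changed from {old.get(field)!r} to {new.get(field)!r}"
                )
        if needs_update(new):
            problems.append(f"{serial}: firmware still reported as not current")
    return problems


BEFORE = """\
- 7 Intel SSD DC P4510 Series PHLJ0431019J8P0HGN -

Capacity : 8001.56 GB
DeviceStatus : Healthy
FirmwareUpdateAvailable : The selected drive contains current firmware as of this tool release.
Index : 7
SerialNumber : PHLJ0431019J8P0HGN

- 8 Intel SSD DC P4510 Series PHLJ106500018P0HGN -

Capacity : 8001.56 GB
DeviceStatus : Healthy
FirmwareUpdateAvailable : Firmware=VDV10170 Bootloader=VB1B0172
Index : 8
SerialNumber : PHLJ106500018P0HGN
"""

# Hypothetical post-upgrade output: drive 8 now reports current firmware.
AFTER = BEFORE.replace(
    "Firmware=VDV10170 Bootloader=VB1B0172",
    "The selected drive contains current firmware as of this tool release.",
)

if __name__ == "__main__":
    before, after = parse_intelmas_show(BEFORE), parse_intelmas_show(AFTER)
    print([r["Index"] for r in before.values() if needs_update(r)])  # indices to load
    print(compare_snapshots(before, after))
```

The check keys drives by serial number rather than index. A serial number identifies the physical drive, so comparing by serial number does not depend on the index being the same in both runs.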