NVMe Drive Firmware Upgrade Process
The software uses NVMe drives for storage because they provide higher I/O throughput than spinning-media drives. These drives are referred to as data disks or data drives. During normal operation, the Ocient System accesses these data drives using the UIO driver.
This procedure guides a System Administrator through binding the drives back to the NVMe driver and upgrading the drive firmware. The procedure covers firmware upgrades for Intel® NVMe drives only. For other drive models, follow a similar process using the firmware upgrade process from the manufacturer.
Stop the rolehostd process before you work on the NVMe drives. If the rolehostd process is running while you work on the drives, the process becomes unstable because it holds open file handles on the drives.
This procedure requires sudo privileges on the node where you update the drive firmware.
Perform the firmware upgrade on one node of any given type (Foundation, Loader, SQL) at a time. You can execute the firmware upgrade in parallel on different types of nodes.
Perform the upgrade during a maintenance window because issues with the upgrade might prevent the node from returning to service.
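The one-node-per-type rule above can be turned into an upgrade schedule. This is a minimal sketch, not part of the Ocient tooling: the function name `plan_upgrade_waves` and the `(node_name, node_type)` input format are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import zip_longest


def plan_upgrade_waves(nodes):
    """Group (node_name, node_type) pairs into waves.

    Each wave holds at most one node of each type, so nodes of different
    types are upgraded in parallel and nodes of the same type one at a time.
    """
    by_type = defaultdict(list)
    for name, node_type in nodes:
        by_type[node_type].append(name)

    waves = []
    # zip_longest pads the shorter per-type lists with None; drop the padding.
    for wave in zip_longest(*by_type.values()):
        waves.append([name for name in wave if name is not None])
    return waves
```

Each returned wave lists the nodes that can be upgraded at the same time; start the next wave only after every node in the current wave has returned to service.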
The prerequisites for this procedure are:
- The following packages are installed on the node undergoing firmware upgrade:
  - smartmontools
  - nvme-cli
  - intelmas
  - sedutil
- The target firmware version for the upgrade is known and documented.
- If you upgrade the data drive firmware on a Loader Node, prior to beginning the procedure, take the node out of the loading path by updating the LAT configuration. Before you shut down the rolehostd process, drain the pages that are held by the node.
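Before starting, it can help to confirm that the tools from the required packages are on the PATH. The following is a minimal sketch. The package-to-command mapping (for example, `smartctl` for smartmontools and `sedutil-cli` for sedutil) reflects the usual command names for these packages and is an assumption to verify on your distribution; `missing_packages` is a hypothetical helper name.

```python
import shutil

# Package name -> command the package is expected to install.
# Assumed mapping; verify the command names on your distribution.
REQUIRED_TOOLS = {
    "smartmontools": "smartctl",
    "nvme-cli": "nvme",
    "intelmas": "intelmas",
    "sedutil": "sedutil-cli",
}


def missing_packages(which=shutil.which, required=REQUIRED_TOOLS):
    """Return the sorted package names whose command is not found on PATH."""
    return sorted(pkg for pkg, cmd in required.items() if which(cmd) is None)
```

Calling `missing_packages()` on the node returns an empty list when every command is found. A non-empty result names the packages to check before you continue.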
The following example illustrates the firmware update commands used for an Intel NVMe SSD. Please substitute the appropriate command for your drive manufacturer in each of the steps that use the intelmas command.
The following steps require sudo permissions on the node.
1. Log in to the node as an administrator user.
2. If you upgrade the drive firmware on a Foundation Node or Loader Node, save the status of the drives for use later in the procedure. Replace <IP_ADDRESS_OF_THE_NODE> with the IP address of your node, run this command, and save the output.
Alternatively, you can run a query on the SQL Node to display the status of the individual storage device and firmware version. After you replace <NODE_NAME> with the name of your node, connect to any SQL Node on the system and run this query.
Sample output:
3. If you upgrade the drive firmware on a SQL Node, save the output of the df command.
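Saved `df` output is easier to compare after the upgrade if it is reduced to its mount points. The following is a minimal sketch that assumes the standard `df` layout (one header line, mount point in the last column) and mount points without spaces; both function names are hypothetical.

```python
def mount_points(df_output):
    """Return the set of mount points listed in saved `df` output."""
    lines = [line for line in df_output.splitlines() if line.strip()]
    # Skip the header line; the mount point is the last whitespace-separated field.
    return {line.split()[-1] for line in lines[1:]}


def mount_point_changes(before, after):
    """Return (missing, added) mount points between two saved `df` outputs."""
    old, new = mount_points(before), mount_points(after)
    return sorted(old - new), sorted(new - old)
```

Keep the text captured in this step in a file. After the upgrade, `mount_point_changes(before_text, after_text)` returns two empty lists when the mount points are unchanged.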
1. Stop the rolehostd process to ensure that the firmware upgrade does not impact system stability.
2. Prevent the rolehostd process from starting after reboot.
3. Confirm that the output of this command indicates that the rolehostd process is not running and that automatic start on reboot is disabled.
Upgrading the drive firmware while the rolehostd process is running on the node can impact data reliability.
4. Attach the data drives to the NVMe driver.
5. Ensure that, in the output of this command, all drives other than the operating system drive are attached to the NVMe driver. Any drive that did not bind to the NVMe driver in the previous step cannot be upgraded, so you must investigate any failure to bind to the NVMe driver.
INCORRECT: Eight data drives are attached to the UIO driver. This is the incorrect system state for performing a firmware upgrade.
CORRECT: The data drives are attached to the NVMe driver. This is the correct system state for performing a firmware upgrade.
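The driver binding can also be checked directly from Linux sysfs, independently of the command used in this procedure. The following is a minimal sketch: it relies on the standard sysfs layout, where each PCI device directory has a `class` file and a `driver` symlink, and on the PCI class code `0x010802` for NVMe controllers. The function names are hypothetical.

```python
import os

# PCI class code for an NVMe controller (mass storage, non-volatile memory, NVMe).
NVME_CLASS_PREFIX = "0x010802"


def nvme_device_drivers(sysfs_root="/sys/bus/pci/devices"):
    """Map each NVMe-class PCI address to its bound driver name, or None if unbound."""
    result = {}
    for addr in sorted(os.listdir(sysfs_root)):
        dev = os.path.join(sysfs_root, addr)
        try:
            with open(os.path.join(dev, "class")) as fh:
                pci_class = fh.read().strip()
        except OSError:
            continue  # not a readable device directory
        if not pci_class.startswith(NVME_CLASS_PREFIX):
            continue
        link = os.path.join(dev, "driver")
        # The symlink target ends with the driver name, for example ".../drivers/nvme".
        result[addr] = os.path.basename(os.readlink(link)) if os.path.islink(link) else None
    return result


def not_on_nvme_driver(drivers):
    """Return the PCI addresses whose driver is anything other than `nvme`."""
    return sorted(addr for addr, drv in drivers.items() if drv != "nvme")
```

An empty list from `not_on_nvme_driver(nvme_device_drivers())` corresponds to the CORRECT state above. Any address in the list is still bound to another driver, or to none, and needs investigation before the upgrade.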
1. Run this command and review the FirmwareUpdateAvailable field to identify the drives that need a firmware upgrade. Save the results of this step for comparison later in the process.
In this example, drive 8 needs the firmware update, but drive 7 does not.
2. Execute this command for every drive from the previous step whose FirmwareUpdateAvailable field indicates that newer firmware is available. Replace <DRIVE_INDEX> with the number shown in the Index field for that drive in the output of the previous command.
3. Ensure that the output shows Firmware updated successfully. This is the sample log output for the upgrade.
4. After you upgrade all drives that require the firmware upgrade, reboot the node using the reboot command. After the node completes the reboot, log in to the node as an administrator.
5. Attach the data drives to the NVMe driver using this command.
6. Compare the output of this command with the output of the same command captured earlier in this process. Ensure that the reported drive status and capacity are the same as before the upgrade and that the firmware shown is current.
7. Reattach the data drives to the UIO driver.
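When a node has many drives, scanning the FirmwareUpdateAvailable field by eye is error-prone. The following is a minimal sketch that parses saved drive-listing output. It assumes the output consists of `Key : Value` lines, with each drive introduced by a banner line of the form `- ... -`, and that an up-to-date drive reports a message containing the phrase "contains current firmware". Both the layout and the phrase are assumptions to check against the output of your tool version, and the function names are hypothetical.

```python
# Assumed wording reported for a drive that is already up to date.
CURRENT_MARKER = "contains current firmware"


def parse_drive_records(show_output):
    """Parse `Key : Value` lines into one dict per drive."""
    records, current = [], None
    for raw in show_output.splitlines():
        line = raw.strip()
        if line.startswith("- ") and line.endswith(" -"):
            current = None  # banner line: the next key starts a new drive record
        elif " : " in line:
            if current is None:
                current = {}
                records.append(current)
            key, value = line.split(" : ", 1)
            current[key.strip()] = value.strip()
    return records


def indexes_to_review(show_output):
    """Return the Index values of drives not reported as having current firmware."""
    flagged = []
    for record in parse_drive_records(show_output):
        status = record.get("FirmwareUpdateAvailable")
        if "Index" in record and status is not None and CURRENT_MARKER not in status:
            flagged.append(record["Index"])
    return flagged
```

Run `indexes_to_review` on the output saved before the upgrade to list the candidate drive indexes, and again on the output captured after the reboot. An empty list the second time suggests that every drive now reports current firmware. Treat the result as a prompt for manual review, not as a replacement for reading the output.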
1. Enable the rolehostd process to start after reboot and then start the process.
2. Ensure that the rolehostd process is enabled and started.
3. If you upgraded the drive firmware on a Foundation Node or Loader Node, ensure that the output of this command matches the output of the same command captured earlier in this process.
Alternatively, you can run a query on the SQL Node to capture the individual device status and firmware version. Ensure that the drives and statuses in this query match the results from the query you executed earlier in this process. After you replace the node name <NODE_NAME> with the name of your node, connect to any SQL Node on the system and execute this query.
Sample output:
4. If you upgraded the drive firmware on a SQL Node, ensure that the output of the df command shows the same mount points as those captured earlier in this process.
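The two rolehostd checks in this procedure (stopped and disabled before the upgrade, enabled and started afterward) can be scripted. The following is a minimal sketch that assumes rolehostd is managed as a systemd unit named `rolehostd`, which this document does not state explicitly. `systemctl is-active` and `systemctl is-enabled` are standard systemd subcommands; the Python function names are hypothetical.

```python
import subprocess


def systemctl_state(verb, unit="rolehostd"):
    """Return the one-word answer of `systemctl is-active|is-enabled <unit>`."""
    # No check=True: systemctl exits non-zero for inactive or disabled units.
    proc = subprocess.run(["systemctl", verb, unit], capture_output=True, text=True)
    return proc.stdout.strip()


def service_parked(query=systemctl_state):
    """True when the unit is stopped and will not start on reboot (before the upgrade)."""
    return query("is-active") in ("inactive", "failed") and query("is-enabled") == "disabled"


def service_ready(query=systemctl_state):
    """True when the unit is running and enabled on reboot (after the upgrade)."""
    return query("is-active") == "active" and query("is-enabled") == "enabled"
```

`service_parked()` is expected to return `True` before you attach the drives to the NVMe driver, and `service_ready()` after you restart the process at the end of the procedure.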