Data Segment Statuses
You can check the status of your system segment groups by querying the sys.segment_groups system catalog table. For details, see System Catalog.
This table describes the states for data segments.
| Status | Description | Recovery Process |
|---|---|---|
| INTACT | The normal and operable status. | No recovery needed. |
| DAMAGED | The segment failed a checksum, meaning the segment data is corrupted and unusable. | If you have sufficient parity width, you can recover a damaged segment by invoking a rebuild segment task. |
| MISSING | The segment is on a node or disk that is currently offline. If the node or disk rejoins the cluster, it can transition to the INTACT status. | When a disk or Foundation Node is permanently removed, you can perform a rebuild task to recover the data. This requires sufficient parity width. |
| REBUILDING | The segment is in recovery after damage or missing segment data. | Recovery is already in progress. |
Recovery Considerations
If a segment has the DAMAGED or MISSING status, queries can still proceed by reconstructing the data on demand from the remaining erasure-coded data in the segment group. However, any non-INTACT status means that input/output (I/O) performance is significantly reduced. To restore full performance, you must run a segment rebuild task.
Segment rebuilding is not automatic. The system administrator must invoke it manually.
A segment rebuild fails, and the data is lost completely, if the number of segments with the DAMAGED status in a segment group exceeds the parity width of the storage space. To avoid data loss:
- Provision the parity width of the storage space at or above the number of expected concurrent node failures.
- Rebuild damaged or permanently missing segments as soon as possible.
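
The parity-width rule above can be sketched as a small check. This is a minimal illustration, not the system's implementation: the function name `rebuild_possible` and the sample segment group are hypothetical, and the sketch assumes that both DAMAGED and MISSING segments count against the parity width, as the rebuild_not_possible status detail describes.

```python
from collections import Counter

# Assumption: both of these statuses count against the parity width.
UNAVAILABLE = {"DAMAGED", "MISSING"}


def rebuild_possible(segment_statuses, parity_width):
    """Return True if the unavailable segments do not exceed the parity width."""
    counts = Counter(segment_statuses)
    unavailable = sum(counts[status] for status in UNAVAILABLE)
    return unavailable <= parity_width


# Hypothetical segment group of 10 segments in a storage space with parity width 2.
recoverable = ["INTACT"] * 8 + ["DAMAGED", "MISSING"]
unrecoverable = ["INTACT"] * 7 + ["DAMAGED", "DAMAGED", "MISSING"]

print(rebuild_possible(recoverable, parity_width=2))    # True
print(rebuild_possible(unrecoverable, parity_width=2))  # False
```

In the second group, three segments are unavailable, which exceeds the parity width of two, so a rebuild could not recover the data until a missing segment returns.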
Checking for Abnormal Segments
You can find any segments that need a rebuild by querying for segment groups with an abnormal status.
Examples
Finding Faulty Segment Groups
This example query finds any segment groups with the DAMAGED or MISSING status.
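
As a rough sketch of the kind of filter described above, the following runs a status query against an in-memory SQLite stand-in. The table name `segment_groups` and the columns `id` and `status` are hypothetical; the real columns of sys.segment_groups are documented in System Catalog and may differ.

```python
import sqlite3

# In-memory stand-in for the catalog table; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE segment_groups (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO segment_groups (id, status) VALUES (?, ?)",
    [(1, "INTACT"), (2, "DAMAGED"), (3, "MISSING"), (4, "REBUILDING")],
)

# Keep only the segment groups that need a rebuild.
faulty = conn.execute(
    "SELECT id, status FROM segment_groups "
    "WHERE status IN ('DAMAGED', 'MISSING') ORDER BY id"
).fetchall()
print(faulty)  # [(2, 'DAMAGED'), (3, 'MISSING')]
```

The REBUILDING row is excluded on purpose, because recovery for it is already in progress.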
Starting a Segment Rebuild Task
A user with System Administrator privileges can start a segment rebuild using the CREATE TASK TYPE REBUILD SQL statement. You cannot cancel a segment rebuild task after it starts.
Most commonly, a rebuild task repairs all damaged or missing segments across the system.
Example
Create a rebuild task.
Advanced Rebuild Commands
Rebuild tasks can also execute on specific Foundation Nodes or clusters. For information on fine-tuning rebuild tasks, see CREATE TASK.
Checking Rebuild Task Status
Monitor the status of current and past segment rebuild tasks from the sys.subtasks system catalog table. For details, see System Catalog.
| Status | Status Detail | Description | Next Steps |
|---|---|---|---|
| complete | no_work | The segment groups were already available and healthy. No rebuilding was needed. | None |
| complete | complete | Rebuild completed successfully. | None |
| running | rebuild_in_progress | Rebuild is in progress. | You can monitor the progress of the rebuild task by checking the JSON dictionary in the details column of the sys.subtasks system catalog table. |
| failed | rebuild_not_possible | The number of missing or damaged segments exceeds the parity width, and the data cannot be recovered currently. | If missing segments are available on an offline drive, you can attempt another rebuild task when that drive is made available to the system. Otherwise, you cannot recover the data. |
| failed | rebuild_no_space | There is not enough space available to rebuild the segment. | You can complete the rebuild by truncating other data to free up space. |
| failed | failed_on_node | The cluster lost its connection to the node before the rebuild was completed. | This is a transient error. You can retry the rebuild task. |
| failed | error | An unexpected internal error occurred. | Review error message details using the rolehostd logs. For details, see Log Monitoring. |
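
For monitoring scripts, the table above can be encoded as a simple lookup. This is a minimal sketch: the function name `next_step` and the shortened action strings are illustrative, and only the status and status detail values come from the table.

```python
# (status, status_detail) -> summarized next step, taken from the table above.
NEXT_STEPS = {
    ("complete", "no_work"): "None",
    ("complete", "complete"): "None",
    ("running", "rebuild_in_progress"): "Monitor the details column in sys.subtasks",
    ("failed", "rebuild_not_possible"): "Retry only if an offline drive becomes available",
    ("failed", "rebuild_no_space"): "Free up space, then rebuild again",
    ("failed", "failed_on_node"): "Transient error; retry the rebuild task",
    ("failed", "error"): "Review the rolehostd logs",
}


def next_step(status, status_detail):
    """Look up the suggested action for a rebuild subtask row."""
    # Unknown combinations fall back to a generic hint rather than raising.
    return NEXT_STEPS.get((status, status_detail), "Unknown status; see CREATE TASK")


print(next_step("failed", "failed_on_node"))  # Transient error; retry the rebuild task
```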

