Guide to Rebuilding Segments
Data in an Ocient system is erasure coded to provide resilience to disk and foundation node failures. The resilience of a system depends on the parity width of the storage space, which represents the number of associated disks (or nodes) that can fail without interruption of service or data loss. For details about storage spaces and parity width, see docid\ xoedifmbj5 tpw97ku85q.

Data Segment Statuses

You can check the status of your system segment groups by querying the sys.segment_groups system catalog table. For details, see docid\ tbjvq0xtd tcxq17hm nc.

This table describes the states for data segments.

[Table: data segment states]

Recovery Considerations

If a segment has the damaged or missing status, queries can proceed by reconstructing the data on demand using the remaining erasure-coded data in the segment group. Having a non-intact status means that input or output (I/O) performance is significantly reduced. To restore full performance, you need to run a segment rebuild task.

Segment rebuilding is not automatic. The system administrator must manually invoke it.

A segment rebuild fails, and data is lost completely, if the number of segments with the damaged status in a segment group exceeds the parity width of the storage space. To avoid data loss:

- Provision the parity width of the storage space at or above the number of expected concurrent node failures.
- Rebuild damaged or permanently missing segments as soon as possible.

Checking for Abnormal Segments

You can find any segments that need a rebuild by querying for segment groups with an abnormal status.

Examples

Finding Faulty Segment Groups

This example query finds any segment groups with the damaged or missing status.

    SELECT *
    FROM sys.segment_groups
    WHERE status IN ('damaged', 'missing');

Output

| "id" | "cluster_id" | "segment_type" | "status" | "primary_owner" | "loader_id" | "table_id" | "scope_id" | "block_size" | "begin_time" | "end_time" | "coding_algorithm" | "coding_block_size" | "coding_threshold" | "coding_width" | "replication" | "parity_cycle" | "created_time" | "rolehostd_version" | "commit_hash" | "timestamp" | "build_user" | "depth" | "removal_type" |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| "720576045199095878" | "074c32ec-4f92-4718-ada1-5eb55eafdcb3" | tkt segment | damaged | | "9a8f7ea7-613c-499f-ab2d-937ab0ce992e" | bd18d33b-aae9-4be2-a2f4-76ecdc34c2ba | "8bc119d2-875a-47e4-b016-87fde83e77d6" | 4096 | 0 | 1 | xor parity | 4096 | 2 | 3 | 1 | 1 | 2025-02-26 20:23:34.274 | "25.0.0" | "91ab76ae57491ade0122bc7b594d5c3c6e0bf40c" | "20250109 221519" | | 0 | not removed |
| "716072445571725367" | "074c32ec-4f92-4718-ada1-5eb55eafdcb3" | tkt segment | damaged | | "9a8f7ea7-613c-499f-ab2d-937ab0ce992e" | "9fe79009-ee75-4429-b29b-3a614a166751" | a8c805d0-2144-46a7-8374-52cb88d64244 | 4096 | 0 | 1 | xor parity | 4096 | 2 | 3 | 1 | 1 | 2025-02-26 20:23:19.447 | "25.0.0" | "91ab76ae57491ade0122bc7b594d5c3c6e0bf40c" | "20250109 221519" | | 0 | not removed |

Finding Clusters and Nodes with Faulty Segment Groups

Inspect the count of damaged groups by cluster and node.

    SELECT c.name AS cluster_name,
           n.name AS node_name,
           g.status AS segment_group_status,
           seg.status AS segment_status,
           seg.kind,
           COUNT(*) AS segment_count
    FROM sys.segment_groups g
    LEFT JOIN sys.clusters c ON c.id = g.cluster_id
    LEFT JOIN sys.stored_segments seg ON seg.segment_group_id = g.id
    LEFT JOIN sys.nodes n ON n.id = seg.node_id
    WHERE g.status <> 'intact'
      AND (seg.status <> 'intact' OR seg.status IS NULL)
    GROUP BY 1, 2, 3, 4, 5
    ORDER BY 1, 2, 3, 4, 5;

Output

| "cluster_name" | "node_name" | "segment_group_status" | "segment_status" | "kind" | "segment_count" |
| --- | --- | --- | --- | --- | --- |
| foundation cluster | foundation0 | damaged | | virtual | 2 |

Starting a Segment Rebuild Task

A user with system administrator privileges can start a segment rebuild using the CREATE TASK TYPE REBUILD SQL statement. You cannot cancel a segment rebuild task after it is started.

Most commonly, a rebuild task repairs all damaged or missing segments across the system.

Example

Create a rebuild task.

    CREATE TASK TYPE REBUILD;

The system can continue to perform queries while rebuilding segments, but the process can impact I/O performance.

Advanced Rebuild Commands

Rebuild tasks can also execute on specific foundation nodes or clusters. For information on fine-tuning rebuild tasks, see docid\ lp3hitukekekfy1vmalei.

Checking Rebuild Task Status

Monitor the status of current and past segment rebuild tasks from the sys.subtasks system catalog table. For details, see docid\ tbjvq0xtd tcxq17hm nc.

    SELECT *
    FROM sys.subtasks
    WHERE task_type = 'rebuild';

This table describes the statuses for a rebuild task.

[Table: rebuild task statuses]

Related Links

- docid\ lp3hitukekekfy1vmalei
- docid\ wswicvbpsptbnvj6kb30v
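
As a complement to the checks in this guide, a per-status summary of segment groups can give a quick picture of system health before starting a rebuild and again after it finishes. The query below is a minimal sketch, not an official procedure. It assumes only the sys.segment_groups table and its status column as they appear in the examples in this guide; the segment_group_count alias is an illustrative name.

```sql
-- Count segment groups per status to compare system health
-- before and after a rebuild task.
-- Assumption: only the status column shown in this guide is used;
-- segment_group_count is a hypothetical alias chosen for readability.
SELECT status,
       COUNT(*) AS segment_group_count
FROM sys.segment_groups
GROUP BY status
ORDER BY status;
```

If every segment group reports the intact status after the rebuild task completes, no further rebuild should be needed. Any remaining damaged or missing rows point to segment groups worth inspecting with the more detailed queries in this guide.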