Statistics Cache Management
System administrators can invalidate internal statistics caches used by the SQL optimizer. The optimizer then immediately sends new requests to Foundation Nodes to repopulate statistics for the underlying data in tables.
The System calculates two primary types of statistics. The system calculates Cardinality Distinct Estimates (CDEs) by performing a union operation on the information stored in the table on Foundation Nodes. Also, the system calculates Probability Density Functions (PDFs) by sampling the table data local to the Foundation Node. Then, the system performs a union operation on the CDEs and PDFs on SQL Nodes when stored in the statistics cache. Notably, the union operation for the PDF is not associative and its implementation is not deterministic. The union operation for the CDE is associative, commutative, and deterministic.
The use of these SQL statements in system administration settings is rare. Do not use these statements for day-to-day maintenance of Ocient clusters. Mostly, you can use these commands when large amounts of data have been loaded or removed from the Ocient System in a short time, and the system administrator wants to recalculate statistics immediately rather than wait for the configured statistics cache refresh timer to reflect the changes. These syntaxes help to stabilize the optimizer for new data distributions quickly.
INVALIDATE STATS clears the entire optimizer statistics cache on all SQL Nodes.
If a table has not undergone any row delta after the last statistics cache refresh, the Ocient System does not recalculate statistics by sampling the underlying table. PDF statistics might still change slightly due to non-deterministic union operations. If the table has undergone a row change, the Ocient Sytem recalculates statistics from scratch.
Query performance might be affected until the statistics cache repopulates, specifically for optimization time. For tables that have undergone a row delta, the optimizer temporarily uses the old statistics file again until the recalculated file is available.
Syntax
Example
Clear the entire optimizer statistics cache on all SQL Nodes.
The INVALIDATE STATS FILES SQL statement forcibly invalidates the CDE and PDF files of the Foundation Node for all tables regardless of changes in the row delta, which triggers the system to recalculate the statistics.
The optimizer statistics cache continues to use its current value until the Foundation Nodes have finished recalculating the new statistics, after which the cache replaces the table entry with the new statistics. This operation should have minimal impact on query optimization times.
Depending on the size of the system, the system might take minutes to hours to recalculate all statistics on the system fully and for new statistics to populate in the optimizer cache.
Syntax
Example
Invalidate all statistics
The ALTER table_name INVALIDATE STATS SQL statement forcibly invalidates the CDE and PDF files of the Foundation Node for the specified table regardless of changes in the row delta, which triggers the system to recalculate the statistics.
The optimizer statistics cache continues to use its current value until the Foundation Nodes have finished recalculating the new statistics, after which the cache replaces the table entry with the new statistics. This operation should have minimal impact on query optimization times.
Syntax
Parameter | Data Type | Description |
---|---|---|
table_name | string | The name of the table. |
Example
Force the system to recalculate the statistics for the table named test_table.