TimeKeys and Clustering Keys
Segment keys define how an System partitions data on disk based on particular columns in a database. These keys allow the system to quickly filter rows and skip irrelevant records for queries involving those columns.
For this reason, segment keys are integral tools for optimizing query performance for pivotal columns. Although segment keys are optional, deploying them on every database table in your system is a good practice because they require practically no additional processing or storage resources.
Ocient supports two types of segment keys:
- Clustering Key — A series of columns that are frequently queried together. The system subdivides these segments on disk for faster reference.
Together, these segment keys form the basis for the Ocient time-cluster storage engine, which subdivides data and metadata for optimal performance.
Advantages
Ocient segment keys share these important advantages:
- Improved performance for query filtering and record retrieval.
- Minimal resource demand for storage or I/O.
- Easy to use. Without sophisticated system knowledge, you can create segment keys with basic data definition language (DDL) statements.
Limitations
There are no significant downsides to using segment keys. However, they have some limitations for deployment:
- You can implement segment keys only upon table creation as part of the initial CREATE TABLE statement. You cannot add them later to a table.
- Each database table can support only one key of each type. For faster performance on additional columns, you can also add clustering indexes and other secondary indexes. For details, see Clustering Indexes and Secondary Indexes.
- Segment keys are restricted to specific column data types.
TimeKeys partition data based on a single time-series column in a database table. This column should normally be either a DATE or TIMESTAMP data type.
For optimal performance, you can configure a TimeKey column with two arguments that comprise a time bucket. The time bucket is a unit of time combined with a number value (e.g., HOUR, 2 or DAY, 1). This value sets the granularity for the partitioning.
In general, this value should match the level of filtering that you expect your queries to request. For example, a database table you expect to query for transactions by minute should probably have a small TimeKey bucket value, such as an hour. If you expect your queries to filter by broader time values, your table can have a TimeKey bucket of one day or larger.
The Ocient System uses the TimeKey and the bucket granularity to partition rows into segments that incorporate a minimum and maximum time value for all rows. For queries that involve time filtering, the TimeKey allows an Ocient System to immediately skip data segments that are outside the query bounds without any I/O processing.
This syntax example creates a single TimeKey column and must be included in the column definition of a CREATE TABLE SQL statement. For the full syntax, see CREATE TABLE.
For DDL examples of tables that use TimeKey columns, see CREATE TABLE SQL Statement Examples.
Every table in an Ocient System can support a Clustering Key to speed up query operations on a series of adjacent columns.
The Clustering Key determines how data is ordered into segments on disk. This ordering acts like an index, enabling an Ocient System to quickly locate any relevant rows of the Clustering Key columns.
Every database table should define one Clustering Key on a sequence of frequently queried columns for best results.
Clustering Key columns can be of any fixed-length data type except for DECIMAL. Variable-length columns can also support Clustering Keys if they are configured with global dictionary compression. For details, see Global Dictionary Compression.
In addition to the Clustering Key, you can also define clustering indexes. Unlike the Clustering Key, clustering indexes do not involve ordering data into segments. They create separate indexes for each specified column set, which boosts performance but requires some additional storage.
This syntax example creates a Clustering Key across multiple columns and must be included in a CREATE TABLE SQL statement. For the full syntax, see CREATE TABLE.
You can define a Clustering Key and any additional clustering indexes as arguments of a CREATE TABLE SQL statement.
The column ordering of a clustering key is significant. You must specify columns in rank order.
Table Example
This CREATE TABLE SQL statement example shows the definition of a clustering key on columns (CK1, CK2, CK3). The data types of these columns are irrelevant to this example.
The Clustering Key primes the records for search and filter operations on its columns in descending order of importance. This means the first column CK1 is the most important of the subset while the third CK3 is the least important.
To boost performance with a Clustering Key, a query must make references to the columns following this key order.
For a Clustering Key in the (CK1, CK2, CK3) pattern, these sequences of query column references are effective:
- CK1
- (CK1,CK2)
- (CK1,CK2,CK3)
Query Example
This example SELECT query follows the key order for its filter operations.
For a Clustering Key in the (CK1, CK2, CK3) pattern, these sequences of query column references are ineffective:
- (CK1, CK3)
- (CK2, CK3)
- (CK3, CK2)
To improve performance on queries that reference columns in these sequences, you can use clustering indexes. For details, see Adding Clustering Indexes.
To check the order of column references in a query, you can test a query using an EXPLAIN PIPELINE SQL statement. This statement shows the I/O pipeline and how the system references indexes.
For DDL examples of tables that use Clustering Key columns, see CREATE TABLE SQL Statement Examples.
A table can only have one Clustering Key, but it can also have any number of clustering indexes to boost performance. Clustering indexes act similarly to the Clustering Key, but they do not order data into segments. Instead, each clustering index creates a separate index for the specified column set, which can impact storage and loading overhead.
Clustering indexes can provide flexibility to your tables, allowing your queries to reference the same columns as your Clustering Key, but in different sequences.
You must define clustering keys and clustering indexes when you first create a table. You cannot add them at a later time.
Example
Define a Clustering Key across the first four columns (CK1, CK2, CK3, CK4) in the example_table table in this CREATE TABLE SQL statement.
The statement also includes two clustering indexes using the INDEX definition. The first creates an index for a subset of just two columns (CK2, CK3) from the Clustering Key. The second index uses all of the key columns (CK2, CK4, CK1, CK3), but in a different sequence.
In the example, COL5 is not part of the Clustering Key, meaning this column cannot be included in any clustering indexes.
To create TimeKeys and Clustering Keys, you must have the CREATE privilege for the current database.
You can query these system catalog tables to get information on the effectiveness of your configurations for segment keys and indexes:
For examples of querying segment and index metadata tables, see Discover Insights From System Catalog Tables.
Query Ocient
CREATE TABLE