TimeKeys and Clustering Keys

Segment keys define how an ﻿ System partitions data on disk based on particular columns in a database. These keys allow the system to quickly filter rows and skip irrelevant records for queries involving those columns.   
For this reason, segment keys are integral tools for optimizing query performance for pivotal columns. Although segment keys are optional, deploying them on every database table in your system is a good practice because they require practically no additional processing or storage resources. 
Ocient supports two types of segment keys: 
﻿TimeKey®﻿ — A segment key that partitions data based on a single time-series data column.
﻿Clustering Key﻿ — A series of columns that are frequently queried together. The system subdivides these segments on disk for faster reference.
Together, these segment keys form the basis for the Ocient time-cluster storage engine, which subdivides data and metadata for optimal performance. 
Attributes of Segment Keys 
Advantages
Ocient segment keys share these important advantages: 
Improved performance for query filtering and record retrieval. 
Minimal resource demand for storage or I/O.
Easy to use. Without sophisticated system knowledge, you can create segment keys with basic data definition language (DDL) statements.  
Limitations
There are no significant downsides to using segment keys. However, they have some limitations for deployment: 
You can implement segment keys only upon table creation as part of the initial CREATE TABLE﻿ statement. You cannot add them later to a table. 
Each database table can support only one key of each type. For faster performance on additional columns, you can also add clustering indexes and other secondary indexes. For details, see Clustering Indexes﻿ and Secondary Indexes﻿.
Segment keys are restricted to specific column data types. 
TimeKeys
TimeKeys partition data based on a single time-series column in a database table. This column should normally be either a DATE or TIMESTAMP data type. 
For optimal performance, you can configure a TimeKey column with two arguments that comprise a time bucket. The time bucket is a unit of time combined with a number value (e.g., HOUR, 2 or DAY, 1). This value sets the granularity for the partitioning. 
In general, this value should match the level of filtering that you expect your queries to request. For example, a database table you expect to query for transactions by minute should probably have a small TimeKey bucket value, such as an hour. If you expect your queries to filter by broader time values, your table can have a TimeKey bucket of one day or larger.  
The Ocient System uses the TimeKey and the bucket granularity to partition rows into segments that incorporate a minimum and maximum time value for all rows. For queries that involve time filtering, the TimeKey allows an Ocient System to immediately skip data segments that are outside the query bounds without any I/O processing. 
This syntax example creates a single TimeKey column and must be included in the column definition of a CREATE TABLE SQL statement. For the full syntax, see CREATE TABLE﻿.
SQL
column_name [ data_type ] TIME KEY BUCKET (bucket_granularity, bucket_value)
column_name [ data_type ] TIME KEY BUCKET (bucket_granularity, bucket_value)
﻿
Parameter
Type
Description
column_name
string
An identifier for the column. 
data_type
string
Optional.
The data type of the column. 
For the TimeKey column, this column should be DATE or TIMESTAMP type. The TimeKey column can also support INT or BIGINT type, but these data types do not use a bucket_value argument. 
bucket_granularity
numeric
The granularity of the TimeKey column, based on the specified TIME type in bucket_value.
bucket_value
string
The TIME type to parse the TimeKey column. Supported values include: [ DAY, HOUR, MINUTE, SECOND ]
For example, BUCKET(1, DAY) sets the time-bucket granularity to a fixed width of one day. 
﻿
For DDL examples of tables that use TimeKey columns, see CREATE TABLE SQL Statement Examples﻿.
Clustering Key
Every table in an Ocient System can support a Clustering Key to speed up query operations on a series of adjacent columns. 
The Clustering Key determines how data is ordered into segments on disk. This ordering acts like an index, enabling an Ocient System to quickly locate any relevant rows of the Clustering Key columns. 
Every database table should define one Clustering Key on a sequence of frequently queried columns for best results. 
Clustering Key columns can be of any fixed-length data type except for DECIMAL. Variable-length columns can also support Clustering Keys if they are configured with global dictionary compression. For details, see Global Dictionary Compression﻿. 
In addition to the Clustering Key, you can also define clustering indexes. Unlike the Clustering Key, clustering indexes do not involve ordering data into segments. They create separate indexes for each specified column set, which boosts performance but requires some additional storage. 
This syntax example creates a Clustering Key across multiple columns and must be included in a CREATE TABLE SQL statement. For the full syntax, see CREATE TABLE﻿.
SQL
CLUSTERING KEY key_name (ck_col1, ck_col2 [, ...])
    [, INDEX index_name (idx_col1, idx_col2 [, ...])
    [, INDEX ... ] ]
CLUSTERING KEY key_name (ck_col1, ck_col2 [, ...])
    [, INDEX index_name (idx_col1, idx_col2 [, ...])
    [, INDEX ... ] ]
﻿
Parameter
Type
Description
key_name
string
An identifier for the Clustering Key.  
ck_col1, ck_col2 [, ...]
string
A series of specific columns comprising the Clustering Key. 
Clustering Key columns must not be nullable. Specify  NOT NULL in the column definition for the respective columns. For details, see column definition.
No limit exists on the number of columns for the Clustering Key. 
index_name
string
Optional.
An identifier for a clustering index.  
You must include the definition of columns that comprise the index using the idx_col1, idx_col2 [, ...] parameter.
idx_col1, idx_col2 [, ...]
string
Optional.
A series of specific columns comprising a clustering index. 
You can apply clustering indexes only to columns included in the Clustering Key. Specify any number of columns in any specified order. 
You must include the identifier for the index using the index_name parameter.
﻿
Select Clustering Key Columns
You can define a Clustering Key and any additional clustering indexes as arguments of a CREATE TABLE﻿ SQL statement. 
The column ordering of a clustering key is significant. You must specify columns in rank order.
Table Example
This CREATE TABLE SQL statement example shows the definition of a clustering key on columns (CK1, CK2, CK3). The data types of these columns are irrelevant to this example. 
SQL
CREATE TABLE example_table (
    ...,
    CK1 INT,
    CK2 INT,
    CK3 INT,
    ...,
    CLUSTERING KEY ck_example (CK1, CK2, CK3),
);
CREATE TABLE example_table (
    ...,
    CK1 INT,
    CK2 INT,
    CK3 INT,
    ...,
    CLUSTERING KEY ck_example (CK1, CK2, CK3),
);
﻿
The Clustering Key primes the records for search and filter operations on its columns in descending order of importance. This means the first column CK1 is the most important of the subset while the third CK3 is the least important. 
To boost performance with a Clustering Key, a query must make references to the columns following this key order. 
For a Clustering Key in the (CK1, CK2, CK3) pattern, these sequences of query column references are effective: 
 CK1
(CK1,CK2)
(CK1,CK2,CK3) 
Query Example
This example SELECT query follows the key order for its filter operations. 
SQL
SELECT * FROM example_table (
    WHERE CK1 > 1
    AND CK2 = 0
    AND CK3 < 0;
SELECT * FROM example_table (
    WHERE CK1 > 1
    AND CK2 = 0
    AND CK3 < 0;
﻿
For a Clustering Key in the (CK1, CK2, CK3) pattern, these sequences of query column references are ineffective: 
(CK1, CK3)
(CK2, CK3)
(CK3, CK2)
To improve performance on queries that reference columns in these sequences, you can use clustering indexes. For details, see Adding Clustering Indexes﻿.  
To check the order of column references in a query, you can test a query using an EXPLAIN PIPELINE﻿ SQL statement. This statement shows the I/O pipeline and how the system references indexes. 
For DDL examples of tables that use Clustering Key columns, see CREATE TABLE SQL Statement Examples﻿.
Add Clustering Indexes
A table can only have one Clustering Key, but it can also have any number of clustering indexes to boost performance. Clustering indexes act similarly to the Clustering Key, but they do not order data into segments. Instead, each clustering index creates a separate index for the specified column set, which can impact storage and loading overhead.  
Clustering indexes can provide flexibility to your tables, allowing your queries to reference the same columns as your Clustering Key, but in different sequences. 
You must define clustering keys and clustering indexes when you first create a table. You cannot add them at a later time. 
Example
Define a Clustering Key across the first four columns (CK1, CK2, CK3, CK4) in the example_table table in this CREATE TABLE SQL statement. 
The statement also includes two clustering indexes using the INDEX definition. The first creates an index for a subset of just two columns (CK2, CK3) from the Clustering Key. The second index uses all of the key columns (CK2, CK4, CK1, CK3), but in a different sequence. 
SQL
CREATE TABLE example_table (
    CK1 INT NOT NULL,
    CK2 INT NOT NULL,
    CK3 INT NOT NULL,
    CK4 INT NOT NULL, 
    COL5 INT NOT NULL,
    CLUSTERING KEY ck_example (CK1, CK2, CK3, CK4),
    INDEX idx_01 (CK2, CK3),
    INDEX idx_02 (CK2, CK4, CK1, CK3)
);
CREATE TABLE example_table (
    CK1 INT NOT NULL,
    CK2 INT NOT NULL,
    CK3 INT NOT NULL,
    CK4 INT NOT NULL, 
    COL5 INT NOT NULL,
    CLUSTERING KEY ck_example (CK1, CK2, CK3, CK4),
    INDEX idx_01 (CK2, CK3),
    INDEX idx_02 (CK2, CK4, CK1, CK3)
);
﻿
In the example, COL5 is not part of the Clustering Key, meaning this column cannot be included in any clustering indexes.  
Permissions
To create TimeKeys and Clustering Keys, you must have the CREATE privilege for the current database.
Observability
You can query these system catalog tables to get information on the effectiveness of your configurations for segment keys and indexes: 
﻿sys.indexes — Includes information on the index identifier, type, and other information.  
﻿sys.index_columns — Contains information on indexes by their column position. 
﻿sys.segment_parts — Contains data on the segment parts associated with each segment key. 
For examples of querying segment and index metadata tables, see Discover Insights From System Catalog Tables﻿. 
Related Links
﻿Ingest Data with Legacy LAT Reference﻿﻿
﻿Query Ocient﻿﻿
﻿CREATE TABLE﻿﻿
﻿Secondary Indexes﻿﻿
Related Videos
﻿At the Whiteboard with Ocient: Indexing at Hyperscale﻿

Parameter	Type	Description
column_name	string	An identifier for the column.
data_type	string	Optional. The data type of the column. For the TimeKey column, this column should be DATE or TIMESTAMP type. The TimeKey column can also support INT or BIGINT type, but these data types do not use a bucket_value argument.
bucket_granularity	numeric	The granularity of the TimeKey column, based on the specified TIME type in bucket_value.
bucket_value	string	The TIME type to parse the TimeKey column. Supported values include: [ DAY, HOUR, MINUTE, SECOND ] For example, BUCKET(1, DAY) sets the time-bucket granularity to a fixed width of one day.

Parameter	Type	Description
key_name	string	An identifier for the Clustering Key.
ck_col1, ck_col2 [, ...]	string	A series of specific columns comprising the Clustering Key. Clustering Key columns must not be nullable. Specify NOT NULL in the column definition for the respective columns. For details, see column definition. No limit exists on the number of columns for the Clustering Key.
index_name	string	Optional. An identifier for a clustering index. You must include the definition of columns that comprise the index using the idx_col1, idx_col2 [, ...] parameter.
idx_col1, idx_col2 [, ...]	string	Optional. A series of specific columns comprising a clustering index. You can apply clustering indexes only to columns included in the Clustering Key. Specify any number of columns in any specified order. You must include the identifier for the index using the index_name parameter.

Database Password Security Settings

Secondary Indexes