Text Analysis in Database Queries
The Ocient Hyperscale Data Warehouse enables search and analysis of textual data. You can run text analysis queries efficiently by using indexes. The database supports different types of indexes. The N-gram index is a type of secondary index that enables text analysis.
Cluster Key Indexes
Define cluster key (CK) indexes when you create the table.
Index Type | Number of Columns | Column Types | Filters |
---|---|---|---|
Primary CK index (always exists) | Multiple | Fixed-length or GDC columns, fixed-length tuple columns | Equality-like, range |
Additional CK indexes | Subset of CK in any order | Fixed-length or GDC columns, fixed-length tuple columns | Equality-like, range |
Secondary Indexes
You can create or drop secondary indexes at any time.
Index Type | Number of Columns | Column Types | Filters |
---|---|---|---|
Inverted indexes | Single | Fixed-length or GDC column, array, or tuple component | Equality-like, range |
Hash indexes | Single | Variable-length column, array, or tuple component | Equality-like |
N-gram indexes | Single | VARCHAR column, array, or tuple component | LIKE, equality-like |
N-gram indexes support text analysis by providing an efficient search of textual data. This type of index works by tokenizing the pattern in the string, and then pruning and transforming the tokens. You can create N-gram indexes at any time. For details about creating an N-gram index, see CREATE INDEX.
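To illustrate the general idea behind N-gram tokenization, the following Python sketch builds a toy trigram index and uses it to answer a substring search, similar to a `LIKE '%needle%'` filter. This is a conceptual model only and not the implementation in the database: the function names (`ngrams`, `build_index`, `search_contains`) and the choice of N = 3 are assumptions made for this example, and the sketch does not model the pruning and transforming steps.

```python
from collections import defaultdict


def ngrams(text, n=3):
    """Return the set of all length-n substrings (tokens) of text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def build_index(rows, n=3):
    """Map each n-gram to the set of row ids whose value contains it."""
    index = defaultdict(set)
    for row_id, value in enumerate(rows):
        for token in ngrams(value, n):
            index[token].add(row_id)
    return index


def search_contains(rows, index, needle, n=3):
    """Answer the equivalent of  value LIKE '%needle%'  using the index."""
    tokens = ngrams(needle, n)
    if not tokens:
        # Needle is shorter than n: the index cannot help, so scan every row.
        candidates = set(range(len(rows)))
    else:
        # A matching row must contain every n-gram of the needle.
        candidates = set.intersection(*(index.get(t, set()) for t in tokens))
    # N-grams can co-occur without forming the needle, so verify each candidate.
    return sorted(r for r in candidates if needle in rows[r])


# Hypothetical sample data for illustration.
rows = ["error: disk full", "warning: disk almost full", "info: all good"]
index = build_index(rows)
matches = search_contains(rows, index, "disk")  # row ids 0 and 1
```

The index narrows the search to rows that share every token with the search string, so only those candidate rows need the full string comparison. The final verification step matters because a row can contain all of the tokens without containing the search string itself.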
There are other ways to perform text analysis. For details, see the syntax for LIKE and SIMILAR TO.
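For comparison, the following Python sketch shows what a `LIKE` filter does when it must check every value without an index. It assumes the standard SQL wildcard meanings (`%` matches any sequence of characters and `_` matches exactly one character), treats matching as case-sensitive, and ignores the `ESCAPE` clause. The helper names `like_to_regex` and `like_scan` are hypothetical and exist only for this example.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # % matches any sequence of characters
        elif ch == "_":
            parts.append(".")   # _ matches exactly one character
        else:
            parts.append(re.escape(ch))  # all other characters are literal
    return re.compile("".join(parts), re.DOTALL)


def like_scan(rows, pattern):
    """Return every value that matches the LIKE pattern in full."""
    regex = like_to_regex(pattern)
    return [value for value in rows if regex.fullmatch(value)]


# Hypothetical sample data for illustration.
rows = ["disk full", "disk fail", "dusk"]
full_only = like_scan(rows, "%full")
```

Because this approach compares the pattern against every value, its cost grows with the number of rows, which is the work that an index on the column can reduce.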