Analysis in Ocient

Text Analysis in Database Queries

The Ocient Hyperscale Data Warehouse enables search and analysis of textual data. To perform text analysis efficiently, run queries that use indexes. The database supports two groups of indexes: cluster key indexes and secondary indexes. The N-gram index is the type of secondary index that enables text analysis.

Cluster Key Indexes

Define cluster key (CK) indexes when you create the table.

| Index Type | Number of Columns | Column Types | Filters |
| --- | --- | --- | --- |
| Primary CK index (always exists) | Multiple | Fixed-length or GDC columns, fixed-length tuple columns | Equality-like, range |
| Additional CK indexes | Subset of CK in any order | | |

Secondary Indexes

You can create or drop secondary indexes at any time.

| Index Type | Number of Columns | Column Types | Filters |
| --- | --- | --- | --- |
| Inverted indexes | Single | Fixed-length or GDC column, array, or tuple component | Equality-like, range |
| Hash indexes | | Variable-length column, array, or tuple component | Equality-like |
| N-gram indexes | | VARCHAR column, array, or tuple component | LIKE, equality-like |

N-gram indexes support text analysis by providing efficient search of textual data. This type of index works by tokenizing the pattern in the search string, and then pruning and transforming the resulting tokens. You can create N-gram indexes at any time. For details about creating an N-gram index, see CREATE INDEX.
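
To make the tokenize-and-prune idea concrete, the following Python sketch shows one generic way an N-gram index can speed up a LIKE filter. It is an illustration only and does not represent the Ocient implementation: the `NGramIndex` class, the `ngrams` and `like_to_regex` helpers, the choice of n = 3, and the sample rows are all hypothetical. The sketch splits the pattern into literal runs at the `%` and `_` wildcards, breaks each run into N-grams, intersects the row sets for those N-grams to prune candidates, and then verifies each remaining candidate against the full pattern.

```python
import re
from collections import defaultdict


def ngrams(text, n=3):
    """Return the set of all length-n substrings of text (empty if text is shorter than n)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def like_to_regex(pattern):
    """Translate a LIKE pattern (% = any run of characters, _ = one character) to a regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


class NGramIndex:
    """Hypothetical in-memory N-gram index over one text column."""

    def __init__(self, n=3):
        self.n = n
        self.rows = {}                    # row id -> column value
        self.postings = defaultdict(set)  # N-gram -> ids of rows that contain it

    def add(self, row_id, value):
        self.rows[row_id] = value
        for gram in ngrams(value, self.n):
            self.postings[gram].add(row_id)

    def like(self, pattern):
        """Return the sorted ids of rows whose value matches the LIKE pattern."""
        # Tokenize: split the pattern at wildcards, then take N-grams of each literal run.
        grams = set()
        for literal in re.split(r"[%_]", pattern):
            grams |= ngrams(literal, self.n)

        # Prune: a matching row must contain every N-gram of the pattern.
        if grams:
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in grams)
            )
        else:
            # No literal run is long enough to produce an N-gram, so scan all rows.
            candidates = set(self.rows)

        # Verify: pruning can leave false positives, so check the full pattern.
        regex = re.compile(like_to_regex(pattern), re.DOTALL)
        return sorted(r for r in candidates if regex.fullmatch(self.rows[r]))


index = NGramIndex(n=3)
index.add(1, "error: disk full")
index.add(2, "warning: disk nearly full")
index.add(3, "error: network down")

print(index.like("%disk%"))
print(index.like("error%"))
```

In this sketch, a pattern such as `%d_sk%` has no literal run of at least three characters, so the index cannot prune and every row is checked. This is one reason the choice of n matters when an index of this kind is sized for the expected search patterns.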

There are other ways to perform text analysis. For details, see the syntax for LIKE and SIMILAR TO.
