Database Administration
...
Table Compression Options
Global Dictionary Compression
global dictionary compression (gdc) is a feature of the database that compresses variable length column data using a dictionary encoder instead of storing the variable length data directly on disk, gdc seamlessly substitutes an integer that corresponds to each unique string advantages of gdc reduced disk usage — a string like "gdc is really cool!" takes up 19 raw bytes with gdc, it takes at most 4 bytes this benefit is multiplied by the number of rows in the table and, when applied to arrays, can lead to a dramatic reduction in required storage fixed width — variable length column data can be more difficult for performing selective i/o it is much faster for {{ocient}} to read the 100th element in a list of integers by offsetting 100 times the fixed width of the integer with variable length types, a full traversal is required, counting how many elements are passed until the 100th is found this can improve performance in some queries faster joins — when joining on a gdc column, equality comparisons can be performed on the gdc integers instead of the variable length data this can improve performance in some queries allows variable length cluster keys — as of version 19 0 of the ocient system, gdc is the only method by which a variable length column can be used as a cluster key index disadvantages of gdc increased complexity — gdc does add some configuration complexity it requires a user to know the rough cardinality of their data to size the gdc number of bytes and adds system configuration options load complexity — gdc compression adds some overhead to loading data into the ocient system not suitable for high cardinality data — if the variable length column has higher cardinality than this, it should not be stored using gdc using gdc to create a gdc column, add the compression gdc(int) parameter after the column type specifier when creating a column, where int is one of 1, 2, or 4 the integer specifies the number of bytes to be used for the integer keys stored on the disk this also corresponds to the maximum number of unique keys to be stored in a particular column gdc(int) maximum number of unique values gdc(1) 256 gdc(2) 65,536 gdc(4) 4,294,967,296 there is a soft limit of 1,000,000 keys even on 4 byte integer gdc columns contact ocient support to evaluate your criteria for changing the limit and to understand the impact of raising limits see docid\ gbrio6u37loupxxusodyb for details applying gdc to a column to enable gdc on a column, the keyword is applied in the create table or alter table add column statement an example of gdc on a varchar column {column name} varchar(255) compression gdc(2) reusing an existing gdc map a user can also specify that a column should share gdc space with another column–possibly even a column of a different table this would be useful if the same data is used in multiple columns and it is often joined in queries use compression gdc existing schema table column name as shown here {column name} varchar(255) gdc existing {schema table column name} gdc on array columns for arrays of variable length data, gdc operates on the individual elements of the array specify compression after the overall array type {column name} varchar(255)\[] compression gdc(2) gdc on tuple columns elements of tuple columns can be compressed with gdc specify compression on the specific type to be compressed {column name} tuple\<int, varchar(255) compression gdc(2)> gdc in the system catalog you can use the system catalog to inspect the number of keys used by different columns and the maximum count on each example query select t id as table id, t name as table name, c id as column id, c name as column name, g compressed size, g current count, g max count from sys tables t inner join sys global map table info g on t id = g table id inner join sys columns c on c id = g column id; truncation when a table with gdc is partially truncated, gdc key mappings for removed rows are not removed this can result in stale mappings the only way to remove unwanted mappings is to drop the column and recreate it gdc column representation in the system catalog to the end user, a table with gdc columns looks like any other table however, when creating a table with gdc columns, the table configuration is different in system catalog tables instead of a table named schema tablename , gdc tables leverage a built in view this view and several pieces of metadata will be created in the system catalog a view named schema tablename is added in sys views this view is the representation of the user of the gdc table, and it automatically converts the gdc keys to the loaded variable length data this is the table that the user interacts with for querying data, making alterations, and granting or revoking access for each gdc column in the table, a table named syslookup schema tablename columnname is added in sys tables these tables store the mappings from strings to integers for each column these tables should be interacted with only rarely a table called sysgdc schema tablename is added in sys tables this is the table that is stored to disk, including all the non gdc columns and the integers for each gdc column this table should be interacted with only rarely gdc columns and loading into the database gdc stores key mappings in a raft consensus log that is maintained by all nodes operating with the admin role these mappings are created as the data is loaded, and used in queries to fetch the correct variable length data for the stored integers mappings can also be created by certain queries if the key does not already exist because the admin roles are responsible for maintaining the gdc key mappings, there must be a consensus of admin roles available on the system in order to load new data that includes gdc columns related links https //docs ocient com/system catalog