Global Dictionary Compression (GDC) is a feature of the Ocient System that compresses variable-length column data using a dictionary encoder. Instead of storing the variable-length data directly on disk, GDC transparently substitutes an integer key for each unique string.
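As a rough mental model, and not Ocient's implementation, the substitution behaves like the following Python sketch. The DictionaryEncoder class and its rule of assigning keys in order of first appearance are illustrative assumptions.

```python
class DictionaryEncoder:
    """Toy dictionary encoder: one integer key per unique string (hypothetical)."""

    def __init__(self):
        self.key_of = {}    # string -> integer key
        self.value_of = []  # integer key -> string (the list index is the key)

    def encode(self, value: str) -> int:
        # Assign the next key only the first time a string is seen.
        if value not in self.key_of:
            self.key_of[value] = len(self.value_of)
            self.value_of.append(value)
        return self.key_of[value]

    def decode(self, key: int) -> str:
        # Map a stored key back to the original string.
        return self.value_of[key]


encoder = DictionaryEncoder()
column = ["red", "green", "red", "blue", "green", "red"]
keys = [encoder.encode(v) for v in column]
print(keys)                               # [0, 1, 0, 2, 1, 0]
print([encoder.decode(k) for k in keys])  # the original strings
```

Only three strings are stored once in the dictionary; the six rows hold small integers.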
Advantages of GDC
- Reduced disk usage: A string like "GDC is really cool!" takes up 19 raw bytes. With GDC, each row stores only a key of at most 4 bytes, and the string itself is stored once in the column's lookup table. This benefit is multiplied by the number of rows in the table and, when applied to arrays, can lead to a dramatic reduction in required storage.
- Fixed width: Selective input/output (I/O), which reads only specific elements, is harder to perform on variable-length column data. With fixed-width integers, reading the 100th element of a list only requires jumping to an offset of 99 times the integer width (counting from zero). With variable-length types, a full traversal is required, counting elements until the 100th one is found. This can improve performance in some queries.
- Faster joins: When joining on a GDC column, equality comparisons can be performed on the GDC integers instead of the variable-length data. This can improve performance in some queries.
- Allows variable-length cluster keys: As of version 19.0 of the Ocient System, GDC is the only method by which a variable-length column can be used as a cluster key index.
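The disk-usage and fixed-width points above can be compared with a toy Python model. The byte layouts here are assumptions for illustration, not Ocient's on-disk format: variable-length rows are assumed to carry a 4-byte length prefix, and the key total excludes the one-time cost of storing each unique string in the lookup table.

```python
import struct

# Hypothetical column: 1,000 rows alternating between two strings.
values = ["GDC is really cool!", "hello"] * 500
key_of = {"GDC is really cool!": 0, "hello": 1}  # toy dictionary
KEY_WIDTH = 4  # bytes per key, as with GDC(4)

# Assumed variable-length layout: 4-byte length prefix, then the UTF-8 bytes.
raw = b"".join(struct.pack("<I", len(v.encode())) + v.encode() for v in values)
# Fixed-width layout: one 4-byte unsigned key per row.
keys = b"".join(struct.pack("<I", key_of[v]) for v in values)


def read_fixed(buf: bytes, i: int) -> int:
    # Fixed width: compute the offset of row i (zero-based) directly.
    (key,) = struct.unpack_from("<I", buf, i * KEY_WIDTH)
    return key


def read_variable(buf: bytes, i: int) -> str:
    # Variable length: skip over the first i entries one by one.
    offset = 0
    for _ in range(i):
        (length,) = struct.unpack_from("<I", buf, offset)
        offset += 4 + length
    (length,) = struct.unpack_from("<I", buf, offset)
    return buf[offset + 4 : offset + 4 + length].decode()


print(len(raw), len(keys))                           # 16000 4000
print(read_fixed(keys, 99), read_variable(raw, 99))  # 1 hello
```

read_fixed does one offset calculation regardless of position, while read_variable must walk 99 entries to reach the 100th row.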
Disadvantages of GDC
- Increased complexity: GDC adds configuration complexity. Users must know the approximate cardinality of their data to choose the GDC key width in bytes, and GDC adds system configuration options.
- Load complexity: GDC adds some overhead to loading data into the Ocient System.
- Not suitable for high-cardinality data: If a variable-length column has more unique values than GDC can hold (see the key limits in GDC Syntax), do not store it using GDC.
GDC Syntax
To create a GDC column, add the COMPRESSION GDC(int) parameter after the column type specifier when creating a column, where int is one of 1, 2, or 4. The integer specifies the number of bytes used for each integer key stored on disk. The key width also determines the maximum number of unique values that the column can store.
| GDC(int) | Maximum Number of Unique Values |
|---|---|
| GDC(1) | 256 |
| GDC(2) | 65,536 |
| GDC(4) | 4,294,967,296 |
There is a soft limit of 1,000,000 keys, even on GDC columns with 4-byte keys. Contact Ocient Support to evaluate whether your use case warrants changing the limit and to understand the impact of raising it. See Column Limit for details.
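Each value in the table is 2 raised to the power of 8 times the key width in bytes. The hypothetical helper below applies that rule, together with the default 1,000,000-key soft limit, to pick the smallest width for an expected number of unique values. It is a planning sketch under those assumptions, not an Ocient tool.

```python
SOFT_LIMIT = 1_000_000  # default soft limit on keys, from the note above


def max_unique_values(width_bytes: int) -> int:
    # An n-byte unsigned key can take 2**(8*n) distinct values.
    return 2 ** (8 * width_bytes)


def smallest_gdc_width(expected_unique: int, soft_limit: int = SOFT_LIMIT):
    """Return 1, 2, or 4 for the smallest GDC(int) that fits, else None."""
    if expected_unique > soft_limit:
        return None  # above the soft limit: contact Ocient Support or avoid GDC
    for width in (1, 2, 4):
        if expected_unique <= max_unique_values(width):
            return width
    return None


for n in (200, 256, 257, 50_000, 500_000, 2_000_000):
    print(n, smallest_gdc_width(n))
```

For example, 257 unique values no longer fit in GDC(1) and need GDC(2), while 2,000,000 values exceed the default soft limit even though GDC(4) could address them.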
Apply GDC to a Column
To enable GDC on a column, apply the COMPRESSION keyword in the CREATE TABLE or ALTER TABLE ADD COLUMN statement. For example, this definition applies GDC to a VARCHAR column:
{column_name} VARCHAR(255) COMPRESSION GDC(2)
When you apply GDC to any column of a table, the Ocient System creates a view with the full column definition instead of a regular table. For details, see GDC Column Representation in the System Catalog.
Reuse an Existing GDC Map
A column can also share a GDC map with another column, possibly even a column in a different table. This is useful when the same data appears in multiple columns that are often joined in queries. Use COMPRESSION GDC EXISTING schema.table.column_name as shown here:
{column_name} VARCHAR(255) COMPRESSION GDC EXISTING {schema.table.column_name}
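The following Python toy models why a shared map helps joins: two hypothetical columns (order cities and store cities) draw keys from one dictionary, so equal strings get equal keys and the join compares integers only. The SharedDictionary class and column names are illustrative assumptions, and the nested loop stands in for whatever join algorithm the database actually uses.

```python
class SharedDictionary:
    """Toy string-to-key map shared by several columns (hypothetical)."""

    def __init__(self):
        self.key_of = {}

    def encode(self, value: str) -> int:
        # Unseen strings get the next key; seen strings reuse theirs.
        return self.key_of.setdefault(value, len(self.key_of))


# Both hypothetical columns reuse one map, similar in spirit to GDC EXISTING.
cities = SharedDictionary()
order_city_keys = [cities.encode(c) for c in ["Chicago", "Austin", "Chicago", "Denver"]]
store_city_keys = [cities.encode(c) for c in ["Austin", "Chicago"]]

# Because the map is shared, the equality join compares integer keys directly
# and never decodes the strings.
matches = [
    (i, j)
    for i, o_key in enumerate(order_city_keys)
    for j, s_key in enumerate(store_city_keys)
    if o_key == s_key
]
print(matches)  # [(0, 1), (1, 0), (2, 1)]
```

With separate maps, "Chicago" could receive different keys in each column, and the comparison would have to fall back to the strings.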
GDC on Array Columns
For arrays of variable-length data, GDC operates on the individual elements of the array. Specify compression after the overall array type:
{column_name} VARCHAR(255)[] COMPRESSION GDC(2)
GDC on Tuple Columns
Elements of tuple columns can be compressed with GDC. Specify compression on the specific type to be compressed:
{column_name} tuple<int, VARCHAR(255) COMPRESSION GDC(2)>
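A small Python sketch of the array and tuple rules above: every array element is encoded, while only the compressed field of a tuple is. The make_encoder helper and the sample rows are hypothetical, and each column gets its own map because neither uses GDC EXISTING.

```python
def make_encoder():
    """Return a toy encoder with its own string-to-key map (one per column)."""
    key_of = {}

    def encode(value: str) -> int:
        # Unseen strings get the next key; seen strings reuse theirs.
        return key_of.setdefault(value, len(key_of))

    return encode


encode_tags = make_encoder()   # map for a VARCHAR(255)[] column
encode_label = make_encoder()  # separate map for the tuple column

# VARCHAR(255)[] COMPRESSION GDC(2): every element of every array is encoded.
array_rows = [["a", "b"], ["b", "c", "a"], []]
encoded_arrays = [[encode_tags(v) for v in row] for row in array_rows]

# tuple<int, VARCHAR(255) COMPRESSION GDC(2)>: only the string field is encoded.
tuple_rows = [(1, "a"), (2, "c")]
encoded_tuples = [(n, encode_label(s)) for n, s in tuple_rows]

print(encoded_arrays)  # [[0, 1], [1, 2, 0], []]
print(encoded_tuples)  # [(1, 0), (2, 1)]
```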
GDC in the System Catalog
You can use the system catalog to inspect the number of keys used by each column and the maximum key count for each. Example query:
Truncation
When a table with GDC is partially truncated, the GDC key mappings for the removed rows are not deleted. This can result in stale mappings. The only way to remove unwanted mappings is to drop the column and recreate it.
GDC Column Representation in the System Catalog
To the end user, a table with GDC columns looks like any other table. However, the system catalog represents it differently. Instead of a table named schema.tablename, a GDC table uses a built-in view.
The Ocient System creates this view and several related objects in the system catalog:
- A view named schema.tablename is added in sys.views. This view is the user-facing representation of the GDC table, and it automatically converts the GDC keys back to the loaded variable-length data. Users interact with this view to query data, make alterations, and grant or revoke access.
- For each GDC column in the table, a table named syslookup.schema_tablename_columnname is added in sys.tables. These tables store the mappings from strings to integers for each column. Users should rarely need to interact with these tables directly.
- A table named sysgdc.schema_tablename is added in sys.tables. This is the table stored on disk, containing all the non-GDC columns plus the integer keys for each GDC column. Users should rarely need to interact with this table directly.
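The naming scheme in this list, and the way the view conceptually reassembles rows, can be modeled in Python. This is a sketch only: gdc_catalog_objects, the sales/orders/city names, and the dictionary standing in for a lookup table are hypothetical, and the real view definition and lookup table columns are not described here.

```python
def gdc_catalog_objects(schema: str, table: str, gdc_columns: list) -> dict:
    """Hypothetical helper: catalog object names for a table with GDC columns."""
    return {
        # User-facing view, listed in sys.views.
        "view": f"{schema}.{table}",
        # On-disk table with non-GDC columns plus integer keys, in sys.tables.
        "stored_table": f"sysgdc.{schema}_{table}",
        # One string-to-key lookup table per GDC column, in sys.tables.
        "lookup_tables": [f"syslookup.{schema}_{table}_{c}" for c in gdc_columns],
    }


objects = gdc_catalog_objects("sales", "orders", ["city"])
print(objects["stored_table"])   # sysgdc.sales_orders
print(objects["lookup_tables"])  # ['syslookup.sales_orders_city']

# Toy data: the stored table keeps integer keys; the lookup maps them back.
stored_rows = [{"id": 1, "city": 0}, {"id": 2, "city": 1}, {"id": 3, "city": 0}]
city_lookup = {0: "Chicago", 1: "Austin"}

# Conceptually, the view replaces each key with its original string.
view_rows = [{**row, "city": city_lookup[row["city"]]} for row in stored_rows]
print(view_rows[2])  # {'id': 3, 'city': 'Chicago'}
```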

