Database Administration
Schema Design

Table Compression Options

The

 System provides a range of options for compressing tables to reduce storage requirements for your data. Compression can be a major component of query optimization when used effectively by reducing I/O requirements and potentially allowing more system space for indexes.

Ocient Compression Key Concepts

Table columns in Ocient can support these compression options:

Ocient System compression adheres to these key concepts:

  • By default, the Ocient dynamic compression scheme can determine whether to apply compression on each column if it is statistically advantageous. Sometimes, you might want to use another compression scheme like ZSTD or completely disable compression.
  • You can specify custom compression schemes on individual columns through data definition language (DDL) commands, either using a CREATE TABLE or ALTER TABLE SQL statement.
  • GDC can stack onto other compression schemes.

General Compression Matrix

This table shows the best compression scheme for different column data types. The suggested best compression schemes consider the column data type and cardinality, but other factors can influence which data compression makes sense.

The default compression in the table shows the compression scheme that the database applies if you do not specify an option. This default compression balances performance and compression.

Data Type

Default Compression

Best Compression

Fixed-length

COMPRESSION DYNAMIC

COMPRESSION ZSTD

Variable-length and low cardinality

COMPRESSION DYNAMIC

COMPRESSION GDC and COMPRESSION ZSTD

Variable-length and high cardinality

COMPRESSION DYNAMIC

COMPRESSION ZSTD

Arrays

COMPRESSION NONE

COMPRESSION ZSTD

Arrays of variable-length types and low cardinality

COMPRESSION NONE

COMPRESSION GDC and COMPRESSION ZSTD

Tuples

COMPRESSION DYNAMIC

The database treats each tuple index position as its own column value and can support its own compression scheme. For best performance, apply rules for each tuple index based on the data-type rules outlined in this table.

For example:

my_tuple TUPLE<<INT, VARCHAR(255) COMPRESSION GDC(1) COMPRESSION ZSTD, VARCHAR(255)>>

This example defines a tuple column with GDC and ZSTD compression only for the first inner VARCHAR value.

For details about the advantages and disadvantages of each compression scheme, see the compression_scheme description in Column Constraints.

For details about the compression syntax, see CREATE TABLE.

Global Dictionary Compression

Column parameter: COMPRESSION GDC

GDC uses a dictionary encoder to map variable-length column data. Instead of storing the variable length data directly on disk, GDC substitutes each unique string with a corresponding unique integer.

This compression scheme can significantly reduce overall storage requirements for low-cardinality columns with a few distinct values while boosting performance for various operations.

GDC supports an optional compression_value argument that specifies how many unique values to use.

Advantages

  • Reduced disk usage Variable-length data such as strings or container types can experience massive storage reduction.
  • Improved query performance By mapping each row to an integer, variable-length rows no longer require a full traversal for selective I/O operations, which improves performance in some queries.
  • Faster Joins Join comparisons have improved performance from using GDC integers instead of the variable-length data.
  • Variable-Length Cluster Keys Variable-length columns can use a clustering key index for improved performance.
  • Stackable Columns can use GDC in tandem with other compression schemes.

Disadvantages

  • Increased complexity You must know the rough cardinality of your data to size the GDC configuration properly.
  • Load complexity GDC compression adds some overhead to loading data into the Ocient System.
  • Not suitable for high-cardinality data GDC performance decreases as the data cardinality increases.
  • Cannot apply retroactively Unlike other compression options, you must set GDC as part of the initial CREATE TABLE SQL statement. You cannot apply it later to the table.

For more information, see Global Dictionary Compression.

Syntax

SQL


For more information on adding GDC compression to a table schema, see CREATE TABLE.

Zstandard Compression

Column parameter: COMPRESSION ZSTD

ZSTD uses the ZSTD library to apply a fast and lossless compression algorithm to any column type including fixed and variable-length scalars, arrays, and tuples on a per-component basis. In addition, ZSTD works on columns where the database first compresses variable-length data by GDC and then compresses the resulting integral type on disk. 

This compression scheme can affect performance by requiring additional processing. Therefore, this compression scheme is usually good for less-frequently accessed columns or columns paired with an index.

ZSTD supports these optional values:

  • COMPRESSION_LEVEL — This value signifies how much compression the data receives. The default value is 0. The full range of values is from -7 (low compression) to 15 (high compression). The database requires less memory when this value is lower, whereas it requires more memory and processing resources when the value is larger.
  • DICTIONARY_SIZE — This value is the dictionary size that you specify as a positive integer. The value signifies the size of the shared compression dictionary in bytes. The default value is 32768 (32 KB). The full range of values ranges from 4096 (4 KB) to 1048576 (1 MB). This value denotes the amount of memory consumed during segment generation. In general, larger values provide better compression but use more memory.

The default values for COMPRESSION_LEVEL and DICTIONARY_SIZE are sufficient for most cases.

Advantages

  • Reduced disk storage — Dramatically decreases disk space usage.
  • Flexible — You can apply this compression scheme after table creation using an ALTER TABLE SQL statement. 

Disadvantages

  • Performance loss — ZSTD does require some extra CPU resources, which means that it can impact query performance.
  • Memory usage — ZSTD can demand significant memory usage when many columns use it for parallel query workloads.

Syntax

SQL


For more information on adding ZSTD compression to a table schema, see CREATE TABLE.

Dynamic Compression

Column parameter: COMPRESSION DYNAMIC

By default, Ocient tables use dynamic compression to reduce storage space requirements automatically. This method employs a variety of compression schemes on columns based on their data type and a statistical determination of what kinds of compression would be advantageous.

  • For fixed-length data, block-size units (typically 4 KB) can receive various compression schemes, including run-length encoding and delta-delta encoding.
  • For variable-length data (VARCHAR and VARBINARY), the database compresses data row-by-row, with values individually compressed using LZ4 compression. This method produces a good compression ratio for large values. However, small or medium-sized values have negligible compression rates.

Advantages

  • Ease of use — The Ocient System determines the compression.
  • Flexibility — You can apply this compression scheme after table creation using an ALTER TABLE SQL statement.

Disadvantages

  • Secondary option — Dynamic compression is designed for balance, but it is generally not as effective at compression as ZSTD.
  • Performance loss — This compression scheme has a small performance impact, but is not as costly as ZSTD.
SQL


For more information on adding dynamic compression to a table definition, see CREATE TABLE.

For examples of tables that use compression, see CREATE TABLE SQL Statement Examples.

Related Links