Database Administration
Schema Design
Table Compression Options
the {{ocient}} system provides a range of options for compressing tables to reduce storage requirements for your data compression can be a major component of query optimization when used effectively by reducing i/o requirements and potentially allowing more system space for indexes ocient compression key concepts table columns in ocient can support these compression options table compression options docid\ lkvqehfar8zfd hcnlo0u table compression options docid\ lkvqehfar8zfd hcnlo0u table compression options docid\ lkvqehfar8zfd hcnlo0u ocient system compression adheres to these key concepts by default, the ocient dynamic compression scheme can determine whether to apply compression on each column if it is statistically advantageous sometimes, you might want to use another compression scheme like zstd or completely disable compression you can specify custom compression schemes on individual columns through data definition language (ddl) commands, either using a create table or alter table sql statement gdc can stack onto other compression schemes general compression matrix this table shows the best compression scheme for different column data types the suggested best compression schemes consider the column data type and cardinality, but other factors can influence which data compression makes sense the default compression in the table shows the compression scheme that the database applies if you do not specify an option this default compression balances performance and compression data type default compression best compression fixed length compression dynamic compression zstd variable length and low cardinality compression none compression gdc and compression zstd variable length and high cardinality compression none compression zstd arrays compression none compression zstd arrays of variable length types and low cardinality compression none compression gdc and compression zstd tuples compression depends on each tuple index data type the database treats e ach tuple index position as its own column value and can support its own compression scheme fixed length values default to compression dynamic , while variable length values default to compression none for best performance, apply rules for each tuple index based on the data type rules outlined in this table for example my tuple tuple<\<int, varchar(255) compression gdc(1) compression zstd, varchar(255)>> this example defines a tuple column with gdc and zstd compression only for the first inner varchar value for details about the advantages and disadvantages of each compression scheme, see the compression scheme description in database, tables, views, and indexes docid\ ejutg6wjnk5eg55kizq8d for details about the compression syntax, see database, tables, views, and indexes docid\ ejutg6wjnk5eg55kizq8d global dictionary compression column parameter compression gdc gdc uses a dictionary encoder to map variable length column data instead of storing the variable length data directly on disk, gdc substitutes each unique string with a corresponding unique integer this compression scheme can significantly reduce overall storage requirements for low cardinality columns with a few distinct values while boosting performance for various operations gdc supports an optional compression value argument that specifies how many unique values to use advantages reduced disk usage — variable length data such as strings or container types can experience massive storage reduction improved query performance — by mapping each row to an integer, variable length rows no longer require a full traversal for selective i/o operations, which improves performance in some queries faster joins — join comparisons have improved performance from using gdc integers instead of the variable length data variable length cluster keys — variable length columns can use a clustering key index for improved performance stackable — columns can use gdc in tandem with other compression schemes disadvantages increased complexity — you must know the rough cardinality of your data to size the gdc configuration properly load complexity — gdc compression adds some overhead to loading data into the ocient system not suitable for high cardinality data — gdc performance decreases as the data cardinality increases cannot apply retroactively — unlike other compression options, you must set gdc as part of the initial create table sql statement you cannot apply it later to the table for details, see global dictionary compression docid\ fen6iivd8dyogczsgl4zn column name \[data type] compression gdc \[ (compression value) ] for details about adding gdc compression to a table schema, see database, tables, views, and indexes docid\ ejutg6wjnk5eg55kizq8d zstandard compression column parameter compression zstd zstd uses the zstd library to apply a fast and lossless compression algorithm to any column type, including fixed and variable length scalars, arrays, and tuples on a per component basis in addition, zstd works on columns where the database first compresses variable length data by gdc and then compresses the resulting integral type on disk this compression scheme can affect performance by requiring additional processing therefore, this compression scheme is usually good for less frequently accessed columns or columns paired with an index zstd supports these optional values compression level — this value signifies how much compression the data receives the default value is 0 the full range of values is from 7 (low compression) to 15 (high compression) the database requires less memory when this value is lower, whereas it requires more memory and processing resources when the value is larger dictionary size — this value is the dictionary size that you specify as a positive integer the value signifies the size of the shared compression dictionary in bytes the default value is 32768 (32 kb) the full range of values ranges from 4096 (4 kb) to 1048576 (1 mb) this value denotes the amount of memory consumed during segment generation in general, larger values provide better compression but use more memory the default values for compression level and dictionary size are sufficient for most cases advantages reduced disk storage — d ramatically decreases disk space usage flexible — you can apply this compression scheme after table creation using an database, tables, views, and indexes docid\ ejutg6wjnk5eg55kizq8d sql statement disadvantages performance loss — zstd does require some extra cpu resources, which means that it can impact query performance memory usage — zstd can demand significant memory usage when many columns use it for parallel query workloads column name \[data type] compression zstd \[compression level = level value, dictionary size = size value] for details about adding zstd compression to a table schema, see database, tables, views, and indexes docid\ ejutg6wjnk5eg55kizq8d dynamic compression column parameter compression dynamic by default, ocient tables use dynamic compression to reduce storage space requirements automatically this method employs a variety of compression schemes on columns based on their data type and a statistical determination of what kinds of compression would be advantageous for fixed length data, block size units (typically 4 kb) can receive various compression schemes, including run length encoding and delta delta encoding for variable length data (varchar and varbinary), the database compresses data row by row, with values individually compressed using lz4 compression this method produces a good compression ratio for large values however, small or medium sized values have negligible compression rates advantages ease of use — the ocient system determines the compression flexibility — you can apply this compression scheme after table creation using an database, tables, views, and indexes docid\ ejutg6wjnk5eg55kizq8d sql statement disadvantages secondary option — dynamic compression is designed for balance, but it is generally not as effective at compression as zstd performance loss — this compression scheme has a s mall performance impact, but is not as costly as zstd column name \[data type] compression dynamic for details about adding dynamic compression to a table definition, see database, tables, views, and indexes docid\ ejutg6wjnk5eg55kizq8d for examples of tables that use compression, see create table sql statement examples docid\ ejlruzsfc1ynaauqadx8i related links global dictionary compression docid\ fen6iivd8dyogczsgl4zn secondary indexes docid\ pvmvtea poabqrd9q vkc