Database Administration
Schema Design
CREATE TABLE SQL Statement Examples
tables in the {{ocienthyperscaledatawarehouse}} contain various configuration options that can impact performance, reduce storage requirements, or provide extra data safeguards as some table configuration options cannot be added after data is already loaded, it is important that you take the time to design create table sql statements with forethought for the queries that you expect to execute frequently this topic provides examples of create table sql statements designed for different use cases and describes the configuration choices for syntax and parameter information, see database, tables, views, and indexes docid\ a4jvhkvg31tapexr9zpcq create a table with all data types this example creates a table with all supported data types in {{ocient}} the intent of this example is to show the required formatting for different data type values, which is shown as the default value for each column create table data type example ( col bigint bigint not null default 9876543210, col binary binary(3) not null default '0xabcdef', 	col boolean bool not null default true, col char char(4) not null default 'val', col date date not null default '2000 01 01', col decimal decimal(18,4) not null default 123 45, col double double not null default 3 141592, col float float not null default 2 718, col int int not null default 123456789, col int array int\[] not null default 'int\[0,1,2,3]', col ipv4 ipv4 not null default '127 0 0 1', col ip ip not null default '0123 4567 89ab\ cdef 0123 4567 89ab\ cdef', col smallint smallint not null default 32767, col matrix matrix\[2]\[3] not null default '{ {0, 0, 0}, {0, 0, 0} }', 	col point point not null default 'point(0 0)', 	col linestring linestring not null default 'linestring(0 0, 1 1)', 	col polygon polygon not null default 'polygon((0 0, 0 1, 1 1, 1 0, 0 0))', 	col time time not null default '12 34 56 012345678', 	col timestamp timestamp not null default '2000 01 02 12 34 45', 	col tinyint tinyint not null default 127, 	col tuple tuple<\<int, int>> not null default 'tuple<\<int,int>>(0,0)', 	col array of tuples tuple<\<int, int>>\[] not null default 'tuple<\<int,int>>\[tuple<\<int,int>>(1,2),tuple<\<int,int>>(3,4)]', 	col uuid uuid not null default '01234567 89ab cdef 1357 0123456789ab', 	col varbinary varbinary not null default '0xaabbccddeeff', 	col varchar varchar not null default 'this is a variable length string' ); create a table for real time analysis this example assumes you plan to ingest large quantities of time series data for rapid querying the most important columns that you plan to reference in queries frequently include created at — a granular timestamp column that represents the primary time reference in the table user id — an integer column to identify each unique user sell id — an integer column representing each transaction by a user create table if not exists "transact data" ( "created at" timestamp time key bucket(1, hour) not null, "user id" bigint not null default 0, "sell id" bigint not null, "purchase amount" double not null default 0, "buyer name" varchar, clustering key "ck" ("sell id", "user id") ) create index "purchase idx" on "transact data" ("purchase amount"); create index "buyer idx" on "transact data" ("buyer name") using ngram(4); this example table makes use of these ocient system configurations column name configuration description created at this column is the table {{timekey}} , which means that the database indexes for filter optimization the example configures the timekey to partition segments by every hour, (1, hour) , which is the approximate time granularity you expect to use in your query filters user id , sell id the example table identifies these two columns as the table clustering key the database pairs these two columns for quick reference, such as filtering one column for row values of the other purchase amount the create index sql statement adds a secondary index to this column as a bigint data type, the purchase amount column defaults to the inverted index buyer name this create index sql statement adds a secondary index to this column as a varchar column, the buyer name column normally defaults to a hash index, but this example specifies it should be an ngram index of width 4 instead for details about using the timekey and clustering key functionality, see timekeys and clustering keys docid\ oxu8ziws8csyc5h7l4xxm for details about other index types, see secondary indexes docid\ efbuyqjp 9ole py4jpst create a table for geospatial data this example assumes you plan to ingest geospatial data linked to time series data using {{ocientgeo}} the most important columns that you plan to reference frequently in queries include route time — a granular timestamp column that represents the primary time reference in the table start point , end point — the geospatial point values for the start and destination locations trip id — an integer column to identify each unique trip vehicle id — an integer column that represents each vehicle create table "trip details" ( "route time" timestamp time key bucket(14, day) not null, "start point" point null, "end point" point not null default 'point(0 0)', "route path" linestring not null default 'linestring(0 0, 1 1)', "route area" polygon not null default 'polygon((0 0, 0 1, 1 1, 1 0, 0 0))', "trip id" bigint not null, "vehicle id" bigint not null, clustering key "ck" ("trip id", "vehicle id") ); create index "start idx" on "trip details" ("start point") using spatial; create index "end idx" on "trip details" ("end point") using spatial; this example table makes use of these ocient system configurations column name configuration description route time this column is the table timekey , which means that the database indexes for filter optimization the example configures the timekey to partition segments by every two weeks (14, day) , which is the approximate time granularity that you expect to use in your query filters trip id , vehicle id the example table identifies these two columns as the table clustering key the database pairs these two columns for quick reference, such as filtering one column for row values of the other end point , route path , route area these columns have point , linestring , and polygon data types, respectively each of these geospatial columns includes default values in their respective formats start point , end point the create index sql statements for both of these columns assign spatial indexes for quicker reference this index is the default type for any column with a geospatial data type for details about ocientgeo functionality, see geospatial functions docid\ hpngsvdeojhofhdmpyrmh create a table with compression options this example table includes multiple compression options to ensure minimal storage for specific columns and extra performance most data types except arrays have compression dynamic applied by default create table "call records" ( "utc timestamp" timestamp time key bucket(1, day) not null, "sectorid" varchar compression zstd, "address" varchar compression none, "call type" varchar compression zstd compression gdc(4), "access information" varchar(500) compression gdc(4), "file name" varchar compression zstd compression level = 3, dictionary size = 1048576, clustering key "ck" ("call type", "access information") ) this example table makes use of these ocient system configurations column name configuration description address this column explicitly removes compression by specifying compression none in the create table sql statement call type , access information these columns have varchar data types configured with global dictionary compression (gdc) this compression scheme maps the variable length type rows to integers among other benefits, this compression allows both columns to support a clustering key for faster querying the call type column stacks multiple compression schemes by including both gdc and zstd compression to provide additional storage reduction file name this column uses zstd compression with extra options the compression level is set for 3 , meaning the column data is slightly more compressed than the default 0 the dictionary size is set for the maximum value, 1048576 , which means greater compression but higher memory demand for details about ocient compression schemes, see table compression options docid\ cjq7ia90yenazchewq4r0 create a table for various container types this example table includes complex container data types, including a two dimensional array, a tuple with variable length values, and a tuple with various data types create table "container data" ( "utc timestamp" timestamp time key bucket(1, day) not null, "user id" bigint not null default 0, "sell id" bigint not null default 0, "point array col" point\[], "array col" varchar\[] compression gdc(4), "2d array col" int\[]\[], "tuple col1" tuple<< varchar(255), varchar(500) \>>, "tuple col2" tuple<< int, int\[], varchar compression gdc(1) compression zstd, varchar \>>, "matrix col" matrix\[5]\[10], clustering index "ck" ("user id", "sell id") ) create index "point idx" on "container data" ("point array col") using spatial; create index "tuple idx" on "container data" ("tuple col1"\[1]) using hash; create index "tuple idx2" on "container data" ("tuple col2"\[2]) using inverted; this example table makes use of these ocient system configurations column name configuration description point array col this column is an array of point values that is indexed using a spatial index type as specified by the create index sql statement tuple col1 this tuple column contains two variable length values each object in the tuple can support its own compression scheme and index the create index sql statement assigns a hash index to the first value ( varchar(255) ) tuple col2 this tuple column contains various data types each object in the tuple can support its own compression scheme and index the two varchar objects in the tuple column each have their own unique compression scheme the create index sql statement assigns an inverted index to the second tuple object, the integer array int\[] for details about ocient container types, see array, tuple, and matrix overview docid 9iwora l gsfn upivf8a related links data types docid\ salunscyvkcxpgzh3m0hd database, tables, views, and indexes docid\ a4jvhkvg31tapexr9zpcq table compression options docid\ cjq7ia90yenazchewq4r0 secondary indexes docid\ efbuyqjp 9ole py4jpst