Ocient Hyperscale Data Warehouse Release Notes
Internal updates only.
Bugs
- [DB-28722]: Fixed the issue where the client closed the result set before sending the KILL_QUERY request.
Bugs
- [DB-28599]: Fixed the SSO expiration handling.
- [DB-28603]: Fixed failing CREATE TABLE AS SELECT queries that did not release service class concurrency slots.
Bugs
- [DB-28459]: Fixed improper handling of NULL values during multi-joins spilling their child cursors.
Bugs
- [DB-27799]: Fixed incorrect data produced when the table definition specifies cluster key columns out of order.
Bugs
- [DB-27317]: Increased efficiency in handling the NOT predicate in combination with another predicate.
- [DB-27214]: Fixed a correctness issue that involved the UNNEST function and a GDC join. In this case, the GDC join filtered NULL incorrectly using the NULL_INPUT clause in the UNNEST function.
Bugs
- [DB-27189]: Fixed the ALTER TABLE INVALIDATE CACHE command so that it invalidates the cache on all SQL nodes.
Bugs
- [DB-25262]: Segment Scheduling Balance — Increased the granularity and uniformity of work scheduling in the I/O layer to increase parallelism and reduce skew in completion time across operator instances.
- [DB-25998]: Page Data Integrity — Fixed code to avoid duplicate data in pages when Loader Nodes experience an out-of-memory condition.
- [LAT-1777]: Load of Large Strings and Binary Data — Added support for loading large strings and binary data to the maximum internal limit (512 MiB) of the Ocient system.
Features
- [DB-21609]: Machine Learning Model Updates —
Version Compatibility
- The database data control language (DCL) denotes user role privileges to remove data using the DELETE keyword instead of TRUNCATE.
Bugs
- [DB-25903]: Improved query plan for long AND and OR filters with a duplicate value.
- [LAT-1846]: Fixed state file issue that prevented the start of the load after the restart of continuous file loading.
Release Highlights
- All machine learning functionality is available to use. For details, see Machine Learning Model Functions and Machine Learning in Ocient to get started.
- Delete Syntax: Enabled the deletion of individual rows in the database.
- Integrations: Added drivers and support for the following third-party applications:
- DBeaver
- Tableau
Features
- [DB-13607]: Delete Syntax — Added the SQL DELETE statement syntax that enables the deletion of individual rows in the database. For details, see DELETE FROM TABLE.
- [DB-18020]: Large Geospatial Types — Increased the size of LINESTRING and POLYGON geospatial data types to 512 MB. For details, see Load Geospatial Data.
- [DB-19048]: Geospatial Index — Added the SPATIAL index type for indexing geospatial data. For details, see SPATIAL Index Type.
- [DB-18280]: Connectors Refresh — Added integration with DBeaver and Tableau. For details, see DBeaver Integration and Tableau Integration.
- [DB-20412]: Multi-Cluster Loading and Cluster of Clusters — Added support for loading and working with multiple clusters. For details, see Multiple Storage Clusters for Loading Data.
Bugs
- [DB-22545]: Replaced the syntax error with a more descriptive error message when you use the list type in an incorrect context.
- [DB-23406]: Fixed the processing of DNF predicates to be more efficient.
- [DB-23452]: Fixed syntax error during SSO integration.
- [DB-25237]: Introduced a cache for rebuilt data to optimize query performance when one of the foundation nodes is unavailable.
Version Compatibility
- Large Geospatial Types are not backwards compatible with earlier releases. For details, see Version Compatibility.
Bugs
- [DB-23943]: Improved mixed priority concurrent performance.
- [DB-23284]: Logging has been updated when the System redirects a query to another node. Now, logging includes query::start followed by query::redirect with the same query identifier. For other operations, only the query::redirect entry remains. Either way, the sys.completed_queries system catalog table does not contain a corresponding entry.
- [DB-19888]: Machine Learning Model Updates —
- The Ocient System scopes machine learning models to schemas. The system assigns the pre_v22_mlmodel schema to any model you created prior to version 22.0.
- The sys.multiple_linear_regression_slopes system catalog table has been removed.
- Rename machine learning models using ALTER MLMODEL.
- New DDL commands:
- CREATE OR REPLACE
- REFRESH
- EXPORT
Release Highlights
- HyperLogLog (HLL): Added HLL sketch functionality.
- Information Schema: Added the information_schema schema that shows system metadata.
- Integrations: Added drivers and support for the following third-party applications:
- Metabase
- SQLAlchemy
- Superset®
Features
- [DB-13603]: Information Schema — Added the information_schema schema that shows system metadata in an accessible format.
- [DB-20484]: Superset Integration — Merged sqlalchemy-ocient driver into Superset repository, allowing Superset to support database connections.
- [DB-20484]: SQLAlchemy Integration — Published sqlalchemy-ocient driver to PyPI.
- [DB-21011]: EXCEPT Clause — Added EXCEPT clause so that SELECT * queries can explicitly omit columns from results.
- [DB-21769]: Metabase Integration — Added Ocient as a Metabase partner driver, allowing Metabase to access Ocient databases out-of-the-box.
- [DB-23030]: JDBC Packaging — Removed OpenJump dependency from ocient-jdbc4.
- [DB-23175]: Time Zone Adjustment Support — Added various improvements to time zone functionality, including:
- Support for daylight saving time adjustment based on time zone.
- Added time zone functions CONVERT_UTC_TIMESTAMP_TO_LOCAL and CONVERT_LOCAL_TIMESTAMP_TO_UTC. For more information, see Time Zone Functions.
- Enhanced performance for time zone conversion.
- [DB-23177]: Push-Down Aggregation to the I/O Layer — Under certain conditions, the system pushes aggregation to the I/O operator for better efficiency and performance.
- [DB-23299]: HLL Sketch Functionality — Added support for variable log2k HLL sketch algorithm and associated functions. For details, see the HLL Functions page.
- [DB-23745]: Implement Evacuate Node — Evacuate node is a tool that moves all segments off a node in an overprovisioned system to the other nodes in the cluster. This tool is useful when you replace drives or a node.
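As a rough illustration of what a variable log2k parameter controls in an HLL sketch, the following minimal Python sketch implements the textbook HyperLogLog estimator. It is not Ocient's implementation: the class name `HyperLogLog`, the allowed `log2k` range, and the use of SHA-256 as the hash are assumptions chosen for the example. A sketch with `2**log2k` registers has a relative standard error of roughly `1.04 / sqrt(2**log2k)`, so a larger log2k trades memory for accuracy.

```python
import hashlib
import math


class HyperLogLog:
    """Textbook HyperLogLog with 2**log2k registers (illustrative only)."""

    def __init__(self, log2k: int = 12) -> None:
        # Assumption: bounds picked so the alpha formula below stays valid.
        if not 7 <= log2k <= 16:
            raise ValueError("log2k must be between 7 and 16 in this sketch")
        self.log2k = log2k
        self.k = 1 << log2k
        self.registers = [0] * self.k

    def add(self, value: str) -> None:
        # Hash the value to 64 bits (SHA-256 is an arbitrary choice here).
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        width = 64 - self.log2k
        index = h >> width                    # top log2k bits pick a register
        rest = h & ((1 << width) - 1)         # remaining bits
        rank = width - rest.bit_length() + 1  # position of the leftmost 1-bit
        if rank > self.registers[index]:
            self.registers[index] = rank

    def estimate(self) -> float:
        k = self.k
        alpha = 0.7213 / (1.0 + 1.079 / k)    # bias correction for k >= 128
        raw = alpha * k * k / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * k and zeros > 0:
            # Small-range correction: fall back to linear counting.
            return k * math.log(k / zeros)
        return raw


sketch = HyperLogLog(log2k=14)
for i in range(10_000):
    sketch.add(f"user-{i}")
estimate = sketch.estimate()  # approximate count of distinct values added
```

Adding the same values again leaves the registers, and therefore the estimate, unchanged, which is the property that makes such sketches useful for distinct counts.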
LAT Features
- [LAT-1469]: Manual Configuration of LAT Endpoints — Enabled manual configuration of LAT endpoints for OAuth with .
- [LAT-1475]: Enablement of Stopping Load Processing During Error Condition — Added default behavior to stop processing during file loading in the event of an unrecoverable error when the system extracts records from a file. For details, see continue_on_unrecoverable_error.
- [LAT-1476]: Enablement of LAT Service in Installation — Enabled LAT Service in systemd by default upon installation completion. This update reflects a change in the default behavior during installation.
- [LAT-1477]: LAT Version for Metrics — Exposed lat_version in the metrics.
- [LAT-1557]: Support for Loading Multiple S3 Buckets — Added LAT functionality to load data from multiple S3 buckets simultaneously within the same pipeline.
Bugs
- [DB-18002]: Improved resource sharing for concurrent queries.
- [DB-21658]: Improved Raft Leadership stability. Raft Leaders are more resilient to slow or hung followers.
- [DB-21699]: Fixed a query internal error during storage snapshot synchronization operations.
- [DB-22077]: Improved Raft robustness during administrator node network instability.
- [DB-22788]: Improved validation of duplicate CREATE TABLE AS SELECT operations.
- [DB-22885]: Reduced resource usage during storage state transitions.
- [DB-22927]: Fixed long optimization times on queries with a large UNION ALL SQL statement on tables with many columns.
- [DB-23022]: Fixed an issue that caused the LIST VIEW command to omit views that were granted to a user or group.
- [DB-23529]: Enabled decimal construction from fractional values without a leading 0.
- [DB-23572]: Increased the security token expiration time to meet the standard for the driver.
Version Compatibility
- Information Schema — Views created prior to Version 22.0 do not have column data appearing in the information_schema. You can drop and recreate these views to populate column data.
- LAT — Version 3.0.0 and greater is only compatible with Version 22.0 and greater of the Ocient system. For details, see Version Compatibility.
Bugs
- [DB-22718]: Improved network efficiency on queries with high result set volume.
- [DB-23089]: Improved statistic update heuristics on large tables.
- [DB-23621]: Fixed a resource leak on threads that run the TRUNCATE TABLE SQL statement, which could cause intermittent long optimization times.
Bugs
- [DB-22957]: Fixed incorrect service class concurrency tracking on query errors.
Release Highlights
The Ocient System now supports the following operating systems:
- Ubuntu® 20.04
- Debian 11
- RHEL 8
Other highlights include:
- Whole column compression: Added Zstandard (ZSTD) compression for fixed and variable length columns.
- Check system configuration: Added precheck and postcheck commands to check system configuration before and after installation.
- Workload management dynamic priority: Enabled the adjustment of the query priority dynamically at the session, service class, and query levels.
- Ability to quiesce node: Added process for graceful node shutdown.
Bugs
- [DB-22312]: Fixed connection lock contention causing query problems.
- [DB-22196]: Fixed bug causing queries to run under default service class despite explicit assignment.
- [DB-21736]: Service class data is now included in query optimization.
- [DB-21078]: Added REST endpoint to expose query and performance metrics in JDBC CLI. Added ms precision to JDBC log entries. Added s3_upload_part_size and s3_upload_part_parallelism to JDBC Data Extract properties.
- [DB-20879]: Fixed bug for queries that exceed their max_elapsed_time but not the max_elapsed_time_for_caching. Now queries will complete and store their results in the result cache even if the client drops its connection.
- [DB-20148]: Fixed bug when first element in an array of tuples is null.
- [DB-19915]: Added better Raft handling for network errors.
- [DB-19707]: Fixed internal error with posOfParentInChild being invalid.
- [DB-22675]: Fixed vector_max function handling of NULL values.
Features
- [DB-18636]: ZSTD Compression - Added a new whole-column compression scheme (ZSTD) that can be enabled for fixed and variable length columns.
- [DB-18990]: Improved Stats Storage And Usage - Various improvements have been added to speed up the fetching of statistics by the optimizer and ensure it gets up-to-date statistics. These changes primarily center around probability density functions being stored as pre-aggregated stats files instead of on a per-segment basis.
- [DB-20190]: Distributed Tasks - Added check_disk task type and new vtables sys.subtasks, sys.tasks, and sys.rebuild_tasks for monitoring tasks. Removed the CHECK DATA command.
- [DB-19117]: Metadata - Added participating_nodes to the sys.queries and sys.completed_queries virtual tables.
- [DB-18633]: Graceful Node Shutdown - Added quiesce process for graceful node shutdown.
- [DB-18061]: LCK Deprecation - Added new disk data format that is smaller and also improves performance of some index based queries.
- [DB-19414]: Range Query Improvement - Improved performance of range queries by utilizing the inverted secondary index.
- [DB-20168]: Geospatial Function Expansion - Added these geospatial scalar functions.
- Measurement Functions
- ST_ANGLE
- ST_DISTANCESPHERE
- ST_DISTANCESPHEROID
- ST_LENGTH2D
- ST_HAUSDORFFDISTANCE
- Analytic and Property Functions
- ST_DIMENSION
- ST_GEOHASH
- ST_SRID
- ST_ISPOLYGONCW
- ST_ISPOLYGONCCW
- To String and Binary Functions
- ST_ASWKT
- ST_ASWKB
- ST_ASEWKT
- Geography Simplification Function
- ST_SIMPLIFY
- Constructor Functions
- ST_POINTFROMGEOHASH
- ST_GEOGPOINT
- ST_MAKEPOLYGONORIENTED
- ST_POINT_FROMEWKT
- ST_LINESTRING_FROMEWKT
- ST_POLYGON_FROMEWKT
- ST_MAKEENVELOPE
- Additionally, you can construct ST_POLYGON types directly from a POINT[] without going through an intermediate ST_LINESTRING.
Keywords
Added these new keywords as reserved words in the Ocient system.
- ANALYSIS
- AUTOREGRESSION
- BAYES
- CANCEL
- COMPONENT
- DECISION
- DISABLE
- DISABLE_STATS_FILE_UPDATES
- ENABLE
- FEEDFORWARD
- INSERT
- KMEANS
- KNN
- LOGISTIC
- MACHINE
- MOVE
- NAIVE
- NETWORK
- NONLINEAR
- PRINCIPAL
- REPLACE
- SOURCE
- SUPPORT
- TREE
- VECTOR
- ZSTD
Release Highlights
- CREATE TABLE AS SELECT SQL Statement: Extract, load, and transform (ELT) workflow functionality to extract data and load it into a new database table by using the query results from a SELECT SQL statement. The tables you create using the CREATE TABLE AS SELECT SQL statement have some indexing limitations in version 20.0. For details, see the "About Create Table As Select (CTAS)" section of the Ocient user documentation.
- INSERT INTO SQL Statement: ELT workflow functionality to extract data and insert it into an existing database table using the INSERT INTO SQL statement.
- N-gram Indexes: Full index on VARCHAR, VARCHAR arrays, and VARCHAR tuple components for efficient queries using the LIKE SQL statement.
- Large VARCHAR [DB-16142]: Support VARCHAR columns up to 1 GB in size.
- Ocient Simulator: An instance of the Ocient system for data loading and functional testing.
- Single Sign-On (SSO): Authenticate access to Ocient through an external SSO server and assign SSO users to groups in Ocient.
Bug Fixes
- [DB-20942]: Improved handling of connection failures caused by network errors during Raft actions
- [DB-20657]: Increased maximum number of available threads to improve handling of high concurrency
- [DB-20249]: Upgraded jemalloc to address a virtual memory accounting issue that manifested as a heap memory leak
- [DB-20229]: Fixed incorrect results in a query involving an anti join and grouped aggregation
- [DB-19987]: Fixed bug in left join edge case
- [DB-19854]: Added username in security log file for failed connection
- [DB-18955]: Added timestamp_start, time_start, timestamp_execstart and time_execstart to the sys.queries virtual table.
Feature Removal
The ALTER ROLE DDL command has been removed. You can make all changes using the ALTER CONFIG SQL statement. To alter a role, prefix the key with the role name followed by a dot.
The following system tables have been added:
- average_bb_sizes
- linear_combination_regression_models
- node_config
- node_status
- sso_connections
- storage_device_status
The following system tables have been removed:
- hugepage_configurations
- memory_module_models
- node_memory_modules
- oidc_integrations
- oidc_sessions
- polynomial_regression_models
- security_integrations
- sessions
- [DB-20594] - Update cost estimations to be more accurate in order to create more optimal query plans
- [DB-20085] - Improve runtime performance of queries with multiple predicates referencing secondary indices
- [DB-20019] - Add logic to escape thread pool exhaustion scenario on pool in charge of processing commands
- [DB-19916] - Add missing support for st_linestring, st_polygon, and matrix type comparisons during query execution
- [DB-19887] - Fix crash that occurred when SQL node could not communicate with admin on startup
- [DB-19881] - Add logging/alerting for when system configuration refreshes are hanging due to transient network issues
- [DB-19662] - Fix an issue where GDC caches could become stale for extended periods of time
- [DB-14527] - Adaptive Water Mark Feature - Indexer Node dynamically increases and reduces batch size without manual tuning
- [DB-14656] - Added a REST endpoint to expose a node’s configuration parameters (:9090/v1/configparams)
- [DB-15123] - Expose cluster total storage space and storage usage through virtual tables
- [DB-15515] - Add support for expr::dtype cast notation
- [DB-16289] - Remove the web ui and YAML service role configuration
- [DB-16904] - Allow any predicate type to be used in conjunction with the values in arrays
- [DB-17600] - Fix for incorrect handling of interval types in predicates
- [DB-17595] - Improve error messaging when an invalid type name is encountered
- [DB-17656] - Address query performance using a join on an array column
- [DB-17797] - Optimization to child-sensitive operators to improve query performance
- [DB-17889] - Improve ability to continue data loading when a foundation node is down
- [DB-17986] - Fix for an internal error encountered on count distinct used with unnest of an array
- [DB-18165] - Improve performance for queries that unnest an array
- [DB-18174] - Workload management limits were added to virtual tables that interact with the storage cluster
- [DB-18211] - Allow optimizer to combine some array predicates when ordering varies
- [DB-18393] - Leverage hyperthreading in query execution
- [DB-18486] - Fix correctness issues for array_agg() and string_agg() query plans
- [DB-18872] - Improve handling for invalid table stats
- [DB-18914] - Improve performance of right semi joins
- [DB-18991] - Scheduler enhancements to improve handling of low memory conditions
- [DB-19307] - Add enhanced error handling while transforming query plans during optimization
- [DB-19806] - Fix SYSTEM database for admins and qualified user name in sys.queries
- In v19, the service role configuration previously set through the web UI has been replaced by the ALTER … ALTER ROLE/CONFIG … DDL command. The web UI is still available in v19, but will be removed in a subsequent release. The ALTER … ALTER ROLE/CONFIG … command should be used to change system configuration, rather than the web UI. Please reference the Upgrade Ocient Software section of the user documentation for details.
- [DB-12747] - Add support for lateral joins.
- [DB-13924] - Add support for multi-column subqueries.
- [DB-14990] - Add support for native right joins.
- [DB-15996] - Add support for array_to_string function.
- [DB-16231] - Improve GIS function performance and introduce expanded support for GIS functions. Please refer to the User Documentation for details.
- [DB-17037] - Add new scalar functions and operators for GIS types (POINT, LINESTRING, and POLYGON). Please refer to the User Documentation for details.
- [DB-17892] - Add support for right lateral joins.
- [DB-16061] - Secondary indexes can now be created on VARCHAR and VARCHAR[] columns. Please refer to the User Documentation for details
- [DB-19420] - Allow PSO threshold to exceed 24 hours
- [DB-17915] - Fix for JDBC Driver issue where, after reconnecting on an external endpoint, the driver uses the internal IP for subsequent queries
- [DB-17851] - Fix for ST_POINT normalization taking too long
- [DB-17635] - Remove query log properties timestamp_optimizationcomplete and time_optimizationcomplete and add new properties timestamp_optimizationstart and time_optimizationstart
- [DB-17386] - Fix for ‘day_part’ queries resulting in internal error
- [DB-17375] - Fix for query variability during stats aggregation in the SQL node.
- [DB-17362] - Fix for queries that can run out of memory in the extend operator
- [DB-17252] - Fix for Low_priority service class holding onto slots
- [DB-17174] - Fix for corner case where a long running query can cause a crash in the SQL node if the JDBC disconnects during its execution.
- [DB-17167] - Fix for round() function not supporting lowercase time interval
- [DB-17124] - Fix for column predicate NULL behavior not matching Postgres
- [DB-16944] - Fix for a query finalization condition where CPU and memory resources can become unavailable for future queries
- [DB-16830] - Fix for DATE_PART alias not working correctly (PG Compatibility)
- [DB-16708] - Fix cost estimates/plan optimization for extend operators
- [DB-16567] - Make error messages clearer for queries with a missing GROUP BY
Features
- [DB-17316] - Change array_length(empty array) to return 0
- [DB-16417] - Allow integral types for integer fields in GIS functions
- [DB-16200] - Make Explains more convenient for the user.
- [DB-16092] - Distributed Result Set Caching
- [DB-15623] - Add support for rebuilding individual nodes via DDL
- [DB-15375] - ALTER CLUSTER ADD PARTICIPANTS DDL
- [DB-14720] - Provide a way to kill long running optimizations
- [DB-14017] - Support for CLI command history across sessions
Features
- [DB-12888] - Add support for array values larger than 128 KB. The new maximum value of an array is 512 MB.
Features
- [DB-10329] - Add support for full disk encryption of Opal drives. Disk encryption will be automatically enabled when Opal support is detected.
- [DB-14159] - Default hex values for binary or varbinary columns must contain a leading 0x
- [DB-14330] - Remove last dependencies on PostgreSQL from the database
Features
- [DB-13334] - Add support for zip unnest, which unnests multiple arrays in parallel
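To make "unnests multiple arrays in parallel" concrete, here is a minimal Python sketch of the row shape a zip unnest produces: the i-th output row holds the i-th element of every input array. The function name `zip_unnest` is hypothetical, and padding shorter arrays with `None` (standing in for NULL) is an assumption borrowed from how multi-argument unnest behaves in PostgreSQL; this note does not state how arrays of unequal length are handled.

```python
from itertools import zip_longest
from typing import Any, Iterator, Sequence


def zip_unnest(*arrays: Sequence[Any]) -> Iterator[tuple]:
    """Yield one row per position, taking the i-th element of every array."""
    # Assumption: shorter arrays are padded with None (NULL).
    return zip_longest(*arrays, fillvalue=None)


# Two arrays of equal length become three two-column rows.
rows = list(zip_unnest([1, 2, 3], ["a", "b", "c"]))

# With unequal lengths, the shorter array is padded under the assumption above.
padded = list(zip_unnest([1, 2], ["x"]))
```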
Features
- [DB-12887] - Add support for the array of tuples. Users can create array columns containing tuple SQL types. Please refer to the User Documentation for the latest information on supported data types.
- [DB-12885] - Add support for unnest(), which expands array elements from input array columns out to individual output rows
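The row expansion that unnest() performs can be pictured with a short Python sketch. The table layout, the `unnest_rows` helper, and the choice to emit no output rows for an empty array are illustrative assumptions, not a description of the database's implementation.

```python
from typing import Any, Dict, Iterator, List


def unnest_rows(rows: List[Dict[str, Any]], array_column: str) -> Iterator[Dict[str, Any]]:
    """Emit one output row per element of the array column."""
    for row in rows:
        for element in row[array_column]:
            out = dict(row)              # copy the other columns unchanged
            out[array_column] = element  # replace the array with one element
            yield out


table = [
    {"id": 1, "tags": ["red", "blue"]},
    {"id": 2, "tags": []},  # assumption: an empty array yields no rows
    {"id": 3, "tags": ["green"]},
]
expanded = list(unnest_rows(table, "tags"))
```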
Features
- [DB-12394] - Added support for running on CentOS 8.
Features
- [DB-13162] - Added support for Tasks to the System Catalog
- [DB-10332] - Implemented access controls on system and database-level objects. Improved users and groups, and added new roles within Ocient.
- [DB-12829] - Optionally enforce encrypted connections for JDBC and ODBC.
Bug Fixes
- [DB-13133] - Fix bug in Bulk Loader serialization of non-nullable GDC array columns.
Features
- [DB-10330] - External Network Security. SSL/TLS support in ODBC and JDBC, SSL support for the web interface.
Bug Fixes
- [DB-12580] - Fix bug in Export table and Describe table to now work properly on array columns with GDC compression.
- [DB-12840] - Prevent Bulk Loader out-of-memory when loading wide or unclustered tables.
Features
- [DB-10921] - Adds support for multi-dimensional arrays and the ability to do joins, windows, sorts and aggregations that involve arrays.
- [DB-10927] - Adds support for global dictionary compression (GDC) on VARCHAR array columns and the ability to do replacement joins.
- [DB-10282] - Adds support for DROP COLUMN DDL to remove columns from a table.
- [DB-11472] - Adds support for skipping failed rows for CSV loading up to some specified threshold.
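The skip-on-failure behavior described for CSV loading can be pictured as a loop that tolerates bad rows until a limit is reached. This is a minimal Python illustration, not the loader's code: the function `load_csv_rows`, the `max_failed_rows` parameter, and the rule that a row fails when its field count is wrong or its first field is not an integer are all assumptions made for the example.

```python
import csv
import io
from typing import List, Tuple


class TooManyFailedRows(Exception):
    """Raised when the number of failed rows exceeds the threshold."""


def load_csv_rows(text: str, expected_fields: int, max_failed_rows: int) -> Tuple[List[list], int]:
    loaded: List[list] = []
    failed = 0
    for record in csv.reader(io.StringIO(text)):
        try:
            if len(record) != expected_fields:
                raise ValueError("wrong field count")
            # Hypothetical schema: first field is an integer key.
            loaded.append([int(record[0]), *record[1:]])
        except ValueError:
            failed += 1
            if failed > max_failed_rows:
                raise TooManyFailedRows(
                    f"{failed} rows failed; limit is {max_failed_rows}"
                )
    return loaded, failed


data = "1,alice\nnot-a-number,bob\n3,carol\n4\n"
rows, failed = load_csv_rows(data, expected_fields=2, max_failed_rows=2)
```

With `max_failed_rows=1`, the same input raises `TooManyFailedRows` on the second bad row instead of completing the load.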
Bug Fixes
- [DB-12440] - The logrotate config file (/etc/logrotate.d/rolehostd) has been updated to include the line dateformat -%Y%m%d-%H. Without this change, intra-day log rotation due to the maximum log size caused the backed up log names to conflict. For systems installed prior to 6.2, this line should be added to the /etc/logrotate.d/rolehostd file.
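To see why adding the hour to the rotated file name avoids conflicts, the following Python sketch renders the `-%Y%m%d-%H` pattern for two rotations on the same day. It uses `datetime.strftime`, whose handling of these four specifiers is similar to the strftime-style notation that logrotate documents for `dateformat`. The base file name `rolehostd.log` is hypothetical, and the day-only pattern is assumed to be what was in effect before the change.

```python
from datetime import datetime

DATEFORMAT = "-%Y%m%d-%H"      # the line added to /etc/logrotate.d/rolehostd
DAY_ONLY_FORMAT = "-%Y%m%d"    # assumption: pattern in effect before the change


def rotated_name(base: str, when: datetime, dateformat: str) -> str:
    """Return the name a rotated log would get at the given time."""
    return base + when.strftime(dateformat)


morning = datetime(2021, 3, 4, 9, 30)
evening = datetime(2021, 3, 4, 21, 5)

# Day-only suffixes collide when two rotations happen on the same day.
old_names = {rotated_name("rolehostd.log", t, DAY_ONLY_FORMAT) for t in (morning, evening)}

# Hour-qualified suffixes stay distinct as long as rotations fall in different hours.
new_names = {rotated_name("rolehostd.log", t, DATEFORMAT) for t in (morning, evening)}
```

Two rotations within the same hour would still produce the same suffix under this pattern.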
Bug Fixes
- Minor stability fixes.
Features
- [DB-9707] - Scriptable Bulk Load Essentials: Allows users to create translations and launch bulk load tasks via DDL
- [DB-10479] - Adds support for Tableau through Ocient’s JDBC Custom Connector. Users can find Ocient’s connector and the installation instructions on Tableau’s extension gallery. Please refer to Tableau for more information.
Features
- [DB-9477] - Adds support for the array data type. Users can create single-dimensional array columns from any other supported data type. Please refer to the User Documentation for the latest information on supported data types.
- [DB-9656] - Add column support. The engine now supports the add column DDL statement with the ability to add columns to an existing table. Existing data that was loaded without the new column uses the configured default values when queried. Please refer to User Documentation for information on the DDL syntax and default values.
Bug Fixes
- [DB-11173] - Properly handle HDFS partial file transfers in loader.
- [DB-11283] - Fix bug in loader that could cause degraded performance or hang during load.
Bug Fixes
- Address performance regression for some queries.
Bug Fixes
- [DB-10765] - Better handling for units in st_distance function.
- [DB-10803] - Handle S3 bucket region redirects.
- [DB-10801] - Increase loader node huge page count to avoid OOM during compressed data load.
Bug Fixes
- [DB-9465] - Handle bulk load cases where merging subsets of data causes data size to increase dramatically.
Features
- [DB-6386] - Availability of the storage engine, allowing queries to run with a node or drive failure
- [DB-6588] - Bulk loading of CSV files from HDFS or an S3 endpoint
- [DB-7221] - Delta compression in the TKT engine for timestamp columns
- [DB-7623] - Virtual tables to retrieve information from the storage cluster state
- [DB-7247] - OS Upgrade functionality
- [DB-6125] - AWS initial support
- [DB-6383] - Data Definition Language (DDL) operations
- [DB-6940] - All system configuration in the System Catalog
- [DB-6362] - Stats Virtual Tables
- [DB-7098] - External Window Operator support
- [DB-7497] - List Running Tasks Page
- [DB-7097] - Segment Group Deletion
- [DB-7139] - Cancel Query and Cancel Task support
- Numerous stability and performance improvements