Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.ocient.com/llms.txt

Use this file to discover all available pages before exploring further.

The has aggregated and sorted aggregate functions. Aggregate and sorted aggregate functions compute a single result from a group of rows. You can use the DISTINCT keyword with any aggregation function, such as COUNT(DISTINCT col). Whereas sorted aggregate functions use the standard ORDER BY syntax to dictate the ordering of the aggregation. To sort the elements of an array, use the ARRAY_SORT function during data load.

Aggregate Functions

Supported input types vary by function. In general, numeric types include TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, and DECIMAL. Various examples on this page use these tables.
SQL
CREATE TABLE sample_data (val INT, label VARCHAR(10));
INSERT INTO sample_data (val, label) VALUES
    (2, 'alpha'), (4, 'beta'), (4, 'alpha'), (5, NULL),
    (7, 'gamma'), (8, 'alpha'), (12, 'beta'), (15, 'gamma');
SQL
CREATE TABLE predictions (actual INT, predicted INT);
INSERT INTO predictions (actual, predicted) VALUES
    (10, 12), (20, 18), (30, 33),
    (40, 37), (50, 52), (60, 58);
For simple functions, examples use the sys.dummy virtual table. For details, see Generate Tables Using sys.dummy.

ACCURACY_SCORE

Returns the fraction of predictions that match the actual class labels. This function calculates the ratio of correctly predicted rows to the total number of rows.
SQL
ACCURACY_SCORE(y, y_hat)
ArgumentData TypeDescription
yAll types supportedThe actual class label.
y_hatAll types supportedThe predicted class label.
The return type is DOUBLE. The result represents the proportion of correct predictions, ranging from 0 to 1. Examples Calculate the Accuracy Score for Matched Values This query returns the accuracy score when all predictions match the actual values.
SQL
SELECT ACCURACY_SCORE(c1, c1) FROM sys.dummy10;
Output: 1 Calculate the Accuracy Score with Comparison Against Squared Values This query returns the accuracy score when comparing actual values against their squared values.
SQL
SELECT ACCURACY_SCORE(c1, c1 * c1) FROM sys.dummy10;
Output: 0.1

ANY_VALUE

Returns an arbitrary non-NULL value from the input column. The function returns NULL only if all rows in the column are NULL. The ANY_VALUE function does not support window aggregation. Syntax
SQL
ANY_VALUE(col)
ArgumentData TypeDescription
colAny typeThe column from which to return an arbitrary non-NULL value.
Example
SQL
SELECT
    ANY_VALUE(label) AS sample_label
FROM sample_data;
Output: alpha
The returned value is non-deterministic. The database can return any non-NULL value from the column.

APPROX_COUNT_DISTINCT

Returns an approximate count of distinct values in the column using the HyperLogLog algorithm, with a 95% confidence interval that the result is within 4.5% of the exact count. Syntax
SQL
APPROX_COUNT_DISTINCT(col)
ArgumentData TypeDescription
colAny non-interval typeThe column for the approximation of the distinct count.
Example
SQL
SELECT
    APPROX_COUNT_DISTINCT(val) AS approx_distinct
FROM sample_data;
Output: 7 The sample_data table has 8 rows, but there are only 7 distinct values in the val column (the value 4 appears twice).

APPROX_SUM

Computes a sum using a faster, non-deterministic ordering for floating-point columns. This summation can lead to minor differences in the result on the order of the machine epsilon. For integral column types, the function uses the standard SUM algorithm. Syntax
SQL
APPROX_SUM(col)
ArgumentData TypeDescription
colFLOAT, DOUBLE, or any integral typeThe column to sum.
For integral inputs, the APPROX_SUM function uses the standard SUM algorithm and returns the same result. The performance benefit applies to large floating-point data sets.
Example
SQL
SELECT
    APPROX_SUM(c1) AS approx_total
FROM sys.dummy5;
Output: 15

AVG

Computes the arithmetic mean over the set of values. Syntax
SQL
AVG(col)
ArgumentData TypeDescription
colAny numeric type or MATRIXThe column to average.

Integral and FLOAT inputs return DOUBLE.

DOUBLE, DECIMAL, and MATRIX inputs preserve their type.
Example
SQL
SELECT
    AVG(c1) AS avg_val
FROM sys.dummy5;
Output: 3.0

COEFFICIENT_OF_DETERMINATION

Computes the coefficient of determination (R²) between actual and predicted values. Primarily used to evaluate the performance of machine learning regression models, R² measures the proportion of variance in the actual values that the predicted values explain. A value of 1.0 indicates a perfect fit, while 0.0 indicates that the predictions explain none of the variance. This function returns NULL when the total sum of squares is zero (i.e., all actual values are identical). Syntax
SQL
COEFFICIENT_OF_DETERMINATION(actual, predicted)
ArgumentData TypeDescription
actualAny numeric typeThe column of observed values.
predictedAny numeric typeThe column of predicted values.
Example
SQL
SELECT
    COEFFICIENT_OF_DETERMINATION(actual, predicted) AS r_squared
FROM predictions;
Output: 0.980571428571429

CONFUSION_MATRIX

Returns a structured representation of the counts for every combination of actual and predicted class labels relative to a specified positive class. This function aggregates predictions into a 2x2 matrix in the format [[TP, FP], [FN, TN]] for binary classification analysis.
SQL
CONFUSION_MATRIX(y, y_hat, positive_class)
ArgumentData TypeDescription
yAll types supportedThe actual class label.
y_hatAll types supportedThe predicted class label.
positive_classAll types supportedThe class label to treat as the positive class. Must match the type of y and y_hat.
The return type is TUPLE. Each entry in the result contains the count of rows for a specific combination of actual and predicted class values, organized as true positives, false positives, false negatives, and true negatives. Examples Calculate the Confusion Matrix with Matched Values This query returns the confusion matrix when all predictions match the actual values.
SQL
SELECT CONFUSION_MATRIX(
    CASE WHEN c1 > 5 THEN 1 ELSE 0 END,
    CASE WHEN c1 > 5 THEN 1 ELSE 0 END,
    1
) FROM sys.dummy10;
Output: [[5, 0], [0, 5]] Calculate the Confusion Matrix with Mismatched Classifiers This query returns the confusion matrix for two mismatched binary classifiers, where actual positives are rows with c1 > 5 and predicted positives are rows with c1 < 3.
SQL
SELECT CONFUSION_MATRIX(
    CASE WHEN c1 > 5 THEN 1 ELSE 0 END,
    CASE WHEN c1 < 3 THEN 1 ELSE 0 END,
    1
) FROM sys.dummy10;
Output: [[0, 2], [5, 3]]

CORR

Alias for CORRELATIONP.

CORRELATION

Alias for CORRELATIONP.

CORRELATIONP

Computes the population Pearson correlation coefficient between two columns. Returns DOUBLE, or DECIMAL if both inputs are DECIMAL-compatible. Alias for CORRELATION and CORR. Syntax
SQL
CORRELATIONP(col1, col2)
ArgumentData TypeDescription
col1Any numeric typeThe first column.
col2Any numeric typeThe second column.
Example
SQL
SELECT
    CORRELATIONP(actual, predicted) AS corr_val
FROM predictions;
Output: 0.9904654955172436

COUNT

Returns the number of rows in the set where values in col are not NULL. When you use the COUNT(*) SQL statement, the function returns the total number of rows. The return type is BIGINT. Syntax
SQL
COUNT(col)
ArgumentData TypeDescription
colAny non-interval typeThe column for counting non-NULL values.

Use * to count all rows instead.
Examples Count All Rows
SQL
SELECT COUNT(*) AS total_rows FROM sample_data;
Output: 8 Count Non-NULL Values The label column has one NULL row.
SQL
SELECT COUNT(label) AS rows_with_label FROM sample_data;
Output: 7

COVAR_POP

Alias for COVARIANCEP.

COVAR_SAMP

Alias for COVARIANCE.

COVARIANCE

Computes the sample covariance between two columns. Returns DOUBLE by default, or DECIMAL if both inputs are DECIMAL-compatible. MATRIX inputs preserve their type. Alias for COVAR_SAMP. Syntax
SQL
COVARIANCE(col1, col2)
ArgumentData TypeDescription
col1Any numeric type or MATRIX (square)The first column.
col2Any numeric type or MATRIX (square)The second column.
Matrix arguments must have matching dimensions.
Example
SQL
SELECT
    COVARIANCE(actual, predicted) AS cov_samp
FROM predictions;
Output: 336.0

COVARIANCEP

Computes the population covariance between two columns. Returns DOUBLE by default, or DECIMAL if both inputs are compatible with DECIMAL. MATRIX inputs preserve their type. Alias for COVAR_POP. Syntax
SQL
COVARIANCEP(col1, col2)
ArgumentData TypeDescription
col1Any numeric type or MATRIX (square)The first column.
col2Any numeric type or MATRIX (square)The second column.
Matrix arguments must have matching dimensions.
Example
SQL
SELECT
    COVARIANCEP(actual, predicted) AS cov_pop
FROM predictions;
Output: 280.0

F1_SCORE

Returns the harmonic mean of precision and recall for a specified positive class. This function calculates 2 * (Precision * Recall) / (Precision + Recall) and provides a single score that balances both precision and recall. The function returns NULL when both precision and recall are 0.
SQL
F1_SCORE(y, y_hat, positive_class)
ArgumentData TypeDescription
yAll types supportedThe actual class label.
y_hatAll types supportedThe predicted class label.
positive_classAll types supportedThe class label to treat as the positive class. Must match the type of y and y_hat.
The return type is DOUBLE. The result ranges from 0 to 1, where 1 represents perfect precision and recall. Examples Calculate the F1 Score with Matching Values This query returns the F1 score when all predictions match the actual values.
SQL
SELECT F1_SCORE(
    CASE WHEN c1 > 5 THEN 1 ELSE 0 END,
    CASE WHEN c1 > 5 THEN 1 ELSE 0 END,
    1
) FROM sys.dummy10;
Output: 1 Calculate the F1 Score for Mismatched Classifiers This query returns NULL because both precision and recall are 0 for the mismatched classifiers.
SQL
SELECT F1_SCORE(
    CASE WHEN c1 > 5 THEN 1 ELSE 0 END,
    CASE WHEN c1 < 3 THEN 1 ELSE 0 END,
    1
) FROM sys.dummy10;
Output: NULL

KURTOSIS

Computes the sample excess kurtosis over the set of values. Kurtosis measures the difference between the tails of a distribution and the tails of a normal distribution. Returns DOUBLE, or DECIMAL if the input is DECIMAL. Syntax
SQL
KURTOSIS(col)
ArgumentData TypeDescription
colAny numeric typeThe column for computing the sample kurtosis.
Example
SQL
SELECT
    KURTOSIS(val) AS kurt_val
FROM sample_data;
Output: 4.763725276787391

KURTOSISP

Computes the population kurtosis over the set of values. Returns DOUBLE, or DECIMAL if the input is DECIMAL. Syntax
SQL
KURTOSISP(col)
ArgumentData TypeDescription
colAny numeric typeThe column for computing population kurtosis.
Example
SQL
SELECT
    KURTOSISP(val) AS kurtp_val
FROM sample_data;
Output: 2.268440607993995

MAX

Returns the maximum value in the specified column. The return type matches the input. Syntax
SQL
MAX(col)
ArgumentData TypeDescription
colAny non-interval typeThe column for returning the maximum value.
Example
SQL
SELECT
    MAX(c1) AS max_val
FROM sys.dummy5;
Output: 5

MEAN_ABSOLUTE_ERROR

Returns the mean absolute error (MAE) between actual and predicted values. This function calculates the average of the absolute differences between each pair of actual and predicted values, treating all errors equally regardless of direction.
SQL
MEAN_ABSOLUTE_ERROR(y, y_hat)
ArgumentData TypeDescription
yNUMERICThe actual value.
y_hatNUMERICThe predicted value.
The return type is DOUBLE. The result is always zero or positive. A value of 0 indicates perfect predictions. Examples Calculate the MAE for Matched Values This query returns the mean absolute error when all predictions match the actual values.
SQL
SELECT MEAN_ABSOLUTE_ERROR(c1, c1) FROM sys.dummy10;
Output: 0 Calculate the MAE with Comparison Against Squared Values This query returns the mean absolute error when comparing actual values against their squared values.
SQL
SELECT MEAN_ABSOLUTE_ERROR(c1, c1 * c1) FROM sys.dummy10;
Output: 33.0

MEAN_ABSOLUTE_PERCENTAGE_ERROR

Returns the mean absolute percentage error (MAPE) between actual and predicted values. This function calculates the average of the absolute percentage differences using the formula Average( AbsoluteValue( (y - y_hat) / y ) ) * 100, which expresses the error relative to the actual values.
SQL
MEAN_ABSOLUTE_PERCENTAGE_ERROR(y, y_hat)
ArgumentData TypeDescription
yNUMERICThe actual value. Must not be 0, because this value is the denominator in the percentage calculation.
y_hatNUMERICThe predicted value.
The return type is DOUBLE. The result is a percentage value where 0 indicates perfect predictions. Examples Calculate the MAPE for Matched Values This query returns the mean absolute percentage error when all predictions match the actual values.
SQL
SELECT MEAN_ABSOLUTE_PERCENTAGE_ERROR(c1, c1) FROM sys.dummy10;
Output: 0 Calculate the MAPE with Comparison Against Squared Values This query returns the mean absolute percentage error when comparing actual values against their squared values.
SQL
SELECT MEAN_ABSOLUTE_PERCENTAGE_ERROR(c1, c1 * c1)
    FROM sys.dummy10;
Output: 450.0

MIN

Returns the minimum value in the specified column. The return type matches the input. Syntax
SQL
MIN(col)
ArgumentData TypeDescription
colAny non-interval typeThe column for returning the minimum value.
Example
SQL
SELECT
    MIN(c1) AS min_val
FROM sys.dummy5;
Output: 1

PRECISION_SCORE

Returns the precision score for a specified positive class. This function calculates the ratio of true positives to the total number of predicted positives using the formula True Positives / (True Positives + False Positives). Of all the predictions for the specified class, this score measures how many are correct.
SQL
PRECISION_SCORE(y, y_hat, positive_class)
ArgumentData TypeDescription
yAll types supportedThe actual class label.
y_hatAll types supportedThe predicted class label.
positive_classAll types supportedThe class label to treat as the positive class. Must match the type of y and y_hat.
The return type is DOUBLE. The result ranges from 0 to 1, where 1 indicates that every prediction for the positive class is correct. Examples Calculate the Precision Score for Matched Values This query returns the precision score when all predictions match the actual values.
SQL
SELECT PRECISION_SCORE(
    CASE WHEN c1 > 5 THEN 1 ELSE 0 END,
    CASE WHEN c1 > 5 THEN 1 ELSE 0 END,
    1
) FROM sys.dummy10;
Output: 1 Calculate the Precision Score for Mismatched Classifiers This query returns the precision score for two mismatched binary classifiers, where none of the predicted positives are actual positives.
SQL
SELECT PRECISION_SCORE(
    CASE WHEN c1 > 5 THEN 1 ELSE 0 END,
    CASE WHEN c1 < 3 THEN 1 ELSE 0 END,
    1
) FROM sys.dummy10;
Output: 0.0

PRODUCT

Computes the product of all values in the column. Syntax
SQL
PRODUCT(col)
ArgumentData TypeDescription
colFLOAT, DOUBLE, DECIMAL, any integral type, or MATRIX (square)The column to multiply. Integral inputs return BIGINT.
Matrix arguments must have matching dimensions.
Example
SQL
SELECT
    PRODUCT(c1) AS factorial_5
FROM sys.dummy5;
Output: 120

RECALL_SCORE

Returns the recall score for a specified positive class. This function calculates the ratio of true positives to the total number of actual positives using the formula True Positives / (True Positives + False Negatives). Of all the actual instances of the specified class, this score measures how many the model correctly identifies.
SQL
RECALL_SCORE(y, y_hat, positive_class)
ArgumentData TypeDescription
yAll types supportedThe actual class label.
y_hatAll types supportedThe predicted class label.
positive_classAll types supportedThe class label to treat as the positive class. Must match the type of y and y_hat.
The return type is DOUBLE. The result ranges from 0 to 1, where 1 indicates that every actual instance of the positive class is correctly identified. Examples Calculate the Recall Score for Matched Values This query returns the recall score when all predictions match the actual values.
SQL
SELECT RECALL_SCORE(
    CASE WHEN c1 > 5 THEN 1 ELSE 0 END,
    CASE WHEN c1 > 5 THEN 1 ELSE 0 END,
    1
) FROM sys.dummy10;
Output: 1 Calculate the Recall Score for Mismatched Classifiers This query returns the recall score for two mismatched binary classifiers, where none of the actual positives are correctly predicted.
SQL
SELECT RECALL_SCORE(
    CASE WHEN c1 > 5 THEN 1 ELSE 0 END,
    CASE WHEN c1 < 3 THEN 1 ELSE 0 END,
    1
) FROM sys.dummy10;
Output: 0.0

ROC_AUC_SCORE

Returns the area under the receiver operating characteristic (ROC) curve (AUC). This function measures the ability of a binary classification model to distinguish between classes. A score of 1.0 represents a perfect classifier, while a score of 0.5 represents performance no better than random guessing.
SQL
ROC_AUC_SCORE(y_true, y_score, positive_class)
ArgumentData TypeDescription
y_trueBOOLEAN or INTEGERThe true binary class label, where the values represent two classes (for example, 0 or 1).
y_scoreNUMERICThe predicted probability or confidence score for the positive class. A score of 1.0 is a perfect classifier, while 0.5 is no better than random guessing.
positive_classThe class label to treat as the positive class. Must match the type of y_true.
The return type is DOUBLE. The result ranges from 0 to 1. Example This query returns the AUC score for a classifier in which higher probability scores correspond to actual positives, indicating perfect class separation.
SQL
SELECT ROC_AUC_SCORE(
    CASE WHEN c1 > 5 THEN 1 ELSE 0 END,
    DOUBLE(c1) / 10,
    1
) FROM sys.dummy10;
Output: 1.0

SKEW

Computes the sample skewness over the set of values. Skewness measures the asymmetry of a distribution. Returns DOUBLE, or DECIMAL if the input is DECIMAL. Syntax
SQL
SKEW(col)
ArgumentData TypeDescription
colAny numeric typeThe column for computing the sample skewness.
Example
SQL
SELECT
    SKEW(val) AS skew_val
FROM sample_data;
Output: 0.8804164883619399

SKEWP

Computes the population skewness over the set of values. Returns DOUBLE, or DECIMAL if the input is DECIMAL. Syntax
SQL
SKEWP(col)
ArgumentData TypeDescription
colAny numeric typeThe column for computing the population skewness.
Example
SQL
SELECT
    SKEWP(val) AS skewp_val
FROM sample_data;
Output: 0.7059036122393627

STDEV

Computes the sample standard deviation over the set of values. Returns DOUBLE, or DECIMAL if the input is DECIMAL. Alias for STDDEV and STDDEV_SAMP. Syntax
SQL
STDEV(col)
ArgumentData TypeDescription
colAny numeric typeThe column for computing the sample standard deviation.
Example
SQL
SELECT
    STDEV(val) AS stdev_val
FROM sample_data;
Output: 4.421942042651783

STDDEV

Alias for STDEV.

STDDEV_POP

Alias for STDEVP.

STDDEV_SAMP

Alias for STDEV.

STDEVP

Computes the population standard deviation over the set of values. Returns DOUBLE, or DECIMAL if the input is DECIMAL. Alias for STDDEV_POP. Syntax
SQL
STDEVP(col)
ArgumentData TypeDescription
colAny numeric typeThe column for computing the population standard deviation.
Example
SQL
SELECT
    STDEVP(val) AS stdevp_val
FROM sample_data;
Output: 4.136348026943574

SUM

Computes the sum of all values in the column. Syntax
SQL
SUM(col)
ArgumentData TypeDescription
colAny numeric type or MATRIXThe column to sum.

Integral inputs return BIGINT.
d
FLOAT inputs return FLOAT.

DOUBLE, DECIMAL, and MATRIX inputs preserve their type.
Example
SQL
SELECT
    SUM(c1) AS total
FROM sys.dummy5;
Output: 15

VAR_POP

Alias for VARIANCEP.

VAR_SAMP

Alias for VARIANCE.

VARIANCE

Computes the sample variance over the set of values. Returns DOUBLE, or DECIMAL if the input is DECIMAL. MATRIX inputs preserve their type and must be square. Alias for VAR_SAMP. Syntax
SQL
VARIANCE(col)
ArgumentData TypeDescription
colAny numeric type or MATRIX (square)The column for computing the sample variance.
Matrix arguments must have matching dimensions.
Example
SQL
SELECT
    VARIANCE(val) AS var_val
FROM sample_data;
Output: 19.553571428571427

VARIANCEP

Computes the population variance over the set of values. Returns DOUBLE, or DECIMAL if the input is DECIMAL. MATRIX inputs preserve their type and must be square. Alias for VAR_POP. Syntax
SQL
VARIANCEP(col)
ArgumentData TypeDescription
colAny numeric type or MATRIX (square)The column for computing population variance.
Example
SQL
SELECT
    VARIANCEP(val) AS varp_val
FROM sample_data;
Output: 17.109375

Sorted Aggregate Functions

The general syntax for sorted aggregate functions adds the DISTINCT and ORDER BY keywords in the function invocation.
SQL
AGGREGATE([DISTINCT] arg1, arg2, ... [ORDER BY ...])

ARRAY_AGG

Returns an array containing every row from the expression. Syntax
SQL
ARRAY_AGG(expr)
ArgumentData TypeDescription
exprAny numeric typeExpression for aggregation into an array.
Example Aggregate ten rows into an array in descending order.
SQL
SELECT
    ARRAY_AGG(c1 ORDER BY c1 DESC)
FROM sys.dummy10;
Output: [10,9,8,7,6,5,4,3,2,1]

ARRAY_CONCAT_AGG

Returns an array that concatenates arrays across rows. The input argument is a SQL expression. The element type must be consistent across rows. The function ignores NULL inputs and returns NULL only when all inputs are NULL. The DISTINCT keyword removes duplicate arrays. To remove duplicate elements from an array, use the ARRAY_DISTINCT function during load. The ORDER BY syntax controls the order of concatenation across rows. To sort elements in the array, use the ARRAY_SORT function during load. Syntax
SQL
ARRAY_CONCAT_AGG(expr)
ArgumentData TypeDescription
exprSQL expressionExpression for concatenation across rows.
Example Create the data_type_example table with a column that stores arrays of integers.
SQL
CREATE TABLE
    data_type_example (col_int_array INT[] NOT NULL DEFAULT 'INT[0,1,2,3]');
Insert one array with values 1, 2, and 3 into the table.
SQL
INSERT INTO data_type_example SELECT ARRAY[1,2,3];
Insert the second array with values 4, 5, and 6 into the table.
SQL
INSERT INTO data_type_example SELECT ARRAY[4,5,6];
Concatenate the two arrays into one array in ascending order.
SQL
SELECT ARRAY_CONCAT_AGG(col_int_array ORDER BY col_int_array ASC)
FROM data_type_example;
Output: [1,2,3,4,5,6]

STRING_AGG

Returns a string concatenated from every row of the expression. Syntax
SQL
ARRAY_CONCAT_AGG(expr, delimiter)
ArgumentData TypeDescription
exprSQL expressionExpression for concatenation across rows.
delimiterOptional.
Examples Concatenate Rows as a String Concatenate ten rows into a string in descending order. Cast integers into strings using the CHAR casting function.
SQL
SELECT
    STRING_AGG(char(c1) ORDER BY c1 DESC)
FROM sys.dummy10;
Output: "10987654321" Concatenate Rows as a String with a Delimiter Concatenate ten rows into a string in ascending order. Cast integers into strings using the CHAR casting function. Use the | delimiter.
SQL
SELECT
    STRING_AGG(char(c1), '|' ORDER BY c1 ASC)
FROM sys.dummy10;
Output: "1|2|3|4|5|6|7|8|9|10" Math Functions and Operators Query Ocient
Last modified on May 21, 2026