The has aggregated and sorted aggregate functions. Aggregate and sorted aggregate functions compute a single result from a group of rows. You can use theDocumentation Index
Fetch the complete documentation index at: https://docs.ocient.com/llms.txt
Use this file to discover all available pages before exploring further.
DISTINCT keyword with any aggregation function, such as COUNT(DISTINCT col). Whereas sorted aggregate functions use the standard ORDER BY syntax to dictate the ordering of the aggregation. To sort the elements of an array, use the ARRAY_SORT function during data load.
Aggregate Functions
Supported input types vary by function. In general, numeric types includeTINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, and DECIMAL.
Various examples on this page use these tables.
SQL
SQL
sys.dummy virtual table. For details, see Generate Tables Using sys.dummy.
ACCURACY_SCORE
Returns the fraction of predictions that match the actual class labels. This function calculates the ratio of correctly predicted rows to the total number of rows.SQL
| Argument | Data Type | Description |
|---|---|---|
y | All types supported | The actual class label. |
y_hat | All types supported | The predicted class label. |
DOUBLE. The result represents the proportion of correct predictions, ranging from 0 to 1.
Examples
Calculate the Accuracy Score for Matched Values
This query returns the accuracy score when all predictions match the actual values.
SQL
1
Calculate the Accuracy Score with Comparison Against Squared Values
This query returns the accuracy score when comparing actual values against their squared values.
SQL
0.1
ANY_VALUE
Returns an arbitrary non-NULL value from the input column. The function returns NULL only if all rows in the column are NULL.
The ANY_VALUE function does not support window aggregation.
Syntax
SQL
| Argument | Data Type | Description |
|---|---|---|
col | Any type | The column from which to return an arbitrary non-NULL value. |
SQL
alpha
The returned value is non-deterministic. The database can return any non-
NULL value from the column.APPROX_COUNT_DISTINCT
Returns an approximate count of distinct values in the column using the HyperLogLog algorithm, with a 95% confidence interval that the result is within 4.5% of the exact count. SyntaxSQL
| Argument | Data Type | Description |
|---|---|---|
col | Any non-interval type | The column for the approximation of the distinct count. |
SQL
7
The sample_data table has 8 rows, but there are only 7 distinct values in the val column (the value 4 appears twice).
APPROX_SUM
Computes a sum using a faster, non-deterministic ordering for floating-point columns. This summation can lead to minor differences in the result on the order of the machine epsilon. For integral column types, the function uses the standardSUM algorithm.
Syntax
SQL
| Argument | Data Type | Description |
|---|---|---|
col | FLOAT, DOUBLE, or any integral type | The column to sum. |
For integral inputs, the
APPROX_SUM function uses the standard SUM algorithm and returns the same result. The performance benefit applies to large floating-point data sets.SQL
15
AVG
Computes the arithmetic mean over the set of values. SyntaxSQL
| Argument | Data Type | Description |
|---|---|---|
col | Any numeric type or MATRIX | The column to average. Integral and FLOAT inputs return DOUBLE. DOUBLE, DECIMAL, and MATRIX inputs preserve their type. |
SQL
3.0
COEFFICIENT_OF_DETERMINATION
Computes the coefficient of determination (R²) between actual and predicted values. Primarily used to evaluate the performance of machine learning regression models, R² measures the proportion of variance in the actual values that the predicted values explain. A value of1.0 indicates a perfect fit, while 0.0 indicates that the predictions explain none of the variance.
This function returns NULL when the total sum of squares is zero (i.e., all actual values are identical).
Syntax
SQL
| Argument | Data Type | Description |
|---|---|---|
actual | Any numeric type | The column of observed values. |
predicted | Any numeric type | The column of predicted values. |
SQL
0.980571428571429
CONFUSION_MATRIX
Returns a structured representation of the counts for every combination of actual and predicted class labels relative to a specified positive class. This function aggregates predictions into a 2x2 matrix in the format[[TP, FP], [FN, TN]] for binary classification analysis.
SQL
| Argument | Data Type | Description |
|---|---|---|
y | All types supported | The actual class label. |
y_hat | All types supported | The predicted class label. |
positive_class | All types supported | The class label to treat as the positive class. Must match the type of y and y_hat. |
TUPLE. Each entry in the result contains the count of rows for a specific combination of actual and predicted class values, organized as true positives, false positives, false negatives, and true negatives.
Examples
Calculate the Confusion Matrix with Matched Values
This query returns the confusion matrix when all predictions match the actual values.
SQL
[[5, 0], [0, 5]]
Calculate the Confusion Matrix with Mismatched Classifiers
This query returns the confusion matrix for two mismatched binary classifiers, where actual positives are rows with c1 > 5 and predicted positives are rows with c1 < 3.
SQL
[[0, 2], [5, 3]]
CORR
Alias for CORRELATIONP.CORRELATION
Alias for CORRELATIONP.CORRELATIONP
Computes the population Pearson correlation coefficient between two columns. ReturnsDOUBLE, or DECIMAL if both inputs are DECIMAL-compatible.
Alias for CORRELATION and CORR.
Syntax
SQL
| Argument | Data Type | Description |
|---|---|---|
col1 | Any numeric type | The first column. |
col2 | Any numeric type | The second column. |
SQL
0.9904654955172436
COUNT
Returns the number of rows in the set where values incol are not NULL. When you use the COUNT(*) SQL statement, the function returns the total number of rows. The return type is BIGINT.
Syntax
SQL
| Argument | Data Type | Description |
|---|---|---|
col | Any non-interval type | The column for counting non-NULL values. Use * to count all rows instead. |
SQL
8
Count Non-NULL Values
The label column has one NULL row.
SQL
7
COVAR_POP
Alias for COVARIANCEP.COVAR_SAMP
Alias for COVARIANCE.COVARIANCE
Computes the sample covariance between two columns. ReturnsDOUBLE by default, or DECIMAL if both inputs are DECIMAL-compatible. MATRIX inputs preserve their type.
Alias for COVAR_SAMP.
Syntax
SQL
| Argument | Data Type | Description |
|---|---|---|
col1 | Any numeric type or MATRIX (square) | The first column. |
col2 | Any numeric type or MATRIX (square) | The second column. |
Matrix arguments must have matching dimensions.
SQL
336.0
COVARIANCEP
Computes the population covariance between two columns. ReturnsDOUBLE by default, or DECIMAL if both inputs are compatible with DECIMAL.
MATRIX inputs preserve their type.
Alias for COVAR_POP.
Syntax
SQL
| Argument | Data Type | Description |
|---|---|---|
col1 | Any numeric type or MATRIX (square) | The first column. |
col2 | Any numeric type or MATRIX (square) | The second column. |
Matrix arguments must have matching dimensions.
SQL
280.0
F1_SCORE
Returns the harmonic mean of precision and recall for a specified positive class. This function calculates2 * (Precision * Recall) / (Precision + Recall) and provides a single score that balances both precision and recall. The function returns NULL when both precision and recall are 0.
SQL
| Argument | Data Type | Description |
|---|---|---|
y | All types supported | The actual class label. |
y_hat | All types supported | The predicted class label. |
positive_class | All types supported | The class label to treat as the positive class. Must match the type of y and y_hat. |
DOUBLE. The result ranges from 0 to 1, where 1 represents perfect precision and recall.
Examples
Calculate the F1 Score with Matching Values
This query returns the F1 score when all predictions match the actual values.
SQL
1
Calculate the F1 Score for Mismatched Classifiers
This query returns NULL because both precision and recall are 0 for the mismatched classifiers.
SQL
NULL
KURTOSIS
Computes the sample excess kurtosis over the set of values. Kurtosis measures the difference between the tails of a distribution and the tails of a normal distribution. ReturnsDOUBLE, or DECIMAL if the input is DECIMAL.
Syntax
SQL
| Argument | Data Type | Description |
|---|---|---|
col | Any numeric type | The column for computing the sample kurtosis. |
SQL
4.763725276787391
KURTOSISP
Computes the population kurtosis over the set of values. ReturnsDOUBLE, or DECIMAL if the input is DECIMAL.
Syntax
SQL
| Argument | Data Type | Description |
|---|---|---|
col | Any numeric type | The column for computing population kurtosis. |
SQL
2.268440607993995
MAX
Returns the maximum value in the specified column. The return type matches the input. SyntaxSQL
| Argument | Data Type | Description |
|---|---|---|
col | Any non-interval type | The column for returning the maximum value. |
SQL
5
MEAN_ABSOLUTE_ERROR
Returns the mean absolute error (MAE) between actual and predicted values. This function calculates the average of the absolute differences between each pair of actual and predicted values, treating all errors equally regardless of direction.SQL
| Argument | Data Type | Description |
|---|---|---|
y | NUMERIC | The actual value. |
y_hat | NUMERIC | The predicted value. |
DOUBLE. The result is always zero or positive. A value of 0 indicates perfect predictions.
Examples
Calculate the MAE for Matched Values
This query returns the mean absolute error when all predictions match the actual values.
SQL
0
Calculate the MAE with Comparison Against Squared Values
This query returns the mean absolute error when comparing actual values against their squared values.
SQL
33.0
MEAN_ABSOLUTE_PERCENTAGE_ERROR
Returns the mean absolute percentage error (MAPE) between actual and predicted values. This function calculates the average of the absolute percentage differences using the formulaAverage( AbsoluteValue( (y - y_hat) / y ) ) * 100, which expresses the error relative to the actual values.
SQL
| Argument | Data Type | Description |
|---|---|---|
y | NUMERIC | The actual value. Must not be 0, because this value is the denominator in the percentage calculation. |
y_hat | NUMERIC | The predicted value. |
DOUBLE. The result is a percentage value where 0 indicates perfect predictions.
Examples
Calculate the MAPE for Matched Values
This query returns the mean absolute percentage error when all predictions match the actual values.
SQL
0
Calculate the MAPE with Comparison Against Squared Values
This query returns the mean absolute percentage error when comparing actual values against their squared values.
SQL
450.0
MIN
Returns the minimum value in the specified column. The return type matches the input. SyntaxSQL
| Argument | Data Type | Description |
|---|---|---|
col | Any non-interval type | The column for returning the minimum value. |
SQL
1
PRECISION_SCORE
Returns the precision score for a specified positive class. This function calculates the ratio of true positives to the total number of predicted positives using the formulaTrue Positives / (True Positives + False Positives). Of all the predictions for the specified class, this score measures how many are correct.
SQL
| Argument | Data Type | Description |
|---|---|---|
y | All types supported | The actual class label. |
y_hat | All types supported | The predicted class label. |
positive_class | All types supported | The class label to treat as the positive class. Must match the type of y and y_hat. |
DOUBLE. The result ranges from 0 to 1, where 1 indicates that every prediction for the positive class is correct.
Examples
Calculate the Precision Score for Matched Values
This query returns the precision score when all predictions match the actual values.
SQL
1
Calculate the Precision Score for Mismatched Classifiers
This query returns the precision score for two mismatched binary classifiers, where none of the predicted positives are actual positives.
SQL
0.0
PRODUCT
Computes the product of all values in the column. SyntaxSQL
| Argument | Data Type | Description |
|---|---|---|
col | FLOAT, DOUBLE, DECIMAL, any integral type, or MATRIX (square) | The column to multiply. Integral inputs return BIGINT. |
Matrix arguments must have matching dimensions.
SQL
120
RECALL_SCORE
Returns the recall score for a specified positive class. This function calculates the ratio of true positives to the total number of actual positives using the formulaTrue Positives / (True Positives + False Negatives). Of all the actual instances of the specified class, this score measures how many the model correctly identifies.
SQL
| Argument | Data Type | Description |
|---|---|---|
y | All types supported | The actual class label. |
y_hat | All types supported | The predicted class label. |
positive_class | All types supported | The class label to treat as the positive class. Must match the type of y and y_hat. |
DOUBLE. The result ranges from 0 to 1, where 1 indicates that every actual instance of the positive class is correctly identified.
Examples
Calculate the Recall Score for Matched Values
This query returns the recall score when all predictions match the actual values.
SQL
1
Calculate the Recall Score for Mismatched Classifiers
This query returns the recall score for two mismatched binary classifiers, where none of the actual positives are correctly predicted.
SQL
0.0
ROC_AUC_SCORE
Returns the area under the receiver operating characteristic (ROC) curve (AUC). This function measures the ability of a binary classification model to distinguish between classes. A score of 1.0 represents a perfect classifier, while a score of 0.5 represents performance no better than random guessing.SQL
| Argument | Data Type | Description |
|---|---|---|
y_true | BOOLEAN or INTEGER | The true binary class label, where the values represent two classes (for example, 0 or 1). |
y_score | NUMERIC | The predicted probability or confidence score for the positive class. A score of 1.0 is a perfect classifier, while 0.5 is no better than random guessing. |
positive_class | The class label to treat as the positive class. Must match the type of y_true. |
DOUBLE. The result ranges from 0 to 1.
Example
This query returns the AUC score for a classifier in which higher probability scores correspond to actual positives, indicating perfect class separation.
SQL
1.0
SKEW
Computes the sample skewness over the set of values. Skewness measures the asymmetry of a distribution. ReturnsDOUBLE, or DECIMAL if the input is DECIMAL.
Syntax
SQL
| Argument | Data Type | Description |
|---|---|---|
col | Any numeric type | The column for computing the sample skewness. |
SQL
0.8804164883619399
SKEWP
Computes the population skewness over the set of values. ReturnsDOUBLE, or DECIMAL if the input is DECIMAL.
Syntax
SQL
| Argument | Data Type | Description |
|---|---|---|
col | Any numeric type | The column for computing the population skewness. |
SQL
0.7059036122393627
STDEV
Computes the sample standard deviation over the set of values. ReturnsDOUBLE, or DECIMAL if the input is DECIMAL.
Alias for STDDEV and STDDEV_SAMP.
Syntax
SQL
| Argument | Data Type | Description |
|---|---|---|
col | Any numeric type | The column for computing the sample standard deviation. |
SQL
4.421942042651783
STDDEV
Alias for STDEV.STDDEV_POP
Alias for STDEVP.STDDEV_SAMP
Alias for STDEV.STDEVP
Computes the population standard deviation over the set of values. ReturnsDOUBLE, or DECIMAL if the input is DECIMAL.
Alias for STDDEV_POP.
Syntax
SQL
| Argument | Data Type | Description |
|---|---|---|
col | Any numeric type | The column for computing the population standard deviation. |
SQL
4.136348026943574
SUM
Computes the sum of all values in the column. SyntaxSQL
| Argument | Data Type | Description |
|---|---|---|
col | Any numeric type or MATRIX | The column to sum. Integral inputs return BIGINT. d FLOAT inputs return FLOAT. DOUBLE, DECIMAL, and MATRIX inputs preserve their type. |
SQL
15
VAR_POP
Alias for VARIANCEP.VAR_SAMP
Alias for VARIANCE.VARIANCE
Computes the sample variance over the set of values. ReturnsDOUBLE, or DECIMAL if the input is DECIMAL.
MATRIX inputs preserve their type and must be square.
Alias for VAR_SAMP.
Syntax
SQL
| Argument | Data Type | Description |
|---|---|---|
col | Any numeric type or MATRIX (square) | The column for computing the sample variance. |
Matrix arguments must have matching dimensions.
SQL
19.553571428571427
VARIANCEP
Computes the population variance over the set of values. ReturnsDOUBLE, or DECIMAL if the input is DECIMAL.
MATRIX inputs preserve their type and must be square.
Alias for VAR_POP.
Syntax
SQL
| Argument | Data Type | Description |
|---|---|---|
col | Any numeric type or MATRIX (square) | The column for computing population variance. |
SQL
17.109375
Sorted Aggregate Functions
The general syntax for sorted aggregate functions adds the DISTINCT and ORDER BY keywords in the function invocation.SQL
ARRAY_AGG
Returns an array containing every row from the expression. SyntaxSQL
| Argument | Data Type | Description |
|---|---|---|
expr | Any numeric type | Expression for aggregation into an array. |
SQL
[10,9,8,7,6,5,4,3,2,1]
ARRAY_CONCAT_AGG
Returns an array that concatenates arrays across rows. The input argument is a SQL expression. The element type must be consistent across rows. The function ignores NULL inputs and returns NULL only when all inputs are NULL. TheDISTINCT keyword removes duplicate arrays. To remove duplicate elements from an array, use the ARRAY_DISTINCT function during load.
The ORDER BY syntax controls the order of concatenation across rows. To sort elements in the array, use the ARRAY_SORT function during load.
Syntax
SQL
| Argument | Data Type | Description |
|---|---|---|
expr | SQL expression | Expression for concatenation across rows. |
data_type_example table with a column that stores arrays of integers.
SQL
1, 2, and 3 into the table.
SQL
4, 5, and 6 into the table.
SQL
SQL
[1,2,3,4,5,6]
STRING_AGG
Returns a string concatenated from every row of the expression. SyntaxSQL
| Argument | Data Type | Description |
|---|---|---|
expr | SQL expression | Expression for concatenation across rows. |
delimiter | Optional. |
CHAR casting function.
SQL
"10987654321"
Concatenate Rows as a String with a Delimiter
Concatenate ten rows into a string in ascending order. Cast integers into strings using the CHAR casting function. Use the | delimiter.
SQL
"1|2|3|4|5|6|7|8|9|10"

