Aggregate Functions
the has aggregated and sorted aggregate functions aggregate and sorted aggregate functions compute a single result from a group of rows you can use the distinct keyword with any aggregation function, such as count(distinct col) whereas sorted aggregate functions use the standard order by syntax to dictate the ordering of the aggregation to sort the elements of an array, use the array sort docid\ jhsjeme4obz0xpegzai0h function during data load aggregate functions supported input types vary by function in general, numeric types include tinyint , smallint , int , bigint , float , double , and decimal various examples on this page use these tables create table sample data (val int, label varchar(10)); insert into sample data (val, label) values (2, 'alpha'), (4, 'beta'), (4, 'alpha'), (5, null), (7, 'gamma'), (8, 'alpha'), (12, 'beta'), (15, 'gamma'); create table predictions (actual int, predicted int); insert into predictions (actual, predicted) values (10, 12), (20, 18), (30, 33), (40, 37), (50, 52), (60, 58); for simple functions, examples use the sys dummy virtual table for details, see generate tables using sys dummy docid\ h27revcjzrpzmmvhosb14 accuracy score returns the fraction of predictions that match the actual class labels this function calculates the ratio of correctly predicted rows to the total number of rows accuracy score(y, y hat) argument data type description y all types supported the actual class label y hat all types supported the predicted class label the return type is double the result represents the proportion of correct predictions, ranging from 0 to 1 examples calculate the accuracy score for matched values this query returns the accuracy score when all predictions match the actual values select accuracy score(c1, c1) from sys dummy10; output 1 calculate the accuracy score with comparison against squared values this query returns the accuracy score when comparing actual values against their squared values select accuracy score(c1, c1 c1) from sys dummy10; output 0 1 any value returns an arbitrary non null value from the input column the function returns null only if all rows in the column are null the any value function does not support window aggregation syntax any value(col) argument data type description col any type the column from which to return an arbitrary non null value example select any value(label) as sample label from sample data; output alpha the returned value is non deterministic the database can return any non null value from the column approx count distinct returns an approximate count of distinct values in the column using the hyperloglog algorithm, with a 95% confidence interval that the result is within 4 5% of the exact count syntax approx count distinct(col) argument data type description col any non interval type the column for the approximation of the distinct count example select approx count distinct(val) as approx distinct from sample data; output 7 the sample data table has 8 rows, but there are only 7 distinct values in the val column (the value 4 appears twice) approx sum computes a sum using a faster, non deterministic ordering for floating point columns this summation can lead to minor differences in the result on the order of the machine epsilon for integral column types, the function uses the standard sum algorithm syntax approx sum(col) argument data type description col float , double , or any integral type the column to sum for integral inputs, the approx sum function uses the standard sum algorithm and returns the same result the performance benefit applies to large floating point data sets example select approx sum(c1) as approx total from sys dummy5; output 15 avg computes the arithmetic mean over the set of values syntax avg(col) argument data type description col any numeric type or matrix the column to average integral and float inputs return double double , decimal , and matrix inputs preserve their type example select avg(c1) as avg val from sys dummy5; output 3 0 coefficient of determination computes the coefficient of determination (r²) between actual and predicted values primarily used to evaluate the performance of machine learning regression models, r² measures the proportion of variance in the actual values that the predicted values explain a value of 1 0 indicates a perfect fit, while 0 0 indicates that the predictions explain none of the variance this function returns null when the total sum of squares is zero (i e , all actual values are identical) syntax coefficient of determination(actual, predicted) argument data type description actual any numeric type the column of observed values predicted any numeric type the column of predicted values example select coefficient of determination(actual, predicted) as r squared from predictions; output 0 980571428571429 confusion matrix returns a structured representation of the counts for every combination of actual and predicted class labels relative to a specified positive class this function aggregates predictions into a 2x2 matrix in the format \[\[tp, fp], \[fn, tn]] for binary classification analysis confusion matrix(y, y hat, positive class) argument data type description y all types supported the actual class label y hat all types supported the predicted class label positive class all types supported the class label to treat as the positive class must match the type of y and y hat the return type is tuple each entry in the result contains the count of rows for a specific combination of actual and predicted class values, organized as true positives, false positives, false negatives, and true negatives examples calculate the confusion matrix with matched values this query returns the confusion matrix when all predictions match the actual values select confusion matrix( case when c1 > 5 then 1 else 0 end, case when c1 > 5 then 1 else 0 end, 1 ) from sys dummy10; output \[\[5, 0], \[0, 5]] calculate the confusion matrix with mismatched classifiers this query returns the confusion matrix for two mismatched binary classifiers, where actual positives are rows with c1 > 5 and predicted positives are rows with c1 < 3 select confusion matrix( case when c1 > 5 then 1 else 0 end, case when c1 < 3 then 1 else 0 end, 1 ) from sys dummy10; output \[\[0, 2], \[5, 3]] corr alias for correlationp docid\ zrac69qenzyasqgj5woq8 correlation alias for correlationp docid\ zrac69qenzyasqgj5woq8 correlationp computes the population pearson correlation coefficient between two columns returns double , or decimal if both inputs are decimal compatible alias for correlation and corr syntax correlationp(col1, col2) argument data type description col1 any numeric type the first column col2 any numeric type the second column example select correlationp(actual, predicted) as corr val from predictions; output 0 9904654955172436 count returns the number of rows in the set where values in col are not null when you use the count( ) sql statement, the function returns the total number of rows the return type is bigint syntax count(col) argument data type description col any non interval type the column for counting non null values use to count all rows instead examples count all rows select count( ) as total rows from sample data; output 8 count non null values the label column has one null row select count(label) as rows with label from sample data; output 7 covar pop alias for covariancep docid\ zrac69qenzyasqgj5woq8 covar samp alias for covariance covariance computes the sample covariance between two columns returns double by default, or decimal if both inputs are decimal compatible matrix inputs preserve their type alias for covar samp syntax covariance(col1, col2) argument data type description col1 any numeric type or matrix (square) the first column col2 any numeric type or matrix (square) the second column matrix arguments must have matching dimensions example select covariance(actual, predicted) as cov samp from predictions; output 336 0 covariancep computes the population covariance between two columns returns double by default, or decimal if both inputs are compatible with decimal matrix inputs preserve their type alias for covar pop syntax covariancep(col1, col2) argument data type description col1 any numeric type or matrix (square) the first column col2 any numeric type or matrix (square) the second column matrix arguments must have matching dimensions example select covariancep(actual, predicted) as cov pop from predictions; output 280 0 f1 score returns the harmonic mean of precision and recall for a specified positive class this function calculates 2 (precision recall) / (precision + recall) and provides a single score that balances both precision and recall the function returns null when both precision and recall are 0 f1 score(y, y hat, positive class) argument data type description y all types supported the actual class label y hat all types supported the predicted class label positive class all types supported the class label to treat as the positive class must match the type of y and y hat the return type is double the result ranges from 0 to 1, where 1 represents perfect precision and recall examples calculate the f1 score with matching values this query returns the f1 score when all predictions match the actual values select f1 score( case when c1 > 5 then 1 else 0 end, case when c1 > 5 then 1 else 0 end, 1 ) from sys dummy10; output 1 calculate the f1 score for mismatched classifiers this query returns null because both precision and recall are 0 for the mismatched classifiers select f1 score( case when c1 > 5 then 1 else 0 end, case when c1 < 3 then 1 else 0 end, 1 ) from sys dummy10; output null kurtosis computes the sample excess kurtosis over the set of values kurtosis measures the difference between the tails of a distribution and the tails of a normal distribution returns double , or decimal if the input is decimal syntax kurtosis(col) argument data type description col any numeric type the column for computing the sample kurtosis example select kurtosis(val) as kurt val from sample data; output 4 763725276787391 kurtosisp computes the population kurtosis over the set of values returns double , or decimal if the input is decimal syntax kurtosisp(col) argument data type description col any numeric type the column for computing population kurtosis example select kurtosisp(val) as kurtp val from sample data; output 2 268440607993995 max returns the maximum value in the specified column the return type matches the input syntax max(col) argument data type description col any non interval type the column for returning the maximum value example select max(c1) as max val from sys dummy5; output 5 mean absolute error returns the mean absolute error (mae) between actual and predicted values this function calculates the average of the absolute differences between each pair of actual and predicted values, treating all errors equally regardless of direction mean absolute error(y, y hat) argument data type description y numeric the actual value y hat numeric the predicted value the return type is double the result is always zero or positive a value of 0 indicates perfect predictions examples calculate the mae for matched values this query returns the mean absolute error when all predictions match the actual values select mean absolute error(c1, c1) from sys dummy10; output 0 calculate the mae with comparison against squared values this query returns the mean absolute error when comparing actual values against their squared values select mean absolute error(c1, c1 c1) from sys dummy10; output 33 0 mean absolute percentage error returns the mean absolute percentage error (mape) between actual and predicted values this function calculates the average of the absolute percentage differences using the formula average( absolutevalue( (y y hat) / y ) ) 100 , which expresses the error relative to the actual values mean absolute percentage error(y, y hat) argument data type description y numeric the actual value must not be 0, because this value is the denominator in the percentage calculation y hat numeric the predicted value the return type is double the result is a percentage value where 0 indicates perfect predictions examples calculate the mape for matched values this query returns the mean absolute percentage error when all predictions match the actual values select mean absolute percentage error(c1, c1) from sys dummy10; output 0 calculate the mape with comparison against squared values this query returns the mean absolute percentage error when comparing actual values against their squared values select mean absolute percentage error(c1, c1 c1) from sys dummy10; output 450 0 min returns the minimum value in the specified column the return type matches the input syntax min(col) argument data type description col any non interval type the column for returning the minimum value example select min(c1) as min val from sys dummy5; output 1 precision score returns the precision score for a specified positive class this function calculates the ratio of true positives to the total number of predicted positives using the formula true positives / (true positives + false positives) of all the predictions for the specified class, this score measures how many are correct precision score(y, y hat, positive class) argument data type description y all types supported the actual class label y hat all types supported the predicted class label positive class all types supported the class label to treat as the positive class must match the type of y and y hat the return type is double the result ranges from 0 to 1, where 1 indicates that every prediction for the positive class is correct examples calculate the precision score for matched values this query returns the precision score when all predictions match the actual values select precision score( case when c1 > 5 then 1 else 0 end, case when c1 > 5 then 1 else 0 end, 1 ) from sys dummy10; output 1 calculate the precision score for mismatched classifiers this query returns the precision score for two mismatched binary classifiers, where none of the predicted positives are actual positives select precision score( case when c1 > 5 then 1 else 0 end, case when c1 < 3 then 1 else 0 end, 1 ) from sys dummy10; output 0 0 product computes the product of all values in the column syntax product(col) argument data type description col float , double , decimal , any integral type, or matrix (square) the column to multiply integral inputs return bigint matrix arguments must have matching dimensions example select product(c1) as factorial 5 from sys dummy5; output 120 recall score returns the recall score for a specified positive class this function calculates the ratio of true positives to the total number of actual positives using the formula true positives / (true positives + false negatives) of all the actual instances of the specified class, this score measures how many the model correctly identifies recall score(y, y hat, positive class) argument data type description y all types supported the actual class label y hat all types supported the predicted class label positive class all types supported the class label to treat as the positive class must match the type of y and y hat the return type is double the result ranges from 0 to 1, where 1 indicates that every actual instance of the positive class is correctly identified examples calculate the recall score for matched values this query returns the recall score when all predictions match the actual values select recall score( case when c1 > 5 then 1 else 0 end, case when c1 > 5 then 1 else 0 end, 1 ) from sys dummy10; output 1 calculate the recall score for mismatched classifiers this query returns the recall score for two mismatched binary classifiers, where none of the actual positives are correctly predicted select recall score( case when c1 > 5 then 1 else 0 end, case when c1 < 3 then 1 else 0 end, 1 ) from sys dummy10; output 0 0 roc auc score returns the area under the receiver operating characteristic (roc) curve (auc) this function measures the ability of a binary classification model to distinguish between classes a score of 1 0 represents a perfect classifier, while a score of 0 5 represents performance no better than random guessing roc auc score(y true, y score, positive class) argument data type description y true boolean or integer the true binary class label, where the values represent two classes (for example, 0 or 1) y score numeric the predicted probability or confidence score for the positive class a score of 1 0 is a perfect classifier, while 0 5 is no better than random guessing positive class the class label to treat as the positive class must match the type of y true the return type is double the result ranges from 0 to 1 example this query returns the auc score for a classifier in which higher probability scores correspond to actual positives, indicating perfect class separation select roc auc score( case when c1 > 5 then 1 else 0 end, double(c1) / 10, 1 ) from sys dummy10; output 1 0 skew computes the sample skewness over the set of values skewness measures the asymmetry of a distribution returns double , or decimal if the input is decimal syntax skew(col) argument data type description col any numeric type the column for computing the sample skewness example select skew(val) as skew val from sample data; output 0 8804164883619399 skewp computes the population skewness over the set of values returns double , or decimal if the input is decimal syntax skewp(col) argument data type description col any numeric type the column for computing the population skewness example select skewp(val) as skewp val from sample data; output 0 7059036122393627 stdev computes the sample standard deviation over the set of values returns double , or decimal if the input is decimal alias for stddev and stddev samp syntax stdev(col) argument data type description col any numeric type the column for computing the sample standard deviation example select stdev(val) as stdev val from sample data; output 4 421942042651783 stddev alias for stdev docid\ zrac69qenzyasqgj5woq8 stddev pop alias for stdevp docid\ zrac69qenzyasqgj5woq8 stddev samp alias for stdev docid\ zrac69qenzyasqgj5woq8 stdevp computes the population standard deviation over the set of values returns double , or decimal if the input is decimal alias for stddev pop syntax stdevp(col) argument data type description col any numeric type the column for computing the population standard deviation example select stdevp(val) as stdevp val from sample data; output 4 136348026943574 sum computes the sum of all values in the column syntax sum(col) argument data type description col any numeric type or matrix the column to sum integral inputs return bigint d float inputs return float double , decimal , and matrix inputs preserve their type example select sum(c1) as total from sys dummy5; output 15 var pop alias for variancep docid\ zrac69qenzyasqgj5woq8 var samp alias for variance docid\ zrac69qenzyasqgj5woq8 variance computes the sample variance over the set of values returns double , or decimal if the input is decimal matrix inputs preserve their type and must be square alias for var samp syntax variance(col) argument data type description col any numeric type or matrix (square) the column for computing the sample variance matrix arguments must have matching dimensions example select variance(val) as var val from sample data; output 19 553571428571427 variancep computes the population variance over the set of values returns double , or decimal if the input is decimal matrix inputs preserve their type and must be square alias for var pop syntax variancep(col) argument data type description col any numeric type or matrix (square) the column for computing population variance example select variancep(val) as varp val from sample data; output 17 109375 sorted aggregate functions the general syntax for sorted aggregate functions adds the distinct and order by keywords in the function invocation aggregate(\[distinct] arg1, arg2, \[order by ]) array agg returns an array containing every row from the expression syntax array agg(expr) argument data type description expr any numeric type expression for aggregation into an array example aggregate ten rows into an array in descending order select array agg(c1 order by c1 desc) from sys dummy10; output \[10,9,8,7,6,5,4,3,2,1] array concat agg returns an array that concatenates arrays across rows the input argument is a sql expression the element type must be consistent across rows the function ignores null inputs and returns null only when all inputs are null the distinct keyword removes duplicate arrays to remove duplicate elements from an array, use the array distinct function during load the order by syntax controls the order of concatenation across rows to sort elements in the array, use the array sort function during load syntax array concat agg(expr) argument data type description expr sql expression expression for concatenation across rows example create the data type example table with a column that stores arrays of integers create table data type example (col int array int\[] not null default 'int\[0,1,2,3]'); insert one array with values 1 , 2 , and 3 into the table insert into data type example select array\[1,2,3]; insert the second array with values 4 , 5 , and 6 into the table insert into data type example select array\[4,5,6]; concatenate the two arrays into one array in ascending order select array concat agg(col int array order by col int array asc) from data type example; output \[1,2,3,4,5,6] string agg returns a string concatenated from every row of the expression syntax array concat agg(expr, delimiter) argument data type description expr sql expression expression for concatenation across rows delimiter optional examples concatenate rows as a string concatenate ten rows into a string in descending order cast integers into strings using the char casting function select string agg(char(c1) order by c1 desc) from sys dummy10; output "10987654321" concatenate rows as a string with a delimiter concatenate ten rows into a string in ascending order cast integers into strings using the char casting function use the | delimiter select string agg(char(c1), '|' order by c1 asc) from sys dummy10; output "1|2|3|4|5|6|7|8|9|10" related links math functions and operators docid\ fpjn4e63 o7tdrno0sg v query ocient docid\ rlskzzscthcwe12dnsehy