Aggregate functions v16
Aggregate functions compute a single result value from a set of input values.
Built-in aggregate functions
The built-in aggregate functions are listed in the following tables.
Function | Argument type | Return type | Description |
---|---|---|---|
AVG (expression) | INTEGER, REAL, DOUBLE PRECISION, NUMBER | NUMBER for any integer type, DOUBLE PRECISION for a floating-point argument, otherwise the same as the argument data type | The average (arithmetic mean) of all input values |
COUNT(*) | BIGINT | Number of input rows | |
COUNT (expression) | Any | BIGINT | Number of input rows for which the value of expression is not null |
MAX (expression) | Any numeric, string, date/time, or bytea type | Same as argument type | Maximum value of expression across all input values |
MIN (expression) | Any numeric, string, date/time, or bytea type | Same as argument type | Minimum value of expression across all input values |
SUM (expression) | INTEGER, REAL, DOUBLE PRECISION, NUMBER | BIGINT for SMALLINT or INTEGER arguments, NUMBER for BIGINT arguments, DOUBLE PRECISION for floating-point arguments, otherwise the same as the argument data type | Sum of expression across all input values |
Except for COUNT
, these functions return a null value when no rows are selected. In particular, SUM
of no rows returns null, not zero. You can use the COALESCE
function to substitute zero for null when necessary.
Aggregate functions for statistical analysis
The following table shows the aggregate functions typically used in statistical analysis. Where the description mentions N
, it means the number of input rows for which all the input expressions are non-null. In all cases, null is returned if the computation is meaningless, for example when N
is zero.
Function | Argument type | Return type | Description |
---|---|---|---|
CORR(Y, X) | DOUBLE PRECISION | DOUBLE PRECISION | Correlation coefficient |
COVAR_POP(Y, X) | DOUBLE PRECISION | DOUBLE PRECISION | Population covariance |
COVAR_SAMP(Y, X) | DOUBLE PRECISION | DOUBLE PRECISION | Sample covariance |
REGR_AVGX(Y, X) | DOUBLE PRECISION | DOUBLE PRECISION | Average of the independent variable (sum(X) / N ) |
REGR_AVGY(Y, X) | DOUBLE PRECISION | DOUBLE PRECISION | Average of the dependent variable (sum(Y) / N ) |
REGR_COUNT(Y, X) | DOUBLE PRECISION | DOUBLE PRECISION | Number of input rows in which both expressions are nonnull |
REGR_INTERCEPT(Y, X) | DOUBLE PRECISION | DOUBLE PRECISION | y-intercept of the least-squares-fit linear equation determined by the (X, Y) pairs |
REGR_R2(Y, X) | DOUBLE PRECISION | DOUBLE PRECISION | Square of the correlation coefficient |
REGR_SLOPE(Y, X) | DOUBLE PRECISION | DOUBLE PRECISION | Slope of the least-squares-fit linear equation determined by the (X, Y) pairs |
REGR_SXX(Y, X) | DOUBLE PRECISION | DOUBLE PRECISION | Sum (X2) – sum (X) 2 / N (“sum of squares” of the independent variable) |
REGR_SXY(Y, X) | DOUBLE PRECISION | DOUBLE PRECISION | Sum (X*Y) – sum (X) * sum (Y) / N (“sum of products” of independent times dependent variable) |
REGR_SYY(Y, X) | DOUBLE PRECISION | DOUBLE PRECISION | Sum (Y2) – sum (Y) 2 / N (“sum of squares” of the dependent variable) |
STDDEV(expression) | INTEGER, REAL, DOUBLE PRECISION, NUMBER | DOUBLE PRECISION for floating-point arguments, otherwise NUMBER | Historic alias for STDDEV_SAMP |
STDDEV_POP(expression) | INTEGER, REAL, DOUBLE PRECISION, NUMBER | DOUBLE PRECISION for floating-point arguments, otherwise NUMBER | Population standard deviation of the input values |
STDDEV_SAMP(expression) | INTEGER, REAL, DOUBLE PRECISION, NUMBER | DOUBLE PRECISION for floating-point arguments, otherwise NUMBER | Sample standard deviation of the input values |
VARIANCE(expression) | INTEGER, REAL, DOUBLE PRECISION, NUMBER | DOUBLE PRECISION for floating-point arguments, otherwise NUMBER | Historical alias for VAR_SAMP |
VAR_POP(expression) | INTEGER, REAL, DOUBLE PRECISION, NUMBER | DOUBLE PRECISION for floating-point arguments, otherwise NUMBER | Population variance of the input values (square of the population standard deviation) |
VAR_SAMP(expression) | INTEGER, REAL, DOUBLE PRECISION, NUMBER | DOUBLE PRECISION for floating-point arguments, otherwise NUMBER | Sample variance of the input values (square of the sample standard deviation) |
LISTAGG
LISTAGG
is an aggregate function that concatenates data from multiple rows into a single row in an ordered manner. You can optionally include a custom delimiter for your data.
The LISTAGG
function mandates the use of an ORDER BY
clause under a WITHIN GROUP
clause to concatenate values of the measure column and then generate the ordered aggregated data.
Objective
- You can use
LISTAGG
without any grouping. In this case, theLISTAGG
function operates on all rows in a table and returns a single row. - You can use
LISTAGG
with theGROUP BY
clause. In this case, theLISTAGG
function operates on each group and returns an aggregated output for each group. - You can use
LISTAGG
with theOVER
clause. In this case, theLISTAGG
function partitions a query result set into groups based on the expression in thequery_partition_by_clause
and then aggregates data in each group.
Synopsis
Parameters
measure_expr
measure_expr
(mandatory) specifies the column or expression that assigns a value to aggregate. NULL
values are ignored.
delimiter
delimiter
(optional) specifies a string that separates the concatenated values in the result row. The delimiter
can be a NULL
value, string, character literal, column name, or constant expression. If ignored, the LISTAGG
function uses a NULL
value by default.
order_by_clause
order_by_clause
(mandatory) determines the sort order in which the concatenated values are returned.
query_partition_by_clause
query_partition_by_clause
(optional) allows the LISTAGG
function to be used as an analytic function and sets the range of records for each group in the OVER
clause.
Return type
The LISTAGG
function returns a string value.
Examples
This example concatenates the values in the emp
table and lists all the employees separated by a delimiter
comma. First, create a table named emp
. Then insert records into the emp
table.
This example uses a PARTITION BY
clause with LISTAGG
in the emp
table. It generates output based on a partition by deptno
that applies to each partition and not on the entire table.
This example includes the GROUP BY
clause.
MEDIAN
The MEDIAN
function calculates the middle value of an expression from a given range of values. NULL
values are ignored. The MEDIAN
function returns an error if a query doesn't reference the user-defined table.
Objective
- You can use
MEDIAN
without any grouping. In this case, theMEDIAN
function operates on all rows in a table and returns a single row. - You can use
MEDIAN
with theOVER
clause. In this case, theMEDIAN
function partitions a query result set into groups based on theexpression
specified in thePARTITION BY
clause. It then aggregates data in each group.
Synopsis
Parameters
median_expression
median_expression
(mandatory) is a target column or expression that the MEDIAN
function operates on and returns a median value. It can be a numeric, datetime, or interval data type.
PARTITION BY
PARTITION BY
clause (optional) allows you to use MEDIAN
as an analytic function and sets the range of records for each group in the OVER
clause.
Return types
The return type is determined by the input data type of expression
. The following table shows the return type for each input type.
Input type | Return type |
---|---|
BIGINT | NUMERIC |
FLOAT, DOUBLE PRECISION | DOUBLE PRECISION |
INTEGER | NUMERIC |
INTERVAL | INTERVAL |
NUMERIC | NUMERIC |
REAL | REAL |
SMALLINT | NUMERIC |
TIMESTAMP | TIMESTAMP |
TIMESTAMPTZ | TIMESTAMPTZ |
Examples
In this example, a query returns the median salary for each department in the emp
table:
This example uses the PARTITION BY
clause with MEDIAN
in the emp
table and returns the median salary based on a partition by deptno
:
You can compare the MEDIAN
function with PERCENTILE_CONT
. In this example, MEDIAN
generates the same result as PERCENTILE_CONT
:
STATS_MODE
The STATS_MODE
function takes a set of values as an argument and returns the value that occurs with the highest frequency. If multiple values appear with the same frequency, the STATS_MODE
function arbitrarily chooses the first value and returns only that one value.
Objective
- You can use
STATS_MODE
without any grouping. In this case, theSTATS_MODE
function operates on all the rows in a table and returns a single value. - You can use
STATS_MODE
as an ordered-set aggregate function using theWITHIN GROUP
clause. In this case, theSTATS_MODE
function operates on the ordered data set. - You can use
STATS_MODE
with theGROUP BY
clause. In this case, theSTATS_MODE
function operates on each group and returns the most frequent and aggregated output for each group.
Synopsis
Or
Parameters
expr
An expression or value to assign to the column.
Return type
The STATS_MODE
function returns a value that appears frequently. However, if all the values of a column are NULL
, STATS_MODE
returns NULL
.
Examples
This example returns the mode of salary in the emp
table: