9.3 Linear Regression Aggregate Functions

Linear regression functions are useful for trend line continuation. The trend or regression line is usually a pattern followed by a set of values. Linear regression is useful to predict future values. To continue the regression line, you need to know the slope and the point of intersection with the y-axis. As set of linear functions can be used for calculating these values.

In the function syntax, y is interpreted as an x-dependent variable.

The linear regression aggregate functions take a pair of arguments, the dependent variable expression (y) and the independent variable expression (x), which are both numeric value expressions. Any row in which either argument evaluates to NULL is removed from the rows that qualify. If there are no rows that qualify, then the result of REGR_COUNT is 0 (zero), and the other linear regression aggregate functions result in NULL.

9.3.1 REGR_AVGX

Available inDSQL, PSQL

Result typeDOUBLE PRECISION

Syntax

REGR_AVGX ( <y>, <x> )

The function REGR_AVGX calculates the average of the independent variable (x) of the regression line.

The function REGR_AVGX(<y>, <x>) is equivalent to

SUM(<exprX>) / REGR_COUNT(<y>, <x>)

<exprX> :==
CASE WHEN <x> IS NOT NULL AND <y> IS NOT NULL THEN <x> END

9.3.2 REGR_AVGY

Available inDSQL, PSQL

Result typeDOUBLE PRECISION

Syntax

REGR_AVGY ( <y>, <x> )

The function REGR_AVGY calculates the average of the dependent variable (y) of the regression line.

The function REGR_AVGY(<y>, <x>) is equivalent to

SUM(<exprY>) / REGR_COUNT(<y>, <x>)

<exprY> :==
CASE WHEN <x> IS NOT NULL AND <y> IS NOT NULL THEN <y> END

9.3.3 REGR_COUNT

Available inDSQL, PSQL

Result typeDOUBLE PRECISION

Syntax

REGR_COUNT ( <y>, <x> )

The function REGR_COUNT counts the number of non-empty pairs of the regression line.

The function REGR_COUNT(<y>, <x>) is equivalent to

SUM(<exprXY>) / REGR_COUNT(<y>, <x>)

<exprXY> :==
CASE WHEN <x> IS NOT NULL AND <y> IS NOT NULL THEN 1 END

9.3.4 REGR_INTERCEPT

Available inDSQL, PSQL

Result typeDOUBLE PRECISION

Syntax

REGR_INTERCEPT ( <y>, <x> )

The function REGR_INTERCEPT calculates the point of intersection of the regression line with the y-axis.

The function REGR_INTERCEPT(<y>, <x>) is equivalent to

REGR_AVGY(<y>, <x>) - REGR_SLOPE(<y>, <x>) * REGR_AVGX(<y>, <x>)

9.3.4.1 REGR_INTERCEPT Examples

Forecasting sales volume

with recursive years (byyear) as (
select 1991
from rdb\$database
union all
select byyear + 1
from years
where byyear < 2020
),
s as (
select
extract(year from order_date) as byyear,
sum(total_value) as total_value
from sales
group by 1
),
regr as (
select
regr_intercept(total_value, byyear) as intercept,
regr_slope(total_value, byyear) as slope
from s
)
select
years.byyear as byyear,
intercept + (slope * years.byyear) as total_value
from years
cross join regr
BYYEAR TOTAL_VALUE
------ ------------
1991    118377.35
1992    414557.62
1993    710737.89
1994   1006918.16
1995   1303098.43
1996   1599278.69
1997   1895458.96
1998   2191639.23
1999   2487819.50
2000   2783999.77
...

9.3.5 REGR_R2

Available inDSQL, PSQL

Result typeDOUBLE PRECISION

Syntax

REGR_R2 ( <y>, <x> )

The REGR_R2 function calculates the coefficient of determination, or R-squared, of the regression line.

The function REGR_R2(<y>, <x>) is equivalent to

POWER(CORR(<y>, <x>), 2)

9.3.6 REGR_SLOPE

Available inDSQL, PSQL

Result typeDOUBLE PRECISION

Syntax

REGR_SLOPE ( <y>, <x> )

The function REGR_SLOPE calculates the slope of the regression line.

The function REGR_SLOPE(<y>, <x>) is equivalent to

COVAR_POP(<y>, <x>) / VAR_POP(<exprX>)

<exprX> :==
CASE WHEN <x> IS NOT NULL AND <y> IS NOT NULL THEN <x> END

9.3.7 REGR_SXX

Available inDSQL, PSQL

Result typeDOUBLE PRECISION

Syntax

REGR_SXX ( <y>, <x> )

The function REGR_SXX calculates the sum of squares of the independent expression variable (x).

The function REGR_SXX(<y>, <x>) is equivalent to

REGR_COUNT(<y>, <x>) * VAR_POP(<exprX>)

<exprX> :==
CASE WHEN <x> IS NOT NULL AND <y> IS NOT NULL THEN <x> END

9.3.8 REGR_SXY

Available inDSQL, PSQL

Result typeDOUBLE PRECISION

Syntax

REGR_SXY ( <y>, <x> )

The function REGR_SXY calculates the sum of products of independent variable expression (x) times dependent variable expression (y).

The function REGR_SXY(<y>, <x>) is equivalent to

REGR_COUNT(<y>, <x>) * COVAR_POP(<y>, <x>)

9.3.9 REGR_SYY

Available inDSQL, PSQL

Result typeDOUBLE PRECISION

Syntax

REGR_SYY ( <y>, <x> )

The function REGR_SYY calculates the sum of squares of the dependent variable (y).

The function REGR_SYY(<y>, <x>) is equivalent to

REGR_COUNT(<y>, <x>) * VAR_POP(<exprY>)

<exprY> :==
CASE WHEN <x> IS NOT NULL AND <y> IS NOT NULL THEN <y> END