## 9.4. Linear Regression Aggregate Functions

Linear regression functions are useful for trend line continuation. The trend or regression line is usually a pattern followed by a set of values. Linear regression is useful to predict future values. To continue the regression line, you need to know the slope and the point of intersection with the y-axis. As set of linear functions can be used for calculating these values.

In the function syntax, y is interpreted as an x-dependent variable.

The linear regression aggregate functions take a pair of arguments, the dependent variable expression (y) and the independent variable expression (x), which are both numeric value expressions. Any row in which either argument evaluates to `NULL` is removed from the rows that qualify. If there are no rows that qualify, then the result of `REGR_COUNT` is `0` (zero), and the other linear regression aggregate functions result in `NULL`.

### 9.4.1. `REGR_AVGX()`

Available inDSQL, PSQL

Result type`DOUBLE PRECISION`

Syntax

```  |`REGR_AVGX ( <y>, <x> )`
```

Table 9.14`REGR_AVGX` Function Parameters
ParameterDescription

y

Dependent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions.

x

Independent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions.

The function `REGR_AVGX` calculates the average of the independent variable (x) of the regression line.

The function `REGR_AVGX(<y>, <x>)` is equivalent to

```  |`SUM(<exprX>) / REGR_COUNT(<y>, <x>)`
|` `
|`<exprX> :==`
|`  CASE WHEN <x> IS NOT NULL AND <y> IS NOT NULL THEN <x> END`
```

### 9.4.2. `REGR_AVGY()`

Available inDSQL, PSQL

Result type`DOUBLE PRECISION`

Syntax

```  |`REGR_AVGY ( <y>, <x> )`
```

Table 9.15`REGR_AVGY` Function Parameters
ParameterDescription

y

Dependent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions.

x

Independent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions.

The function `REGR_AVGY` calculates the average of the dependent variable (y) of the regression line.

The function `REGR_AVGY(<y>, <x>)` is equivalent to

```  |`SUM(<exprY>) / REGR_COUNT(<y>, <x>)`
|` `
|`<exprY> :==`
|`  CASE WHEN <x> IS NOT NULL AND <y> IS NOT NULL THEN <y> END`
```

### 9.4.3. `REGR_COUNT()`

Available inDSQL, PSQL

Result type`DOUBLE PRECISION`

Syntax

```  |`REGR_COUNT ( <y>, <x> )`
```

Table 9.16`REGR_COUNT` Function Parameters
ParameterDescription

y

Dependent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions.

x

Independent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions.

The function `REGR_COUNT` counts the number of non-empty pairs of the regression line.

The function `REGR_COUNT(<y>, <x>)` is equivalent to

```  |`COUNT(*) FILTER (WHERE <x> IS NOT NULL AND <y> IS NOT NULL)`
```

### 9.4.4. `REGR_INTERCEPT()`

Available inDSQL, PSQL

Result type`DOUBLE PRECISION`

Syntax

```  |`REGR_INTERCEPT ( <y>, <x> )`
```

Table 9.17`REGR_INTERCEPT` Function Parameters
ParameterDescription

y

Dependent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions.

x

Independent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions.

The function `REGR_INTERCEPT` calculates the point of intersection of the regression line with the y-axis.

The function `REGR_INTERCEPT(<y>, <x>)` is equivalent to

```  |`REGR_AVGY(<y>, <x>) - REGR_SLOPE(<y>, <x>) * REGR_AVGX(<y>, <x>)`
```

#### 9.4.4.1. `REGR_INTERCEPT` Examples

Forecasting sales volume

```   |`with recursive years (byyear) as (`
|`  select 1991`
|`  from rdb\$database`
|`  union all`
|`  select byyear + 1`
|`  from years`
|`  where byyear < 2020`
|`),`
|`s as (`
|`  select`
|`    extract(year from order_date) as byyear,`
|`    sum(total_value) as total_value`
|`  from sales`
|`  group by 1`
|`),`
|`regr as (`
|`  select`
|`    regr_intercept(total_value, byyear) as intercept,`
|`    regr_slope(total_value, byyear) as slope`
|`  from s`
|`)`
|`select`
|`  years.byyear as byyear,`
|`  intercept + (slope * years.byyear) as total_value`
|`from years`
|`cross join regr`
```
```   |`BYYEAR TOTAL_VALUE`
|`------ ------------`
|`  1991    118377.35`
|`  1992    414557.62`
|`  1993    710737.89`
|`  1994   1006918.16`
|`  1995   1303098.43`
|`  1996   1599278.69`
|`  1997   1895458.96`
|`  1998   2191639.23`
|`  1999   2487819.50`
|`  2000   2783999.77`
|`...`
```

### 9.4.5. `REGR_R2()`

Available inDSQL, PSQL

Result type`DOUBLE PRECISION`

Syntax

```  |`REGR_R2 ( <y>, <x> )`
```

Table 9.18`REGR_R2` Function Parameters
ParameterDescription

y

Dependent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions.

x

Independent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions.

The REGR_R2 function calculates the coefficient of determination, or R-squared, of the regression line.

The function `REGR_R2(<y>, <x>)` is equivalent to

```  |`POWER(CORR(<y>, <x>), 2)`
```

### 9.4.6. `REGR_SLOPE()`

Available inDSQL, PSQL

Result type`DOUBLE PRECISION`

Syntax

```  |`REGR_SLOPE ( <y>, <x> )`
```

Table 9.19`REGR_SLOPE` Function Parameters
ParameterDescription

y

Dependent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions.

x

Independent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions.

The function `REGR_SLOPE` calculates the slope of the regression line.

The function `REGR_SLOPE(<y>, <x>)` is equivalent to

```  |`COVAR_POP(<y>, <x>) / VAR_POP(<exprX>)`
|` `
|`<exprX> :==`
|`  CASE WHEN <x> IS NOT NULL AND <y> IS NOT NULL THEN <x> END`
```

### 9.4.7. `REGR_SXX()`

Available inDSQL, PSQL

Result type`DOUBLE PRECISION`

Syntax

```  |`REGR_SXX ( <y>, <x> )`
```

Table 9.20`REGR_SXX` Function Parameters
ParameterDescription

y

Dependent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions.

x

Independent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions.

The function `REGR_SXX` calculates the sum of squares of the independent expression variable (x).

The function `REGR_SXX(<y>, <x>)` is equivalent to

```  |`REGR_COUNT(<y>, <x>) * VAR_POP(<exprX>)`
|` `
|`<exprX> :==`
|`  CASE WHEN <x> IS NOT NULL AND <y> IS NOT NULL THEN <x> END`
```

### 9.4.8. `REGR_SXY()`

Available inDSQL, PSQL

Result type`DOUBLE PRECISION`

Syntax

```  |`REGR_SXY ( <y>, <x> )`
```

Table 9.21`REGR_SXY` Function Parameters
ParameterDescription

y

Dependent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions.

x

Independent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions.

The function `REGR_SXY` calculates the sum of products of independent variable expression (x) times dependent variable expression (y).

The function `REGR_SXY(<y>, <x>)` is equivalent to

```  |`REGR_COUNT(<y>, <x>) * COVAR_POP(<y>, <x>)`
```

### 9.4.9. `REGR_SYY()`

Available inDSQL, PSQL

Result type`DOUBLE PRECISION`

Syntax

```  |`REGR_SYY ( <y>, <x> )`
```

Table 9.22`REGR_SYY` Function Parameters
ParameterDescription

y

Dependent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions.

x

Independent variable of the regression line. It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF. Aggregate functions are not allowed as expressions.

The function `REGR_SYY` calculates the sum of squares of the dependent variable (y).

The function `REGR_SYY(<y>, <x>)` is equivalent to

```  |`REGR_COUNT(<y>, <x>) * VAR_POP(<exprY>)`
|` `
|`<exprY> :==`
|`  CASE WHEN <x> IS NOT NULL AND <y> IS NOT NULL THEN <y> END`
```