Applies to: CELONIS 4.3 CELONIS 4.4 CELONIS 4.5 CELONIS 4.6 CELONIS 4.7
Description
This function calculates the z score over an INT or a FLOAT. The output type is always FLOAT.
ZScore can act as a standardization of data by mapping each value to the distance to the mean in multiples of standard deviations. This is especially useful in evaluating simple 2-, 3- or 6-sigma rules for outlier detection on a column.
Syntax
ZSCORE ( table.column )
NULL handling
If the input column contains NULL values, they are ignored and do not affect the calculation of the mean. The output for a null value is NULL as well.
Tips
- The calculation is parallelized. The function first calculates the variance in two parallelized passes over the data in the column then the ZScore over the unique values of the column is calculated in parallel.
- The ZScore of large FLOAT columns can have a large error.
- ZScore over DATE columns should utilize a DateTime projection function.
Examples
[2] ZSCORE in combination with a date column using HOURS as projection function.
Query
Column1
ZSCORE ( HOURS ( "TABLE"."COLUMN" ) )
Input
TABLE
COLUMN : DATE |
---|
Sat Jan 01 2000 06:00:00.000 |
Sat Jan 01 2000 07:00:00.000 |
Sat Jan 01 2000 08:00:00.000 |
Sat Jan 01 2000 09:00:00.000 |
Sat Jan 01 2000 10:00:00.000 |
Sat Jan 01 2000 11:00:00.000 |
Sat Jan 01 2000 12:00:00.000 |
Sat Jan 01 2000 13:00:00.000 |
Sat Jan 01 2000 14:00:00.000 |
Sat Jan 01 2000 18:00:00.000 |
Sat Jan 01 2000 08:00:00.000 |
Sat Jan 01 2000 09:00:00.000 |
Sat Jan 01 2000 08:00:00.000 |
Output
Result
Column1 : FLOAT |
---|
-1.2741407711983999 |
-0.9729802252787779 |
-0.6718196793591561 |
-0.3706591334395343 |
-0.06949858751991247 |
0.23166195839970935 |
0.5328225043193312 |
0.833983050238953 |
1.1351435961585747 |
2.339785779837062 |
-0.6718196793591561 |
-0.3706591334395343 |
-0.6718196793591561 |