Applies to:  CELONIS 4.3 CELONIS 4.4 CELONIS 4.5 CELONIS 4.6 CELONIS 4.7 

Description

This function calculates the z score over an INT or a FLOAT. The output type is always FLOAT.

ZScore can act as a standardization of data by mapping each value to the distance to the mean in multiples of standard deviations. This is especially useful in evaluating simple 2-, 3- or 6-sigma rules for outlier detection on a column.

Syntax

ZSCORE ( table.column )

NULL handling

If the input column contains NULL values, they are ignored and do not affect the calculation of the mean. The output for a null value is NULL as well.

Tips

  • The calculation is parallelized. The function first calculates the variance in two parallelized passes over the data in the column then the ZScore over the unique values of the column is calculated in parallel.
  • The ZScore of large FLOAT columns can have a large error.
  • ZScore over DATE columns should utilize a DateTime projection function.

Examples


[1] Simple ZSCORE calculation over FLOAT column.

Query
Column1
ZSCORE ( "TABLE"."COLUMN" )
Input
TABLE
COLUMN : FLOAT
1.0
1.0
7.0
7.0
4.0
Output
Result
Column1 : FLOAT
-1.0
-1.0
1.0
1.0
0.0



[2] ZSCORE in combination with a date column using HOURS as projection function.

Query
Column1
ZSCORE ( HOURS ( "TABLE"."COLUMN" ) )
Input
TABLE
COLUMN : DATE
Sat Jan 01 2000 06:00:00.000
Sat Jan 01 2000 07:00:00.000
Sat Jan 01 2000 08:00:00.000
Sat Jan 01 2000 09:00:00.000
Sat Jan 01 2000 10:00:00.000
Sat Jan 01 2000 11:00:00.000
Sat Jan 01 2000 12:00:00.000
Sat Jan 01 2000 13:00:00.000
Sat Jan 01 2000 14:00:00.000
Sat Jan 01 2000 18:00:00.000
Sat Jan 01 2000 08:00:00.000
Sat Jan 01 2000 09:00:00.000
Sat Jan 01 2000 08:00:00.000
Output
Result
Column1 : FLOAT
-1.2741407711983999
-0.9729802252787779
-0.6718196793591561
-0.3706591334395343
-0.06949858751991247
0.23166195839970935
0.5328225043193312
0.833983050238953
1.1351435961585747
2.339785779837062
-0.6718196793591561
-0.3706591334395343
-0.6718196793591561


See also:

  • No labels