Applies to:  CELONIS 4.0 CELONIS 4.2 CELONIS 4.3 CELONIS 4.4 CELONIS 4.5 

Description

MATCH_PROCESS matches the variants of a process against a given pattern.

Similar functionality is provided by MATCH_PROCESS_REGEX. MATCH_PROCESS uses Nodes and Edges to match the cases. Nodes consist either of a single activity or a list of activities. Edges describe how the nodes are linked together.

Syntax

MATCH_PROCESS ( [ activity_table.activity_column ,] node (, node)* CONNECTED BY edge (, edge)* )
  • activity_column: string column joinable to the activity table
  • node: NODE | OPTIONAL | LOOP | OPTIONAL_LOOP | STARTING | ENDING | single_activity (, single_activity )* AS node_name
    • single_activity: [LIKE] activity (Activity name. LIKE allows you to use wildcards in your activity name)
  • edge: DIRECT | EVENTUALLY [ edge_start_node, edge_end_node ]
    • edge_start_node: node_name
    • edge_end_node: node_name

Node

A node consists of one or more activities. If multiple activities are given, it means one of those activities.

Node Types

  • NODE: Node which has to be part once in the case, without any restrictions on where the node has to be.
  • STARTING: Node which has to happen at the beginning of a case.
  • ENDING: Node which has to happen at the end of a case.
  • LOOP: Node which occurs at least once but can also be repeated.

Edge Types

  • DIRECT: edge_end_node has to follow directly after the edge_start_node
  • EVENTUALLY: between edge_start_node and edge_end_node other activities can be placed

Result

MATCH_PROCESS returns an integer column, which flags all matching cases with 1 and all non matching columns with a 0. The result is often used in combination with a filter.

Examples


Here MATCH_PROCESS flags all cases in which one activity A is followed directly by activity B with a 1.

Query
Column1
"Activities_CASES"."CASE_ID"
Column2
MATCH_PROCESS ( "Activities"."ACTIVITY", NODE['A'] as src, NODE['B'] as trg CONNECTED BY DIRECT [ src, trg ] )
Input
Activities
CASE_ID : STRINGACTIVITY : STRINGTIMESTAMP : DATE
'1''A'
Sun Jan 01 2017 13:00:00.000
'1''B'
Sun Jan 01 2017 13:01:00.000
'1''C'
Sun Jan 01 2017 13:02:00.000
'2''A'
Sun Jan 01 2017 13:00:00.000
'2''C'
Sun Jan 01 2017 13:02:00.000
'2''B'
Sun Jan 01 2017 13:03:00.000
Activities_CASES
CASE_ID : STRING
'1'
'2'

Foreign Keys
Activities_CASES.CASE_IDActivities.CASE_ID
Output
Result
Column1 : STRINGColumn2 : INT
'1'
1
'2'
0



Here is MATCH_PROCESS combined with a filter. The result are only cases in which one activity A is followed by activity B.

Query
Filter
FILTER MATCH_PROCESS ( "Activities"."ACTIVITY", NODE['A'] as src, NODE['B'] as trg CONNECTED BY DIRECT [ src, trg ] ) = 1
Column1
"Activities"."CASE_ID"
Column2
"Activities"."ACTIVITY"
Input
Activities
CASE_ID : STRINGACTIVITY : STRINGTIMESTAMP : DATE
'1''A'
Sun Jan 01 2017 13:00:00.000
'1''B'
Sun Jan 01 2017 13:01:00.000
'1''C'
Sun Jan 01 2017 13:02:00.000
'2''A'
Sun Jan 01 2017 13:00:00.000
'2''C'
Sun Jan 01 2017 13:02:00.000
'2''B'
Sun Jan 01 2017 13:03:00.000
Activities_CASES
CASE_ID : STRING
'1'
'2'

Foreign Keys
Activities_CASES.CASE_IDActivities.CASE_ID
Output
Result
Column1 : STRINGColumn2 : STRING
'1''A'
'1''B'
'1''C'



If an activity has not only to be directly followed by another activity but can come any time later the keyword EVENTUALLY can be used. In this example MATCH_PROCESS flags all cases in which one activity A is followed eventually by activity B with a 1.

Query
Column1
"Activities_CASES"."CASE_ID"
Column2
MATCH_PROCESS ( "Activities"."ACTIVITY", NODE['A'] as src, NODE['B'] as trg CONNECTED BY EVENTUALLY [ src, trg ] )
Input
Activities
CASE_ID : STRINGACTIVITY : STRINGTIMESTAMP : DATE
'1''A'
Sun Jan 01 2017 13:00:00.000
'1''B'
Sun Jan 01 2017 13:01:00.000
'1''C'
Sun Jan 01 2017 13:02:00.000
'2''A'
Sun Jan 01 2017 13:00:00.000
'2''C'
Sun Jan 01 2017 13:02:00.000
'2''B'
Sun Jan 01 2017 13:03:00.000
Activities_CASES
CASE_ID : STRING
'1'
'2'

Foreign Keys
Activities_CASES.CASE_IDActivities.CASE_ID
Output
Result
Column1 : STRINGColumn2 : INT
'1'
1
'2'
1



Here is an example in which node 'node_ab' has two activities. This means that a matching case needs an activity C which comes either after A or B.

Query
Column1
"Activities_CASES"."CASE_ID"
Column2
MATCH_PROCESS ( "Activities"."ACTIVITY", NODE['A', 'B'] as node_ab, NODE['C'] as node_c CONNECTED BY DIRECT [ node_ab, node_c ] )
Input
Activities
CASE_ID : STRINGACTIVITY : STRINGTIMESTAMP : DATE
'1''A'
Sun Jan 01 2017 13:00:00.000
'1''B'
Sun Jan 01 2017 13:01:00.000
'1''C'
Sun Jan 01 2017 13:02:00.000
'2''A'
Sun Jan 01 2017 13:00:00.000
'2''C'
Sun Jan 01 2017 13:02:00.000
'2''B'
Sun Jan 01 2017 13:03:00.000
Activities_CASES
CASE_ID : STRING
'1'
'2'

Foreign Keys
Activities_CASES.CASE_IDActivities.CASE_ID
Output
Result
Column1 : STRINGColumn2 : INT
'1'
1
'2'
1



Nodes can represent loops. Here matching cases can have between activity A and C at least one or more activities of type B.

Query
Column1
"Activities_CASES"."CASE_ID"
Column2
MATCH_PROCESS ( "Activities"."ACTIVITY", NODE['A'] AS node_a, LOOP['B'] AS loop_b, NODE['C'] AS node_c CONNECTED BY DIRECT [ node_a, loop_b ], DIRECT [ loop_b, node_c ] )
Input
Activities
CASE_ID : STRINGACTIVITY : STRINGTIMESTAMP : DATE
'1''A'
Sun Jan 01 2017 13:00:00.000
'1''B'
Sun Jan 01 2017 13:01:00.000
'1''B'
Sun Jan 01 2017 13:02:00.000
'1''C'
Sun Jan 01 2017 13:04:00.000
'2''A'
Sun Jan 01 2017 13:00:00.000
'2''B'
Sun Jan 01 2017 13:01:00.000
'2''C'
Sun Jan 01 2017 13:02:00.000
'3''A'
Sun Jan 01 2017 13:00:00.000
'3''C'
Sun Jan 01 2017 13:02:00.000
Activities_CASES
CASE_ID : STRING
'1'
'2'
'3'

Foreign Keys
Activities_CASES.CASE_IDActivities.CASE_ID
Output
Result
Column1 : STRINGColumn2 : INT
'1'
1
'2'
1
'3'
0



A loop node can also consist of multiple activities. PROCESS_MATCH accepts than all given activities, without regarding order or number of occurrences till another activity is found.

Query
Column1
"Activities_CASES"."CASE_ID"
Column2
MATCH_PROCESS ( "Activities"."ACTIVITY", NODE['A'] AS node_a, LOOP['B', 'C'] AS loop_bc, NODE['D'] AS node_d CONNECTED BY DIRECT [ node_a, loop_bc ], DIRECT [ loop_bc, node_d ] )
Input
Activities
CASE_ID : STRINGACTIVITY : STRINGTIMESTAMP : DATE
'1''A'
Sun Jan 01 2017 13:00:00.000
'1''C'
Sun Jan 01 2017 13:01:00.000
'1''B'
Sun Jan 01 2017 13:02:00.000
'1''B'
Sun Jan 01 2017 13:03:00.000
'1''D'
Sun Jan 01 2017 13:05:00.000
Activities_CASES
CASE_ID : STRING
'1'

Foreign Keys
Activities_CASES.CASE_IDActivities.CASE_ID
Output
Result
Column1 : STRINGColumn2 : INT
'1'
1



Nodes can be forced to be at the start or the end of a case.

Query
Column1
"Activities_CASES"."CASE_ID"
Column2
MATCH_PROCESS ( "Activities"."ACTIVITY", STARTING['A'] AS node_a, ENDING['B'] AS node_b CONNECTED BY DIRECT [ node_a, node_b ] )
Input
Activities
CASE_ID : STRINGACTIVITY : STRINGTIMESTAMP : DATE
'1''A'
Sun Jan 01 2017 13:00:00.000
'1''B'
Sun Jan 01 2017 13:01:00.000
'2''A'
Sun Jan 01 2017 13:02:00.000
'2''B'
Sun Jan 01 2017 13:03:00.000
'2''C'
Sun Jan 01 2017 13:05:00.000
'3''A'
Sun Jan 01 2017 13:06:00.000
'3''A'
Sun Jan 01 2017 13:07:00.000
'3''B'
Sun Jan 01 2017 13:08:00.000
Activities_CASES
CASE_ID : STRING
'1'
'2'
'3'

Foreign Keys
Activities_CASES.CASE_IDActivities.CASE_ID
Output
Result
Column1 : STRINGColumn2 : INT
'1'
1
'2'
0
'3'
0


  • No labels