MATCH_PROCESS_REGEX
Applies to: CELONIS 4.5 CELONIS 4.6
Description
Filters the variants based on a regular expression defined over the activities.
MATCH_PROCESS_REGEX matches the variants of a process based on a regular expression. The regular expression defines a pattern over the activities of the variant. It returns an INT value which is 1 if the variant matches the pattern or 0 if it does not match.
If the regular expression contains an activity name that does not exist, then a warning is displayed. (Since: CELONIS 4.6)
[1] Regular expression with non-existing activity and non-matching wildcard: Empty result and warnings.
|
Tips
You can define an alias for an activity and use this alias as an abbreviation for the activity name inside the regular expression. This improves the readability of the pattern if you have long activity names.
You can also define an alias for a subexpression. This enables you to give a logical name to a certain subexpression and use this subexpression at different places inside the regular expression. This allows you to structure your regular expressions and improve its understandability
MATCH_PROCESS_REGEX is very useful if you want to filter on variants by a very specific and/or complex pattern, e.g. containing sequences or loops. If you want to filter for the occurrence of a single activity or very simple patterns, it is far more efficient to use normal conditional expressions instead of regular expressions.
Instead of specifying the activity column, it is also possible to use another string column of the activity table. For example, you can write a process regex that matches cases based on the user type.
Syntax
MATCH_PROCESS_REGEX ( activity_table.string_column, regular_expression )
string_column: string column joinable to the activity table. Usually, this is the activity column of the activity table.
regular_expression: a regular expression to match the variants against. The patterns that can be used inside the regular expression are described below.
Patterns
This is an overview over all patterns that can be used within the regular expression. Full example queries as well as more detailed descriptions can be found below.
Syntax | Meaning | Example |
---|---|---|
' ' | Case contains the activity | 'A' |
^ | Case starts with the activity | ^ 'Scan Invoice' |
$ | Case ends with the activity | 'B' $ |
>> | Activities directly follow | 'A' >> 'B' |
| | Logical OR | 'A' | 'B' |
( ) | Group of activities | ('A' | 'B') >> ('C' >> 'D') |
* | 0 or more occurrences | ('A' >> 'B')* |
+ | 1 or more occurrences | ('A' >> 'B')+ |
? | 0 or 1 occurrences | ('A' >> 'B')? |
{<from>, <to>} | Between <from> and <to> occurrences | ('A' >> 'B'){1, 3} |
'*' | Any activity matches | 'A' >> ('*')+ >> 'B' |
ANY | Any activity matches | 'A' >> (ANY)+ >> 'B' |
. | Any activity matches | 'A' >> . >> 'C' |
LIKE '%...%' | Activities that contain string | 'A' >> LIKE '% Invoice%' |
[' ',' '] | Set of activities of which one needs to match | 'A' >> ['B','D','E'] >> 'C' |
[! ' '] | Set of activities of which none matches | 'A' >> [! 'B'] >> 'C' |
AS | Gives an alias to a regex | ('A' >> (ANY)*) AS sequence, sequence >> 'B' |
Example
[2] Matches any variant that starts with activity
|
Regular expression constructs
The regular expression can be constructed from several special constructs. In the following sections, these construts will be explained in more detail.
Activities
Activities are given by their respective name (including all spaces, special characters etc.) enclosed by single quotes:
Syntax
"'" <string> "'"
Example
[3] Filters the variants for the occurrence of a certain activity (
|
If multiple activities with similar names should be mapped at the same time, it is possible to use the keyword LIKE
with a SQL-style wildcard match (read here for more details about the LIKE
operator).
Syntax
LIKE "'" <identifier_with_wildcards> "'"
Example
[4] Matches each variant that contains an activity having the term
|
Concatenation
The concatenation allows to match for sequences of certain regular expressions (e.g. activity names). Sequences of two or more expressions are given in the order of occurrence, separated by two greater than symbols (>>
).
Syntax
<regular_expression> >> <regular_expression>
Example
[5] Matches the two consecutive activities 'B' and 'C'.
|
Matching any activity
Furthermore, it is possible to match any activity, regardless of its name. For convenience, there are three different ways to do this:
by a an activity name that only contains the star symbol (
'*'
),by the keyword
ANY
,or by the dot symbol (
.
) which is well-known from the widely-used Perl regular expression syntax.
Example
[6] Matches a sequence of activities where the second activity may have an arbitrary name.
|
Choice
The choice construct allows to define two or more different alternatives for the matching regular expression. The overall regular expression matches if one of the given alternatives is matched. The alternative regular expressions are separated by the pipe symbol (|
). In contrast to the concatenation, the order of the alternatives is irrelevant.
Syntax
<regular_expression> "|" <regular_expression>
Example
[7] Matches every variant that contains an activity with name
|
Match start/end
In general, the regular expressions may match at any position in a variant. However, sometimes it is desirable to define a pattern each matching variants should start and / or end with. To define an exact match at the beginning of each variant, the ^
symbol must be added before the definition of the regular expression. To define an exact match at the end of each variant, the $
symbol must be added after the definnition of the regular expression.
Syntax
[ ^ ] <regular_expression> [ $ ]
Examples
[8] Matches every variant that starts with an activity
|
[9] Matches every variant that ends with an activity
|
Activity sets
Activity sets are an alternative way to express a choice between different activities. An activity set matches any activity that is given in its defintion. Activity sets are defined as a (optionally comma-separated) list of activity names enclosed by square brackets. In contrast to choices, activity sets can only be defined for activities, but not for arbitrary regular expressions. However, activity sets can be inverted by adding a !
between the opening bracket and the first activity name, so the activity set matches the complement of the defined activities. This way, it is easy to express that all activities except one ore more certain activities should match.
Syntax
"[" [ ! ] <activity_name> ( , <activity_name> )* "]"
Examples
[10] Matches each variant that contains an activity
|
[11] Matches each variant that contains an activity which is not
|
Grouping
Regular expressions which contain multiple sub-expressions (e.g., choices or concatenations) can be bracketed to group them in order to to clearify the structure. Grouping may be also necessary to nest regular expressions.
Syntax
"(" <regular_expression> ")"
Example
[12] Matches a sequence of Activities
|
Quantifiers
A quantifier defines that a regular expression may occur a variable number of times. Quantifiers must always be applied to a group expression. There are three different quantifiers available:
* The regular expression may occur an arbitrary number of times, i.e. it occurs zero or more times.
+ The regular expression occurs at least once, i.e. it occurs one ore more times.
? The regular expression is optional, i.e. it may occur exactly once or not.
{<from>, <to>} The regular expression may occur between <from> and <to> times (ie. <from>, <from>+1, <from>+2, ..., <to>-1, <to>). Note that both <from> and <to> are included in the range. For example, ('A'){1, 3} will match all variants that contain the activity 'A', one or two or three times. You can use this quantifier to make your regular expressions more readable:
'A' >> 'A' >> ('A')? >> ('A')? becomes ('A'){2, 4},
('A' >> 'A' >> 'A') | ('A' >> 'A' >> 'A' >> 'A') becomes ('A'){3,4},
('A')? | ('A' >> 'A') | ('A' >> 'A' >> 'A' >> 'A') | ('A' >> 'A' >> 'A' >> 'A' >> 'A') becomes ('A'){0,2} | ('A'){4,5}.
If <from> and <to> are equal then it's enough to specify <from>, for example ('A'){3, 3} can be written as ('A'){3}.
The arguments <from> and <to> must satisfy all of the following conditions:
both <from> and <to> must be integers that are greater or equal zero,
<from> and <to> can't both be zero,
<from> must be less or equal <to>.
Syntax
"(" <regular_expression> ")" ( "+" | "?" | "*" | "{" <from> ["," <to>] "}")
Examples
[13] Matches each variant that has an arbitrary number of activities Filter FILTER MATCH_PROCESS_REGEX ( "Table1"."ACTIVITY" , 'A' >> ( 'B' ) * >> 'C' ) = 1 Column1 "Table1"."Case" Column2 "Table1"."Activity"
|
[14] Matches each variant that has at least one activity
|
[15] Matches each variant that has no or one activity
|
[16] Matches each variant that has 1 or 2 consecutive 'B' activities between activities 'A' and 'C'.
|
Pitfall
If not specified to match exactly at the start and/or at the end of the variant, a regular expression may match at any position in the variant as an arbitrary number of activities at beginning / end may be skipped. Consequently, a ? quantifier for example may also match any number of occurrence of its subexpression, if it is the only subexpression or the first/last subexpression of a concatenation.
Example
[17] Matches each variant that has an activity
|
Defining and referencing an alias
Sometimes, similar subexpressions may occur at different positions in the regular expression. This may be simple activities as well as complex subexpressions. An alias provides an elegant way to avoid repeating these subexpressions at different positions in the regular expression. It is a simple name that is assigned to a specific regular expression. By this name, the assigned regular expression can be referneced in order to re-use it in another regular expresssion. The regular expressions and their related aliases are given as a comma-separated list. The last regular expression of this list is the root expression which is the entry point for the pattern matching. Therefore, it cannot be referenced by another regular expression. Consequently, the root expression is never assigned to an alias.
Tips/Pitfalls
Avoid to create cyclic definitions of regular expressions when using aliases as cycles will result in an error.
Aliases can be used as a shortcut for long activity names. This may save a lot of typing and increase the readability of complex regular expressions.
Be aware that the match at start symbol must be placed before the first aliased expressions, the match at end symbol must be placed after the root expression.
It is not possible to reference aliases defined in another PROCESS_MATCH_REGEX operator. Only aliases within the same operator can be referenced.
Syntax
<regular_expression> ( AS <alias_name> "," <regular_expression> )*
Example
[18] Matches every variant that contains a sequence of
|