Skip to main content

PQL46 (PQL Function Library - CPM 4.6)

MATCH_PROCESS_REGEX

Applies to: CELONIS 4.5 CELONIS 4.6

Description

Filters the variants based on a regular expression defined over the activities.

MATCH_PROCESS_REGEX matches the variants of a process based on a regular expression. The regular expression defines a pattern over the activities of the variant. It returns an INT value which is 1 if the variant matches the pattern or 0 if it does not match.

If the regular expression contains an activity name that does not exist, then a warning is displayed. (Since: CELONIS 4.6)

[1] Regular expression with non-existing activity and non-matching wildcard: Empty result and warnings.

Query

Filter

FILTER MATCH_PROCESS_REGEX ( "Table1"."ACTIVITY" , 'X' >> LIKE '%foo%' ) = 1

Column1

"Table1"."CASE_ID"

Column2

"Table1"."ACTIVITY"

Input

Output

Table1

CASE_ID : STRING

ACTIVITY : STRING

startTimestamp : DATE

'1'

'Activity A'

Fri Jan 01 2016 01:00:00.000

'1'

'Activity D'

Fri Jan 01 2016 02:00:00.000

'2'

'Activity A'

Fri Jan 01 2016 03:00:00.000

'2'

'Activity C'

Fri Jan 01 2016 04:00:00.000

'3'

'Activity A'

Fri Jan 01 2016 05:00:00.000

'3'

'Activity B'

Fri Jan 01 2016 06:00:00.000

Result

Column1 : STRING

Column2 : STRING

Warning

Regex tree preprocessing: Could not find id for lookup activity like: '%foo%'

Warning

Regex tree preprocessing: Could not find id for lookup activity: 'X'

Tips
  • You can define an alias for an activity and use this alias as an abbreviation for the activity name inside the regular expression. This improves the readability of the pattern if you have long activity names.

  • You can also define an alias for a subexpression. This enables you to give a logical name to a certain subexpression and use this subexpression at different places inside the regular expression. This allows you to structure your regular expressions and improve its understandability

  • MATCH_PROCESS_REGEX is very useful if you want to filter on variants by a very specific and/or complex pattern, e.g. containing sequences or loops. If you want to filter for the occurrence of a single activity or very simple patterns, it is far more efficient to use normal conditional expressions instead of regular expressions.

  • Instead of specifying the activity column, it is also possible to use another string column of the activity table. For example, you can write a process regex that matches cases based on the user type.

Syntax
MATCH_PROCESS_REGEX ( activity_table.string_column, regular_expression )
  • string_column: string column joinable to the activity table. Usually, this is the activity column of the activity table.

  • regular_expression: a regular expression to match the variants against. The patterns that can be used inside the regular expression are described below.

Patterns

This is an overview over all patterns that can be used within the regular expression. Full example queries as well as more detailed descriptions can be found below.

Syntax

Meaning

Example

' '

Case contains the activity

'A'

^

Case starts with the activity

^ 'Scan Invoice'

$

Case ends with the activity

'B' $

>>

Activities directly follow

'A' >> 'B'

|

Logical OR

'A' | 'B'

( )

Group of activities

('A' | 'B') >> ('C' >> 'D')

*

0 or more occurrences

('A' >> 'B')*

+

1 or more occurrences

('A' >> 'B')+

?

0 or 1 occurrences

('A' >> 'B')?

{<from>, <to>}

Between <from> and <to> occurrences

('A' >> 'B'){1, 3}

'*'

Any activity matches

'A' >> ('*')+ >> 'B'

ANY

Any activity matches

'A' >> (ANY)+ >> 'B'

.

Any activity matches

'A' >> . >> 'C'

LIKE '%...%'

Activities that contain string

'A' >> LIKE '% Invoice%'

[' ',' ']

Set of activities of which one needs to match

'A' >> ['B','D','E'] >> 'C'

[! ' ']

Set of activities of which none matches

'A' >> [! 'B'] >> 'C'

AS

Gives an alias to a regex

('A' >> (ANY)*) AS sequence, sequence >> 'B'

Example

[2] Matches any variant that starts with activity 'A' followed by activity 'B', followed by one or more activities 'A' or 'B' in arbitrary order. After that, an activity 'C' has to occur. However, before and after that activity any number (zero or more) of any possible activity may occur. Finally, the variant has to end with an activity 'H'.

Query

Filter

FILTER
    MATCH_PROCESS_REGEX (
        "Table1"."ACTIVITY" ,
        ^ 'A' >> 'B' >> ( 'B' | 'A' ) + >> ( ANY ) * >> 'C' >> ( '*' ) * >> 'H' $
    )
    =
    1

Column1

"Table1"."Case"

Column2

"Table1"."Activity"

Input

Output

Table1

Case : INT

Activity : STRING

Timestamp : DATE

1

'A'

Fri Jan 01 2016 01:00:00.000

1

'B'

Fri Jan 01 2016 02:00:00.000

1

'C'

Fri Jan 01 2016 03:00:00.000

1

'D'

Fri Jan 01 2016 04:00:00.000

1

'E'

Fri Jan 01 2016 05:00:00.000

1

'F'

Fri Jan 01 2016 06:00:00.000

1

'G'

Fri Jan 01 2016 07:00:00.000

2

'A'

Fri Jan 01 2016 08:00:00.000

2

'B'

Fri Jan 01 2016 09:00:00.000

2

'D'

Fri Jan 01 2016 10:00:00.000

2

'E'

Fri Jan 01 2016 11:00:00.000

2

'H'

Fri Jan 01 2016 12:00:00.000

2

'I'

Fri Jan 01 2016 13:00:00.000

3

'A'

Fri Jan 01 2016 14:00:00.000

3

'B'

Fri Jan 01 2016 15:00:00.000

3

'B'

Fri Jan 01 2016 16:00:00.000

3

'C'

Fri Jan 01 2016 17:00:00.000

3

'G'

Fri Jan 01 2016 18:00:00.000

3

'H'

Fri Jan 01 2016 19:00:00.000

4

'A'

Fri Jan 01 2016 20:00:00.000

4

'B'

Fri Jan 01 2016 21:00:00.000

4

'A'

Fri Jan 01 2016 22:00:00.000

4

'D'

Fri Jan 01 2016 23:00:00.000

4

'C'

Sat Jan 02 2016 00:00:00.000

4

'H'

Sat Jan 02 2016 01:00:00.000

5

'D'

Sat Jan 02 2016 02:00:00.000

5

'A'

Sat Jan 02 2016 03:00:00.000

5

'A'

Sat Jan 02 2016 04:00:00.000

5

'B'

Sat Jan 02 2016 05:00:00.000

5

'B'

Sat Jan 02 2016 06:00:00.000

5

'H'

Sat Jan 02 2016 07:00:00.000

6

'X'

Sat Jan 02 2016 08:00:00.000

6

'A'

Sat Jan 02 2016 09:00:00.000

6

'B'

Sat Jan 02 2016 10:00:00.000

6

'B'

Sat Jan 02 2016 11:00:00.000

6

'B'

Sat Jan 02 2016 12:00:00.000

6

'C'

Sat Jan 02 2016 13:00:00.000

6

'H'

Sat Jan 02 2016 14:00:00.000

7

'A'

Sat Jan 02 2016 15:00:00.000

7

'B'

Sat Jan 02 2016 16:00:00.000

7

'A'

Sat Jan 02 2016 17:00:00.000

7

'D'

Sat Jan 02 2016 18:00:00.000

7

'C'

Sat Jan 02 2016 19:00:00.000

7

'H'

Sat Jan 02 2016 20:00:00.000

7

'Y'

Sat Jan 02 2016 21:00:00.000

8

'X'

Sat Jan 02 2016 22:00:00.000

8

'Y'

Sat Jan 02 2016 23:00:00.000

8

'Z'

Sun Jan 03 2016 00:00:00.000

8

'A'

Sun Jan 03 2016 01:00:00.000

8

'B'

Sun Jan 03 2016 02:00:00.000

8

'C'

Sun Jan 03 2016 03:00:00.000

Result

Column1 : INT

Column2 : STRING

3

'A'

3

'B'

3

'B'

3

'C'

3

'G'

3

'H'

4

'A'

4

'B'

4

'A'

4

'D'

4

'C'

4

'H'

Regular expression constructs

The regular expression can be constructed from several special constructs. In the following sections, these construts will be explained in more detail.

Activities

Activities are given by their respective name (including all spaces, special characters etc.) enclosed by single quotes:

Syntax
"'" <string> "'" 
Example

[3] Filters the variants for the occurrence of a certain activity ('B'). Every variant having at least one instance of this activity will be contained in the result, regardless of the number of occurrences or its position in the variant.

Query

Filter

FILTER MATCH_PROCESS_REGEX ( "Table1"."ACTIVITY" , 'B' ) = 1

Column1

"Table1"."CASE_ID"

Column2

"Table1"."ACTIVITY"

Input

Output

Table1

CASE_ID : STRING

ACTIVITY : STRING

startTimestamp : DATE

'1'

'A'

Fri Jan 01 2016 01:00:00.000

'1'

'B'

Fri Jan 01 2016 02:00:00.000

'1'

'C'

Fri Jan 01 2016 03:00:00.000

'1'

'D'

Fri Jan 01 2016 04:00:00.000

'2'

'E'

Fri Jan 01 2016 05:00:00.000

'2'

'F'

Fri Jan 01 2016 06:00:00.000

'2'

'G'

Fri Jan 01 2016 07:00:00.000

'2'

'I'

Fri Jan 01 2016 08:00:00.000

Result

Column1 : STRING

Column2 : STRING

'1'

'A'

'1'

'B'

'1'

'C'

'1'

'D'

If multiple activities with similar names should be mapped at the same time, it is possible to use the keyword LIKE with a SQL-style wildcard match (read here for more details about the LIKE operator).

Syntax
LIKE "'" <identifier_with_wildcards> "'" 
Example

[4] Matches each variant that contains an activity having the term ask in its name.

Query

Filter

FILTER MATCH_PROCESS_REGEX ( "Table1"."ACTIVITY" , LIKE '%ask%' ) = 1

Column1

"Table1"."CASE_ID"

Column2

"Table1"."ACTIVITY"

Input

Output

Table1

CASE_ID : STRING

ACTIVITY : STRING

startTimestamp : DATE

'1'

'Activity A'

Fri Jan 01 2016 01:00:00.000

'1'

'Task B'

Fri Jan 01 2016 02:00:00.000

'1'

'Activity C'

Fri Jan 01 2016 03:00:00.000

'1'

'Activity D'

Fri Jan 01 2016 04:00:00.000

'2'

'Activity A'

Fri Jan 01 2016 05:00:00.000

'2'

'Activity A'

Fri Jan 01 2016 06:00:00.000

'2'

'Activity B'

Fri Jan 01 2016 07:00:00.000

'2'

'Activity C'

Fri Jan 01 2016 08:00:00.000

'3'

'Activity A'

Fri Jan 01 2016 09:00:00.000

'3'

'Activity B'

Fri Jan 01 2016 10:00:00.000

'3'

'Activity B'

Fri Jan 01 2016 11:00:00.000

'3'

'Activity C'

Fri Jan 01 2016 12:00:00.000

Result

Column1 : STRING

Column2 : STRING

'1'

'Activity A'

'1'

'Task B'

'1'

'Activity C'

'1'

'Activity D'

Concatenation

The concatenation allows to match for sequences of certain regular expressions (e.g. activity names). Sequences of two or more expressions are given in the order of occurrence, separated by two greater than symbols (>>).

Syntax
<regular_expression> >> <regular_expression> 
Example

[5] Matches the two consecutive activities 'B' and 'C'.

Query

Filter

FILTER MATCH_PROCESS_REGEX ( "Table1"."ACTIVITY" , 'B' >> 'C' ) = 1

Column1

"Table1"."CASE_ID"

Column2

"Table1"."ACTIVITY"

Input

Output

Table1

CASE_ID : STRING

ACTIVITY : STRING

startTimestamp : DATE

'1'

'A'

Fri Jan 01 2016 01:00:00.000

'1'

'B'

Fri Jan 01 2016 02:00:00.000

'1'

'C'

Fri Jan 01 2016 03:00:00.000

'1'

'D'

Fri Jan 01 2016 04:00:00.000

'2'

'E'

Fri Jan 01 2016 05:00:00.000

'2'

'F'

Fri Jan 01 2016 06:00:00.000

'2'

'G'

Fri Jan 01 2016 07:00:00.000

'2'

'I'

Fri Jan 01 2016 08:00:00.000

'3'

'B'

Fri Jan 01 2016 09:00:00.000

'3'

'A'

Fri Jan 01 2016 10:00:00.000

'3'

'C'

Fri Jan 01 2016 11:00:00.000

'3'

'D'

Fri Jan 01 2016 12:00:00.000

Result

Column1 : STRING

Column2 : STRING

'1'

'A'

'1'

'B'

'1'

'C'

'1'

'D'

Matching any activity

Furthermore, it is possible to match any activity, regardless of its name. For convenience, there are three different ways to do this:

  • by a an activity name that only contains the star symbol ('*'),

  • by the keyword ANY,

  • or by the dot symbol (.) which is well-known from the widely-used Perl regular expression syntax.

Example

[6] Matches a sequence of activities where the second activity may have an arbitrary name.

Query

Filter

FILTER MATCH_PROCESS_REGEX ( "Table1"."ACTIVITY" , 'A' >> . >> 'B' >> 'C' ) = 1

Column1

"Table1"."CASE_ID"

Column2

"Table1"."ACTIVITY"

Input

Output

Table1

CASE_ID : STRING

ACTIVITY : STRING

startTimestamp : DATE

'1'

'A'

Fri Jan 01 2016 01:00:00.000

'1'

'B'

Fri Jan 01 2016 02:00:00.000

'1'

'C'

Fri Jan 01 2016 03:00:00.000

'1'

'D'

Fri Jan 01 2016 04:00:00.000

'2'

'A'

Fri Jan 01 2016 05:00:00.000

'2'

'A'

Fri Jan 01 2016 06:00:00.000

'2'

'B'

Fri Jan 01 2016 07:00:00.000

'2'

'C'

Fri Jan 01 2016 08:00:00.000

'3'

'A'

Fri Jan 01 2016 09:00:00.000

'3'

'B'

Fri Jan 01 2016 10:00:00.000

'3'

'B'

Fri Jan 01 2016 11:00:00.000

'3'

'C'

Fri Jan 01 2016 12:00:00.000

Result

Column1 : STRING

Column2 :

STRING

'2'

'A'

'2'

'A'

'2'

'B'

'2'

'C'

'3'

'A'

'3'

'B'

'3'

'B'

'3'

'C'

Choice

The choice construct allows to define two or more different alternatives for the matching regular expression. The overall regular expression matches if one of the given alternatives is matched. The alternative regular expressions are separated by the pipe symbol (|). In contrast to the concatenation, the order of the alternatives is irrelevant.

Syntax
<regular_expression> "|" <regular_expression>
Example

[7] Matches every variant that contains an activity with name 'A', 'B', or 'H'.

Query

Filter

FILTER MATCH_PROCESS_REGEX ( "Table1"."ACTIVITY" , 'A' | 'B' | 'H' ) = 1

Column1

"Table1"."CASE_ID"

Column2

"Table1"."ACTIVITY"

Input

Output

Table1

CASE_ID : STRING

ACTIVITY : STRING

startTimestamp : DATE

'1'

'A'

Fri Jan 01 2016 01:00:00.000

'1'

'B'

Fri Jan 01 2016 02:00:00.000

'1'

'C'

Fri Jan 01 2016 03:00:00.000

'2'

'E'

Fri Jan 01 2016 04:00:00.000

'2'

'F'

Fri Jan 01 2016 05:00:00.000

'2'

'G'

Fri Jan 01 2016 06:00:00.000

'3'

'H'

Fri Jan 01 2016 07:00:00.000

'3'

'I'

Fri Jan 01 2016 08:00:00.000

'3'

'K'

Fri Jan 01 2016 09:00:00.000

Result

Column1 : STRING

Column2 : STRING

'1'

'A'

'1'

'B'

'1'

'C'

'3'

'H'

'3'

'I'

'3'

'K'

Match start/end

In general, the regular expressions may match at any position in a variant. However, sometimes it is desirable to define a pattern each matching variants should start and / or end with. To define an exact match at the beginning of each variant, the ^ symbol must be added before the definition of the regular expression. To define an exact match at the end of each variant, the $ symbol must be added after the definnition of the regular expression.

Syntax
[ ^ ] <regular_expression> [ $ ]
Examples

[8] Matches every variant that starts with an activity 'A' followed by an activity 'B'.

Query

Filter

FILTER MATCH_PROCESS_REGEX ( "Table1"."ACTIVITY" , ^ 'A' >> 'B' ) = 1

Column1

"Table1"."CASE_ID"

Column2

"Table1"."ACTIVITY"

Input

Output

Table1

CASE_ID : STRING

ACTIVITY : STRING

startTimestamp : DATE

'1'

'A'

Fri Jan 01 2016 01:00:00.000

'1'

'B'

Fri Jan 01 2016 02:00:00.000

'1'

'C'

Fri Jan 01 2016 03:00:00.000

'1'

'D'

Fri Jan 01 2016 04:00:00.000

'2'

'A'

Fri Jan 01 2016 05:00:00.000

'2'

'A'

Fri Jan 01 2016 06:00:00.000

'2'

'B'

Fri Jan 01 2016 07:00:00.000

'2'

'C'

Fri Jan 01 2016 08:00:00.000

Result

Column1 : STRING

Column2 : STRING

'1'

'A'

'1'

'B'

'1'

'C'

'1'

'D'

[9] Matches every variant that ends with an activity 'C' followed by an activity 'D'.

Query

Filter

FILTER MATCH_PROCESS_REGEX ( "Table1"."ACTIVITY" , 'C' >> 'D' $ ) = 1

Column1

"Table1"."CASE_ID"

Column2

"Table1"."ACTIVITY"

Input

Output

Table1

CASE_ID : STRING

ACTIVITY : STRING

startTimestamp : DATE

'1'

'A'

Fri Jan 01 2016 01:00:00.000

'1'

'B'

Fri Jan 01 2016 02:00:00.000

'1'

'C'

Fri Jan 01 2016 03:00:00.000

'1'

'D'

Fri Jan 01 2016 04:00:00.000

'2'

'A'

Fri Jan 01 2016 05:00:00.000

'2'

'A'

Fri Jan 01 2016 06:00:00.000

'2'

'B'

Fri Jan 01 2016 07:00:00.000

'2'

'C'

Fri Jan 01 2016 08:00:00.000

'3'

'A'

Fri Jan 01 2016 09:00:00.000

'3'

'C'

Fri Jan 01 2016 10:00:00.000

'3'

'D'

Fri Jan 01 2016 11:00:00.000

'3'

'A'

Fri Jan 01 2016 12:00:00.000

'3'

'C'

Fri Jan 01 2016 13:00:00.000

'3'

'D'

Fri Jan 01 2016 14:00:00.000

Result

Column1 : STRING

Column2 : STRING

'1'

'A'

'1'

'B'

'1'

'C'

'1'

'D'

'3'

'A'

'3'

'C'

'3'

'D'

'3'

'A'

'3'

'C'

'3'

'D'

Activity sets

Activity sets are an alternative way to express a choice between different activities. An activity set matches any activity that is given in its defintion. Activity sets are defined as a (optionally comma-separated) list of activity names enclosed by square brackets. In contrast to choices, activity sets can only be defined for activities, but not for arbitrary regular expressions. However, activity sets can be inverted by adding a ! between the opening bracket and the first activity name, so the activity set matches the complement of the defined activities. This way, it is easy to express that all activities except one ore more certain activities should match.

Syntax
"[" [ ! ] <activity_name> ( , <activity_name> )* "]"
Examples

[10] Matches each variant that contains an activity 'A', 'B', or 'F'.

Query

Filter

FILTER MATCH_PROCESS_REGEX ( "Table1"."ACTIVITY" , [ 'A' , 'B' , 'F' ] ) = 1

Column1

"Table1"."CASE_ID"

Column2

"Table1"."ACTIVITY"

Input

Output

Table1

CASE_ID : STRING

ACTIVITY : STRING

startTimestamp : DATE

'1'

'A'

Fri Jan 01 2016 01:00:00.000

'1'

'B'

Fri Jan 01 2016 02:00:00.000

'1'

'C'

Fri Jan 01 2016 03:00:00.000

'2'

'E'

Fri Jan 01 2016 04:00:00.000

'2'

'F'

Fri Jan 01 2016 05:00:00.000

'2'

'G'

Fri Jan 01 2016 06:00:00.000

'3'

'X'

Fri Jan 01 2016 07:00:00.000

'3'

'Y'

Fri Jan 01 2016 08:00:00.000

'3'

'Z'

Fri Jan 01 2016 09:00:00.000

Result

Column1 : STRING

Column2 : STRING

'1'

'A'

'1'

'B'

'1'

'C'

'2'

'E'

'2'

'F'

'2'

'G'

[11] Matches each variant that contains an activity which is not 'A', 'B', or 'X'.

Query

Filter

FILTER MATCH_PROCESS_REGEX ("Table1"."ACTIVITY", [ ! 'A', 'B', 'X'] ) = 1

Column1

"Table1"."CASE_ID"

Column2

"Table1"."ACTIVITY"

Input

Output

Table1

CASE_ID : STRING

ACTIVITY : STRING

startTimestamp : DATE

'1'

'A'

Fri Jan 01 2016 01:00:00.000

'1'

'B'

Fri Jan 01 2016 02:00:00.000

'1'

'C'

Fri Jan 01 2016 03:00:00.000

'2'

'A'

Fri Jan 01 2016 04:00:00.000

'2'

'B'

Fri Jan 01 2016 05:00:00.000

'2'

'A'

Fri Jan 01 2016 06:00:00.000

'3'

'X'

Fri Jan 01 2016 07:00:00.000

'3'

'Y'

Fri Jan 01 2016 08:00:00.000

'3'

'Z'

Fri Jan 01 2016 09:00:00.000

Result

Column1 : STRING

Column2 : STRING

'1'

'A'

'1'

'B'

'1'

'C'

'3'

'X'

'3'

'Y'

'3'

'Z'

Grouping

Regular expressions which contain multiple sub-expressions (e.g., choices or concatenations) can be bracketed to group them in order to to clearify the structure. Grouping may be also necessary to nest regular expressions.

Syntax
"(" <regular_expression> ")"
Example

[12] Matches a sequence of Activities 'A', 'B', and 'C'. The regular expression is grouped by a pair of brackets (without quantifier).

Query

Filter

FILTER MATCH_PROCESS_REGEX ( "Table1"."ACTIVITY" , ( 'A' >> 'B' >> 'C' ) ) = 1

Column1

"Table1"."CASE_ID"

Column2

"Table1"."ACTIVITY"

Input

Output

Table1

CASE_ID : STRING

ACTIVITY : STRING

startTimestamp : DATE

'1'

'A'

Fri Jan 01 2016 01:00:00.000

'1'

'B'

Fri Jan 01 2016 02:00:00.000

'1'

'C'

Fri Jan 01 2016 03:00:00.000

'1'

'D'

Fri Jan 01 2016 04:00:00.000

'2'

'A'

Fri Jan 01 2016 05:00:00.000

'2'

'A'

Fri Jan 01 2016 06:00:00.000

'2'

'B'

Fri Jan 01 2016 07:00:00.000

'2'

'C'

Fri Jan 01 2016 08:00:00.000

'3'

'A'

Fri Jan 01 2016 09:00:00.000

'3'

'B'

Fri Jan 01 2016 10:00:00.000

'3'

'B'

Fri Jan 01 2016 11:00:00.000

'3'

'C'

Fri Jan 01 2016 12:00:00.000

Result

Column1 : STRING

Column2 : STRING

'1'

'A'

'1'

'B'

'1'

'C'

'1'

'D'

'2'

'A'

'2'

'A'

'2'

'B'

'2'

'C'

Quantifiers

A quantifier defines that a regular expression may occur a variable number of times. Quantifiers must always be applied to a group expression. There are three different quantifiers available:

  • * The regular expression may occur an arbitrary number of times, i.e. it occurs zero or more times.

  • + The regular expression occurs at least once, i.e. it occurs one ore more times.

  • ? The regular expression is optional, i.e. it may occur exactly once or not.

  • {<from>, <to>} The regular expression may occur between <from> and <to> times (ie. <from>, <from>+1, <from>+2, ..., <to>-1, <to>). Note that both <from> and <to> are included in the range. For example, ('A'){1, 3} will match all variants that contain the activity 'A', one or two or three times. You can use this quantifier to make your regular expressions more readable:

    • 'A' >> 'A' >> ('A')? >> ('A')? becomes ('A'){2, 4},

    • ('A' >> 'A' >> 'A') | ('A' >> 'A' >> 'A' >> 'A') becomes ('A'){3,4},

    • ('A')? | ('A' >> 'A') | ('A' >> 'A' >> 'A' >> 'A') | ('A' >> 'A' >> 'A' >> 'A' >> 'A') becomes ('A'){0,2} | ('A'){4,5}.

    If <from> and <to> are equal then it's enough to specify <from>, for example ('A'){3, 3} can be written as ('A'){3}.

    The arguments <from> and <to> must satisfy all of the following conditions:

    • both <from> and <to> must be integers that are greater or equal zero,

    • <from> and <to> can't both be zero,

    • <from> must be less or equal <to>.

Syntax
"(" <regular_expression> ")" ( "+" | "?" | "*" | "{" <from>  ["," <to>]  "}")
Examples

[13] Matches each variant that has an arbitrary number of activities 'B' between activities 'A' and 'C'.

Filter

FILTER MATCH_PROCESS_REGEX ( "Table1"."ACTIVITY" , 'A' >> ( 'B' ) * >> 'C' ) = 1

Column1

"Table1"."Case"

Column2

"Table1"."Activity"

Input

Output

Table1

Case : INT

Activity : STRING

Timestamp : DATE

1

'A'

Fri Jan 01 2016 01:00:00.000

1

'B'

Fri Jan 01 2016 02:00:00.000

1

'C'

Fri Jan 01 2016 03:00:00.000

2

'A'

Fri Jan 01 2016 04:00:00.000

2

'B'

Fri Jan 01 2016 05:00:00.000

2

'B'

Fri Jan 01 2016 06:00:00.000

2

'C'

Fri Jan 01 2016 07:00:00.000

3

'A'

Fri Jan 01 2016 08:00:00.000

3

'C'

Fri Jan 01 2016 09:00:00.000

Result

Column1 : INT

Column2 : STRING

1

'A'

1

'B'

1

'C'

2

'A'

2

'B'

2

'B'

2

'C'

3

'A'

3

'C'

[14] Matches each variant that has at least one activity 'B' between activities 'A' and 'C'.

Query

Filter

FILTER MATCH_PROCESS_REGEX ( "Table1"."ACTIVITY" , 'A' >> ( 'B' ) + >> 'C' ) = 1

Column1

"Table1"."Case"

Column2

"Table1"."Activity"

Input

Output

Table1

Case : INT

Activity : STRING

Timestamp : DATE

1

'A'

Fri Jan 01 2016 01:00:00.000

1

'B'

Fri Jan 01 2016 02:00:00.000

1

'C'

Fri Jan 01 2016 03:00:00.000

2

'A'

Fri Jan 01 2016 04:00:00.000

2

'B'

Fri Jan 01 2016 05:00:00.000

2

'B'

Fri Jan 01 2016 06:00:00.000

2

'C'

Fri Jan 01 2016 07:00:00.000

3

'A'

Fri Jan 01 2016 08:00:00.000

3

'C'

Fri Jan 01 2016 09:00:00.000

Result

Column1 : INT

Column2 : STRING

1

'A'

1

'B'

1

'C'

2

'A'

2

'B'

2

'B'

2

'C'

[15] Matches each variant that has no or one activity 'B' between activities 'A' and 'C'.

Query

Filter

FILTER MATCH_PROCESS_REGEX ( "Table1"."ACTIVITY" , 'A' >> ( 'B' ) ? >> 'C' ) = 1

Column1

"Table1"."Case"

Column2

"Table1"."Activity"

Input

Output

Table1

Case : INT

Activity : STRING

Timestamp : DATE

1

'A'

Fri Jan 01 2016 01:00:00.000

1

'B'

Fri Jan 01 2016 02:00:00.000

1

'C'

Fri Jan 01 2016 03:00:00.000

2

'A'

Fri Jan 01 2016 04:00:00.000

2

'B'

Fri Jan 01 2016 05:00:00.000

2

'B'

Fri Jan 01 2016 06:00:00.000

2

'C'

Fri Jan 01 2016 07:00:00.000

3

'A'

Fri Jan 01 2016 08:00:00.000

3

'C'

Fri Jan 01 2016 09:00:00.000

Result

Column1 : INT

Column2 : STRING

1

'A'

1

'B'

1

'C'

3

'A'

3

'C'

[16] Matches each variant that has 1 or 2 consecutive 'B' activities between activities 'A' and 'C'.

Query

Filter

FILTER MATCH_PROCESS_REGEX ( "Table1"."ACTIVITY" , 'A' >> ( 'B' ) { 1 , 2 } >> 'C' ) = 1

Column1

"Table1"."CASE_ID"

Column2

"Table1"."ACTIVITY"

Input

Output

Table1

CASE_ID : STRING

ACTIVITY : STRING

startTimestamp : DATE

'1'

'A'

Fri Jan 01 2016 01:00:00.000

'1'

'B'

Fri Jan 01 2016 02:00:00.000

'1'

'C'

Fri Jan 01 2016 03:00:00.000

'2'

'A'

Fri Jan 01 2016 04:00:00.000

'2'

'B'

Fri Jan 01 2016 05:00:00.000

'2'

'B'

Fri Jan 01 2016 06:00:00.000

'2'

'C'

Fri Jan 01 2016 07:00:00.000

'3'

'A'

Fri Jan 01 2016 08:00:00.000

'3'

'C'

Fri Jan 01 2016 09:00:00.000

Result

Column1 : STRING

Column2 :STRING

'1'

'A'

'1'

'B'

'1'

'C'

'2'

'A'

'2'

'B'

'2'

'B'

'2'

'C'

Pitfall
  • If not specified to match exactly at the start and/or at the end of the variant, a regular expression may match at any position in the variant as an arbitrary number of activities at beginning / end may be skipped. Consequently, a ? quantifier for example may also match any number of occurrence of its subexpression, if it is the only subexpression or the first/last subexpression of a concatenation.

Example

[17] Matches each variant that has an activity 'B'. Because there is no exact match at start or end of the variant defined, all given variants are matched as the additional activity 'B' for variant 2 is implicitly skipped.

Query

Filter

FILTER MATCH_PROCESS_REGEX ( "Table1"."ACTIVITY" , ( 'B' ) ? ) = 1

Column1

"Table1"."Case"

Column2

"Table1"."Activity"

Input

Output

Table1

Case : INT

Activity : STRING

Timestamp : DATE

1

'A'

Fri Jan 01 2016 01:00:00.000

1

'B'

Fri Jan 01 2016 02:00:00.000

1

'C'

Fri Jan 01 2016 03:00:00.000

2

'A'

Fri Jan 01 2016 04:00:00.000

2

'B'

Fri Jan 01 2016 05:00:00.000

2

'B'

Fri Jan 01 2016 06:00:00.000

2

'C'

Fri Jan 01 2016 07:00:00.000

3

'A'

Fri Jan 01 2016 08:00:00.000

3

'C'

Fri Jan 01 2016 09:00:00.000

Result

Column1 : INT

Column2 : STRING

1

'A'

1

'B'

1

'C'

2

'A'

2

'B'

2

'B'

2

'C'

3

'A'

3

'C'

Defining and referencing an alias

Sometimes, similar subexpressions may occur at different positions in the regular expression. This may be simple activities as well as complex subexpressions. An alias provides an elegant way to avoid repeating these subexpressions at different positions in the regular expression. It is a simple name that is assigned to a specific regular expression. By this name, the assigned regular expression can be referneced in order to re-use it in another regular expresssion. The regular expressions and their related aliases are given as a comma-separated list. The last regular expression of this list is the root expression which is the entry point for the pattern matching. Therefore, it cannot be referenced by another regular expression. Consequently, the root expression is never assigned to an alias.

Tips/Pitfalls
  • Avoid to create cyclic definitions of regular expressions when using aliases as cycles will result in an error.

  • Aliases can be used as a shortcut for long activity names. This may save a lot of typing and increase the readability of complex regular expressions.

  • Be aware that the match at start symbol must be placed before the first aliased expressions, the match at end symbol must be placed after the root expression.

  • It is not possible to reference aliases defined in another PROCESS_MATCH_REGEX operator. Only aliases within the same operator can be referenced.

Syntax
<regular_expression> ( AS <alias_name> ","  <regular_expression> )* 
Example

[18] Matches every variant that contains a sequence of 'A', 'B', 'C', and 'D'. Activities 'A' and 'B' are renamed to 'AliasA' and 'AliasB', respectively.

Query

Filter

FILTER
    MATCH_PROCESS_REGEX (
        "Table1"."ACTIVITY" ,
        'B' AS AliasB , 'A' AS AliasA , AliasA >> AliasB >> 'C' >> 'D'
    )
    =
    1

Column1

"Table1"."CASE_ID"

Column2

"Table1"."ACTIVITY"

Input

Output

Table1

CASE_ID : STRING

ACTIVITY : STRING

startTimestamp : DATE

'1'

'A'

Fri Jan 01 2016 01:00:00.000

'1'

'B'

Fri Jan 01 2016 02:00:00.000

'1'

'C'

Fri Jan 01 2016 03:00:00.000

'1'

'D'

Fri Jan 01 2016 04:00:00.000

'2'

'E'

Fri Jan 01 2016 05:00:00.000

'2'

'F'

Fri Jan 01 2016 06:00:00.000

'2'

'G'

Fri Jan 01 2016 07:00:00.000

'2'

'I'

Fri Jan 01 2016 08:00:00.000

Result

Column1 : STRING

Column2 : STRING

'1'

'A'

'1'

'B'

'1'

'C'

'1'

'D'

See also: